English - Japanese

Language Differences
Because of how Japanese Vocaloids are set up, they are more limited when used for English, since the phonology of Japanese (its phonemes, accents, tones, intonation, morae and assimilations) is very different from that of English. In Japanese each consonant is always followed by an inseparable vowel and consonants do not form clusters, so consonants are generally pronounced weakly and not independently, except for ん (n), the sokuon and some transliterated phonemes used for non-Japanese words. Because of this, some of the Japanese Vocaloids' consonant sounds contain a slight vowel sound, so that they connect smoothly to the following vowel and sound right in Japanese.

It is also important to know that the symbols suggested by X-SAMPA do not always match their actual pronunciations, which can lead to errors. For instance, the Vocaloid symbol [S] corresponds to /ʃ/ in English Vocaloids and to /ɕ/ in Japanese ones. Likewise, the Japanese "a" is a low central vowel that lies between the English "a" in "father" and the English "a" in "dad", and the Japanese "r" is not the same as either "r" or "l" in English.

In addition, English often puts emphasis on certain syllables of words (stress accent), while Japanese uses pitch accent. These differences between the two languages frequently make Japanese Vocaloids retain a Japanese accent when there are no perfectly equivalent phonemes, even if users manage to make them sing in the intended language. Conversely, the same thing can happen to English Vocaloids, which often have an English accent when they sing in other languages.

Another consideration with English Vocaloids is their regional accent. This does not affect a Vocaloid's overall performance or the handling of the Vocaloid engine, and all of them use identical phonemes regardless. The only effect is a particular stress or emphasis on certain vowels and consonants that may not be present in another English Vocaloid, which can make an English Vocaloid sound different from what a user expects. Examples of Vocaloids that may be affected include Sonika, who has a British accent, and Big Al, who has an American one; Luka Megurine, who retains a Japanese accent, is also included. One noted example of a regional accent affecting a Vocaloid's output is Big Al's pronunciation of vowel sounds, which can make him harder to use for Japanese. In contrast, Japanese Vocaloids do not show as much regional accent variation among themselves in Japanese.

As of Vocaloid 3, Japanese Vocaloids can mimic English sounds more closely thanks to the addition of new sounds they did not have in Vocaloid 2. However, more complex words and sounds are still beyond the Japanese Vocaloids' reach, and this limits how well a Japanese Vocaloid can mimic English.

Working the Vowels
English does not have the same vowels as Japanese. In most cases a Japanese vowel falls between two English ones.
 * Example: The Japanese O (お) lies between the O in "core" (more open) and the O in "go" (closer and often diphthongized).

Besides this, English pronunciation tends to be looser. For these reasons a strong English accent can appear in the vowels if the user does not work on them with care.

Fortunately, English has a large array of vowels, which allows multiple possibilities for replacing a vowel. For each vowel the user must consider the similarity between the Japanese vowel and the intended one. The user must also know that the dialect can affect the realization of the English vowel.


 * Example: Oliver's [{] has been reported to be more centralized, sounding closer to an /a/ than to an /æ/ in comparison to other English voicebanks.

The dialect especially affects the diphones (diphthongs and rhotic diphones). For example, if the accent is non-rhotic, the rhotic diphones can be realized either as long pure vowels or as diphthongs ending in [@], while if the accent is rhotic, they can be realized as a vowel+[r] combination or as a rhotic vowel.

If a diphone is realized as a long vowel, it can be used as a possible semi-equivalent for the Japanese vowel that the user intends to imitate.


 * Example: Big Al's [eI] phoneme is realized more as a long /e:/ than as an /eɪ/ diphthong.

Knowing this, the user must test which vowels sound better. To make the work easier, it is best to group the vowels according to how similar they are to their Japanese counterpart.

After choosing the closest or best-fitting vowel, the pronunciation can be approximated further by adjusting the Velocity (VEL).
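The grouping step can be kept as a small lookup table. The sketch below is only an illustration: the table layout and the function name candidates_for are invented here, and the only two entries are the reports quoted in the examples above (Oliver's [{] and Big Al's [eI]). Any further entries would have to come from the user's own listening tests.

```python
# Reported realizations of English phonemes, keyed by (voicebank, symbol).
# Only the two reports quoted in this section are included; this is not a
# complete or official table.
REPORTED_REALIZATIONS = {
    ("Oliver", "{"): "a",    # reported as closer to /a/ than to /ae/
    ("Big Al", "eI"): "e:",  # reported as a long /e:/ rather than a diphthong
}

def candidates_for(japanese_vowel, reports=REPORTED_REALIZATIONS):
    """Return the (voicebank, symbol) pairs whose reported realization
    matches the Japanese vowel once the length mark ':' is ignored."""
    matches = []
    for key, realization in reports.items():
        if realization.rstrip(":") == japanese_vowel:
            matches.append(key)
    return matches

print(candidates_for("a"))  # [('Oliver', '{')]
print(candidates_for("e"))  # [('Big Al', 'eI')]
```

The chosen candidate is still only a starting point; the Velocity (VEL) adjustment has to be done by ear.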

Another possibility is combining the vowels properly.
 * Example:

Palatalization
Palatalization is a phonological process in which the articulation of a consonant is modified so that the middle of the tongue is raised toward the palatal position. As a result, the consonant can become a palatalized consonant, which has a brief palatal glide or "ee"-like sound, or it can shift completely to the closest palatal consonant.

Japanese has clear lexical and grammatical rules that determine when palatalization occurs, and it is an important phonological process in the language.

In English, by contrast, it is an allophonic process that generally goes unnoticed by English speakers. It generally occurs under the influence of the glide [j].


 * Examples:


 * hue /ˈhjuː/ → /ˈçuː/: the voiceless glottal fricative /h/ becomes a voiceless palatal fricative /ç/ due to the influence of the /j/
 * canyon /ˈkænjən/ → /ˈkænʲjən/: the alveolar nasal /n/ becomes a palatalized alveolar nasal (similar to the palatal nasal /ɲ/) due to the influence of the /j/

Knowing this, it is possible to take advantage of English allophonic palatalization when making an English Vocaloid sing in Japanese. To do so, a brief [j] or "ee"-like sound has to be created after the consonant. The user can either:


 * Insert the glide [j] between the consonant to be palatalized and its vowel: The added palatal approximant [j] will palatalize the consonant. Adjusting the Velocity (VEL) may be required.
 * Example:
 * ぎょうざ (gyōza 'fried dumpling') IPA: /ɡʲoːza/ → Japanese: [g' o z a] English: [g j Q z a]


 * Use a short note with the consonant to be palatalized followed by the vowel [i:]: If the note is short enough, the articulation of the [i:] will be incomplete or barely audible, giving the consonant a j-colored sound. The user will probably need to adjust the Velocity (VEL), and it is also important to take the Tempo into consideration. Additionally, in Vocaloid3 the devoiced version of the vowel, [i:_0], can be used.
 * Example:
 * ぎょうざ (gyōza 'fried dumpling') IPA: /ɡʲoːza/ → Japanese: [g' o z a] English: [g i:][Q z a]
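The two methods above can be sketched as simple transformations on a phoneme list. This is a minimal sketch and not part of the Vocaloid editor: the function names are hypothetical, a trailing apostrophe is assumed to mark a palatalized Japanese phoneme (as in [g']), and the vowel map contains only the o → Q substitution taken from the ぎょうざ example. Velocity and note length still have to be adjusted by hand.

```python
# Only the substitution used in the gyoza example; extend it as needed.
VOWEL_MAP = {"o": "Q"}

def with_glide(phonemes, vowel_map=VOWEL_MAP):
    """Method 1: replace a palatalized consonant C' with C followed by [j]."""
    result = []
    for p in phonemes:
        if p.endswith("'"):
            result += [p[:-1], "j"]
        else:
            result.append(vowel_map.get(p, p))
    return result

def with_short_note(phonemes, vowel_map=VOWEL_MAP, devoiced=False):
    """Method 2: put C + [i:] on its own note, separate from what follows.

    Returns a list of notes, each note being a list of phonemes.
    """
    vowel = "i:_0" if devoiced else "i:"
    notes, current = [], []
    for p in phonemes:
        if p.endswith("'"):
            if current:  # close the note collected so far
                notes.append(current)
                current = []
            notes.append([p[:-1], vowel])
        else:
            current.append(vowel_map.get(p, p))
    if current:
        notes.append(current)
    return notes

gyoza = ["g'", "o", "z", "a"]
print(with_glide(gyoza))       # ['g', 'j', 'Q', 'z', 'a']
print(with_short_note(gyoza))  # [['g', 'i:'], ['Q', 'z', 'a']]
```

Passing devoiced=True swaps in [i:_0], which the text mentions as an option for Vocaloid3.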

In the case of the post-alveolar sibilants, this trick may not be required, as the sound is already palatalized (just to a different degree). Doing it can bring the sound closer to the native one; however, not doing it can be used to change the stress and emphasis if the user requires it.

Liquid Consonant
The liquid consonants are the group formed by the lateral and rhotic consonants. Languages generally tend to have two liquid consonants: one lateral (generally associated with L) and one rhotic (generally associated with R).

In Japanese there is no clear distinction between the two, so the Japanese R is realized as an undefined post-alveolar liquid consonant whose sound tends to vary, while being perceived by native Japanese speakers as a single phoneme. The sound usually lies between /ɾ/ (more R-like and similar to the unstressed American D/T) and /ɺ/ (a flapped L, more lateral or L-like), leaning toward one or the other depending on the vowel that follows. It is for this reason that English speakers tend to perceive it as something between their L, R and D sounds.

When the user wants a more L-like sound, they can simply use the phoneme [l0], or Light L (the [l], or Dark L, usually sounds awkward or gives an excessively Anglo-Saxon accent because of its velarization). If the user wants a more [ɾ]-like sound, they can combine the phoneme [r] with a D-sounding phoneme such as [d], [dh] or [D], a method often used by some users when they try to imitate the American accent. As these have different degrees of stress and prominence, the user will probably need to test which one gives the best result.


 * Examples:

Whether the user is attempting a more R-like or a more L-like sound, it is important to adjust the Velocity (VEL), because the Japanese R (like other consonants) tends to be shorter or more "percussive".
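The choices described in this section can be summarized as a small helper that lists the phonemes to try. This is a hedged sketch: the function name and the "l-like" / "r-like" labels are made up for illustration, and the text does not say in which order [r] and the D-sounding phoneme should be placed, so each pair only names the two phonemes to combine.

```python
# D-sounding phonemes named in the text as partners for [r].
D_LIKE = ["d", "dh", "D"]

def japanese_r_options(target):
    """List English phoneme combinations to test for the Japanese R.

    target: "l-like" for a more lateral sound, "r-like" for a more
    flap-like sound. The pair order is not prescribed by the text.
    """
    if target == "l-like":
        return [["l0"]]  # Light L; the Dark L [l] is avoided
    if target == "r-like":
        return [["r", d] for d in D_LIKE]
    raise ValueError("target must be 'l-like' or 'r-like'")

print(japanese_r_options("l-like"))  # [['l0']]
print(japanese_r_options("r-like"))  # [['r', 'd'], ['r', 'dh'], ['r', 'D']]
```

Each option still needs to be auditioned with a shortened Velocity (VEL) setting, since the helper only enumerates candidates.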

Conversion Chart
Special note: this is based on Big Al's help file, with some information added to show English equivalent or quasi-equivalent phonemes for Japanese phonemes with symbols and to compare their actual pronunciations. Even when the Vocaloid symbol transcriptions are the same, the actual pronunciations in each language are often different, as the IPA for each shows. This guide is meant for users who are working to make an English or Japanese Vocaloid sing in the other language. However, additional work will be needed to get closer to the target language's phoneme usage.

Additional notes

 * Linguistically, the phonemes which English and Japanese share are k, g, s, z, Z, tS, h, b, p, j and m. Both English and Japanese voicebanks also have e, S, dZ, d, N, n and w; however, these phonemes generally do not sound the same. (See the IPA in each language.)
 * Since all the voicebanks have their distinctive characteristics, their phonemes do not always produce the same result especially in languages which they are not intended for.
 * The above is particularly true for Miku and Rin, who are noted to sound excessively aged when singing in another language, even in normal configurations and higher octaves.
 * Most of the consonants among the Japanese phonemes (with the exception of the nasal consonants) and certain English phonemes are not intended to be encoded standalone. Using them that way may sometimes result in audio distortion, clicks or sound loops.
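The first note above can be turned into a quick check for whether a symbol can be carried over unchanged. This is a minimal sketch using only the two symbol lists given in that note; the function name and the returned labels are invented for illustration.

```python
# Symbols listed above as shared by both languages.
SHARED = {"k", "g", "s", "z", "Z", "tS", "h", "b", "p", "j", "m"}

# Symbols present in both sets of voicebanks that generally sound different.
SAME_SYMBOL_DIFFERENT_SOUND = {"e", "S", "dZ", "d", "N", "n", "w"}

def classify(symbol):
    """Say how far a Vocaloid symbol can be trusted across the two languages."""
    if symbol in SHARED:
        return "shared"
    if symbol in SAME_SYMBOL_DIFFERENT_SOUND:
        return "same symbol, different sound"
    return "not listed"

for s in ["k", "S", "Q"]:
    print(s, "->", classify(s))
```

A symbol classified as "same symbol, different sound", such as [S], is exactly the case where the IPA for each language should be compared before reusing it.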

Working the Rhotic Vowels
Depending on the dialect, the rhotic vowels can be pronounced either as a vowel+[r] combination or as a rhotacized vowel, in the case of rhotic accents, or as a long monophthong or a vowel+/ə/ diphthong, in the case of non-rhotic accents.

As the phoneme [4] and its palatalized counterpart [4'] may sound more L-like than R-like, it is preferable to use a non-rhotic approximation when working with them.

In the case of the long monophthongs, the situation is easy.

For the vowel+schwa diphthongs, the easiest approach is to treat them as a combination of a semi-equivalent vowel and a mid vowel. The mid vowels, [e] and [o], work well because of their closeness to the schwa /ə/. Which one works best varies; in general terms, however, [e] works well for an unrounded sound while [o] works well for a more rounded one.
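The rule of thumb for vowel+schwa diphthongs can be written as a one-line helper. This is a sketch under stated assumptions: the function name is hypothetical, the caller has to decide which Japanese vowel is the semi-equivalent base, and the base vowels "i" and "u" in the usage lines are illustrative choices rather than values taken from the conversion chart.

```python
def approximate_schwa_diphthong(base_vowel, rounded):
    """Approximate an English vowel+schwa diphthong with Japanese phonemes.

    base_vowel: the Japanese vowel chosen as semi-equivalent to the first
                element of the diphthong (picked by the user).
    rounded:    True for a more rounded result ([o] stands in for the schwa),
                False for an unrounded one ([e] stands in for the schwa).
    """
    schwa_substitute = "o" if rounded else "e"
    return [base_vowel, schwa_substitute]

print(approximate_schwa_diphthong("i", rounded=False))  # ['i', 'e']
print(approximate_schwa_diphthong("u", rounded=True))   # ['u', 'o']
```

Since the text notes that the better substitute varies, both settings are worth auditioning for a given voicebank.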

Other useful phonemes
Some other phonemes can be useful too. As Japanese voicebanks usually struggle with consonant clusters, the phoneme [ts] can be used as a replacement for the /ts/ cluster, which is usually formed by combining a word ending in /t/ with an abbreviated "is".
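The [ts] replacement amounts to merging an adjacent /t/ and /s/ in the phoneme sequence. The following is a minimal sketch with a hypothetical function name; it works on a plain list of symbols and does not check whether the result is valid for a particular voicebank.

```python
def merge_ts(phonemes):
    """Replace each adjacent 't', 's' pair with the single phoneme 'ts'."""
    result = []
    i = 0
    while i < len(phonemes):
        if phonemes[i] == "t" and i + 1 < len(phonemes) and phonemes[i + 1] == "s":
            result.append("ts")
            i += 2  # skip both members of the cluster
        else:
            result.append(phonemes[i])
            i += 1
    return result

print(merge_ts(["i", "t", "s"]))  # ['i', 'ts']
```

The input in the usage line is an illustrative sequence for a word ending in /t/ followed by an abbreviated "is".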

Alternatively, the palatalized consonants can be used to make some diphthongs easier (particularly the ones which start with [j]) or for

Castle in a Cloud
A cover song from the musical Les Misérables, based on the French book of the same name written by Victor Hugo.

Trivia

 * The word "Engrish" is commonly used to describe odd Asian-to-English wording. The word itself originates from Japanese users' habit of using an "r" instead of an "l" when spelling English words. In the overseas Vocaloid fandom, the word is also often used to describe a Japanese Vocaloid singing in English. This is not meant as an act of disrespect, but rather as a note that Japanese phonetics were used to make "English".
 * Wat commented on how frustrated he felt when developing the Kaito English voicebank, and said that even a native speaker without patience might shoot their computer in response to it. The reason he gave was the huge gap between Japanese and English and how the two operate.