English Phonetics

English Vocaloids are Vocaloids that can reproduce the English language much more easily than Vocaloids built for other languages. The following covers the phonemes needed to make a Vocaloid sing in English.

About
The English language has one of the greatest variations of dialect in the world. As a result, English Vocaloids show much more variety in pronunciation than Vocaloids for other languages, such as those that sing in Japanese.

The English language itself is made up of about 20 vowel sounds and 24 consonant sounds. English also lacks a systematic orthography, so there is no one-to-one or near one-to-one match between letters and sounds of the kind found in Japanese, for example. "W" is /w/ in "what" but part of the /juː/ vowel in "few"; "Y" is /j/ in "yes" but part of the diphthong /eɪ/ in "play". There are also differences in spelling, such as those between British and American English ("colour"/"color"). Vocaloid and Vocaloid 2 use American spelling. Vocaloid 3 is confirmed to be capable of localisation, but it is unknown whether this will open up the ability to use both American and British spelling.

English Vocaloids
Despite the general belief that singers lose their accents when they sing, this is not the case, and an accent can be heard even in sung vocals. Many are led to believe otherwise because there are several methods of training singers to disguise or hide their natural accents. English is not alone in this, as other languages have the same problem, but English Vocaloids have proven particularly prone to accent issues. Even the first two English Vocaloids, Leon and Lola, were noted for their distinctly "British" accent. An accent has been known to either aid or complicate the use of synthesizing software, and Vocaloid is no stranger to this effect. As a result, English Vocaloids have so far ended up with the most variation in how they sound of all the Vocaloids produced for the software.

The dialect or accent of an English Vocaloid can cause noticeable variation in certain sounds, especially the diphthongs and rhotic vowels. Users who are not aware of the potential difficulty of accents may overlook odd pronunciations that need to be adjusted for better results. This applies even more to voicebanks with non-native accents, because the voice provider may have pronunciation issues with a non-native language.

British-English Accented
British-English accented Vocaloids are Vocaloids whose provider is known to be of "British" nationality. As Great Britain is the main origin of English, British-English Vocaloids sing in a native English accent. Originally, this was the standard English accent type used to develop the English engine. British accented Vocaloids mostly came from Zero-G, which worked solely with British artists when collecting vocal samples.

Note: The term 'British' applies to anyone from England, Scotland, Wales and Northern Ireland, so the accent can vary greatly overall. The British Isles have the greatest variation of English accents in the world per square mile of land. (For more information see Wikipedia.)


 * Leon
 * Lola
 * Oliver
 * Avanna

American-English Accented
American accented Vocaloids have providers who came from the United States of America and are therefore native speakers of the English language. As there is only one American-English accented Vocaloid, there is in practice no other voicebank to compare it against. The most noticeable difference from the British accented voicebanks is in the rhotic phonemes, such as [r] and the r-colored diphones. This is because British dialects are usually non-rhotic, whereas rhotic dialects of English are predominant in North America. (For more information see Wikipedia.)


 * Big Al

Australian Accented
Australian accents are the normal English accents of individuals from Australia. This accent is normally very distinct from all other English accents, with features not found in other English dialects. (For more information see Wikipedia.)


 * Sweet Ann - her provider "Jody" supposedly came from Australia.

South-African Accented
South African accents are accents belonging to individuals from South Africa. English was not a native language of Africa; it was introduced during the colonisation of African countries by English colonists, and became widely used in South Africa as the general lingua franca between regions. Variation in the influence of native languages on English results in a large variation in the strength and tone of the accent, though in general most South African accents closely resemble the accents of southern England. (For more information see Wikipedia.)


 * Miriam

Japanese-English Accented
Japanese-English accented Vocaloids are English Vocaloids whose providers came from Japan. These providers have Japanese as their native language but were used to produce English voicebanks. The Japanese-English accent is therefore a non-native English accent and differs significantly from native English accents. While Luka is so far the only Japanese-English accented Vocaloid with an English voicebank, the demos of Hatsune Miku's and Kaito's English voicebanks already show many of the traits known from Luka.

The major issue with Japanese accents is that speakers often struggle with a number of sounds because Japanese makes no distinction between them. Vowel sounds usually come out either too tense or too lax, as the speaker tends to approximate each vowel to the Japanese 5-vowel system. Many of the problem sounds are simply ones that the Japanese language itself does not distinguish ("L" and "R" sounds, for example). The most famous case is Luka's English pronunciation of the words "Road Roller", which risks coming out sounding like "roe rorora". How difficult it is for the Vocaloid to make pronunciations distinct depends on the provider's proficiency in English. For instance, Miku's provider Saki Fujita had trouble pronouncing English triphones. Despite this, Japanese-English Vocaloids are capable of mimicking the English language more closely than a purely Japanese voicebank.


 * Megurine Luka (Yū Asakawa was competent in speaking English)
 * Hatsune Miku (not yet released; Saki Fujita did not speak English at all prior to the voicebank's recording)
 * Kaito (not yet released; Naoto Fūga has an unknown level of English)
 * Meiko (not yet released; Meiko Haigō has an unknown level of English)
 * Megpoid (not yet released; Megumi Nakajima has a good level of English)
 * Kagamine Rin/Len (not yet released; Asami Shimoda has been taking English lessons)

Korean-English Accented
Korean-English accented Vocaloids are Vocaloids whose providers come from Korea. As the only Vocaloid voicebank with this accent is still unreleased, details cannot yet be given.

SeeU's Korean voicebank was given English phonemes to mimic "English". However, the results are again not of sufficient quality to comment on.


 * SeeU - An English voicebank is in production.

Misc.

 * Prima - Accent unconfirmed
 * Sonika - Accent unconfirmed
 * Tonio - Accent unconfirmed

Phonetic System's Characteristics
The English Vocaloid library is made up of 52 phonetic symbols. These phonetic inputs draw on an estimated 2,500 diphone samples needed to recreate English (Vocaloid uses approximately 8,500 samples in total for English).

Vowels
The English phonetic system includes 10 of the 11 pure vowels of the English language; it is missing the phoneme /ɑː/, the open back unrounded vowel.

Diphones
The English phonetic system also includes an array of diphones: 6 r-colored or rhotacized vowels, 3 [j]-colored vowels and 2 [w]-colored vowels. Although useful, the diphones tend to cause problems for the user when they need to be extended or split across two notes.

It is also important to consider that the diphones can differ significantly in pronunciation depending on the dialect of the voice provider of the chosen voicebank. The r-colored vowels are an example: depending on whether or not the voice provider comes from a country with a rhotic accent, they can be realized as a diphthong that ends in schwa (non-rhotic), as a rhotacized vowel, or as a vowel or diphthong that ends with a clearly audible R (rhotic).

Consonants
The phonetic system also includes 31 consonant phonemes. It accounts for various allophones of the English language, including the aspirated consonants and both allophones of L (dark and clear). After the release of Prima, the /r/ or Rolling R was added to the phonetic system.
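As a quick sanity check, the category counts given in the sections above can be added up and compared with the stated total of 52. This is only a sketch: the dictionary keys are illustrative names, and it assumes the figure of 31 consonants is the one that goes into the total (the text does not say whether the Rolling R is counted).

```python
# Counts as stated in the text; the key names are illustrative only.
PHONEME_COUNTS = {
    "pure vowels": 10,
    "r-colored diphones": 6,
    "j-colored diphones": 3,
    "w-colored diphones": 2,
    "consonants": 31,  # assumption: Rolling R is not counted separately
}


def total_phonemes(counts):
    """Sum the per-category phoneme counts."""
    return sum(counts.values())


print(total_phonemes(PHONEME_COUNTS))  # 52
```

The vowel and diphone categories add up to 21, which together with the 31 consonants matches the 52 symbols mentioned earlier.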

Aspirated Consonants
English distinguishes between normal and aspirated consonants. Aspiration is the strong burst of air that accompanies the release of some obstruents. In the International Phonetic Alphabet, aspirated phonemes are indicated by a small superscript ‹h›, as in [kʰ] for an aspirated [k].

In the English language, the consonants [b], [d], [g], [p], [t] and [k] become aspirated at the beginning of words. The aspirated phonemes are distinguished from their standard versions by an added h, which stands in for the IPA's small superscript ‹ʰ›.
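The rule above can be sketched in a few lines of Python. This is only an illustration, not part of the Vocaloid software: the function name is hypothetical, a word is assumed to be a plain list of phoneme strings, and the vowel symbol in the example is just a placeholder.

```python
# Plosives that take an aspirated form, as listed in the text.
PLOSIVES = {"b", "d", "g", "p", "t", "k"}


def aspirate_word_initial(phonemes):
    """Return a copy of the word with a word-initial plosive
    replaced by its aspirated form (the plain symbol plus "h")."""
    if phonemes and phonemes[0] in PLOSIVES:
        return [phonemes[0] + "h"] + list(phonemes[1:])
    return list(phonemes)


# Hypothetical phoneme sequence for illustration.
print(aspirate_word_initial(["k", "{", "t"]))  # ['kh', '{', 't']
```

Only the first phoneme is touched, so a plosive in the middle or at the end of the word keeps its standard form.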

Dark L and Clear L
The system also includes both allophones for the L, the [l0] or alveolar lateral approximant, also known as Clear L; and the [l] phoneme or velarized alveolar lateral approximant, also known as Dark L.

These phonemes aren't designed to be encoded alone; however, the [l0] seems to cope better with being reproduced without a vowel than the [l] phoneme does. Without a vowel, the former results in an audio loop, while the latter generates electronic buzzing or produces no sound at all.

The only exception to this is Luka, whose [l] phoneme can be used alone and extended without suffering distortion.

Rolling R
Although it is not a phoneme of the English language, the alveolar trill or Rolling R was added to the English phonetic system to increase the opera singing capabilities of Prima. After this, it became a common phoneme in the English voicebanks released after Prima (with the exception of Luka).

Nonetheless, the performance of this phoneme may vary between English Vocaloids. For example, Big Al is known to be capable of using it only at the end of words, and requires some techniques and further editing to use it at the beginning or in the middle of a word.

In the English phonetic system of Vocaloid, it is represented by the symbol [R].

Phoneme Replacement
The large array of allophones and similar-sounding phonemes (particularly among the vowels) available for the English language allows great flexibility in replacing phonemes with similar ones. This has many applications, such as altering the emphasis or stress of a word, correcting a strange pronunciation found in a voicebank, or altering the accent or general pronunciation of a particular Vocaloid.

It is also possible to replace some diphones with a combination of phonemes. For example, a [j]-colored diphone like [aI] (open front unrounded vowel + near-close near-front unrounded semivowel) can be replaced with an ah-like sound followed by the glide [j]. Similarly, an r-colored vowel can be replaced with a vowel + [r] combination, which is useful when changing from a non-rhotic accent to a rhotic accent.
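The diphone replacements described above amount to a lookup table applied over a phoneme sequence. The following is a minimal sketch under stated assumptions: the table and function names are hypothetical, and the choice of [V] as the "ah-like" vowel is a guess that should be checked by ear against the voicebank in use.

```python
# Hypothetical split table: diphone -> replacement phonemes.
DIPHONE_SPLITS = {
    "aI": ("V", "j"),  # assumption: [V] stands in for the "ah-like" vowel
    "Q@": ("Q", "r"),  # r-colored vowel -> vowel + [r] for a rhotic reading
}


def split_diphones(phonemes, table=DIPHONE_SPLITS):
    """Replace every diphone found in the table with its
    component phonemes; other phonemes pass through unchanged."""
    result = []
    for phoneme in phonemes:
        result.extend(table.get(phoneme, (phoneme,)))
    return result


# Hypothetical phoneme sequence for illustration.
print(split_diphones(["t", "aI", "m"]))  # ['t', 'V', 'j', 'm']
```

Because the result contains more phonemes than the input, the split parts would normally be placed on separate notes in the editor.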

Together with some auxiliary phonemes, this allows a great diversity of combinations and possibilities to experiment with. However, the user must bear in mind that results may vary between voicebanks because of individual differences in accent, pronunciation and sample quality. The best approach is to take these tips as a guide and experiment for yourself.

Consonant Replacement
As mentioned above, the English phonetic system contains allophones of some consonants (the plosives and the L). Adding the voiced counterparts of these consonants gives around three possible options for replacing each consonant.

Plosives
As said before, the user can replace the plosives with their aspirated allophones without major issues, as they sound practically identical and vary only in stress and air release. If a consonant sounds too strident or too weak, it can be replaced with the corresponding allophone.

The voiced counterparts can usually be used as allophones at the end of a syllable, where the voicing contrast is minor and the consonants are prone to voicing assimilation. This is not limited to the plosives; it applies to the sibilants as well, especially the voiceless [s] and its voiced counterpart [z].
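As a rough sketch of the syllable-final swap described above, the voiceless and voiced consonants can be paired up and exchanged at the end of a syllable. The function name and the representation of a syllable as a list of phoneme strings are assumptions made for this example only.

```python
# Voiceless -> voiced pairs for the plosives and the sibilant [s]/[z].
_VOICELESS_TO_VOICED = {"p": "b", "t": "d", "k": "g", "s": "z"}

# Two-way lookup so the swap works in either direction.
VOICING_SWAP = {
    **_VOICELESS_TO_VOICED,
    **{voiced: voiceless for voiceless, voiced in _VOICELESS_TO_VOICED.items()},
}


def swap_final_voicing(syllable):
    """Return a copy of the syllable with its final consonant
    replaced by its voiced or voiceless counterpart, if it has one."""
    if syllable and syllable[-1] in VOICING_SWAP:
        return list(syllable[:-1]) + [VOICING_SWAP[syllable[-1]]]
    return list(syllable)


# Hypothetical phoneme sequence for illustration.
print(swap_final_voicing(["d", "Q", "g"]))  # ['d', 'Q', 'k']
```

Whether the swapped consonant actually sounds better depends on the voicebank, so each result is worth checking by ear.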

Liquid Consonant
The L has two allophones: the Clear L, used at the beginning of and

The Dark L is prone to being replaced by a vowel in a process called L-vocalization. Due to its (labio)velar quality, it shifts to a close back vowel such as [u:], [U] or [o]. Knowing this, the user can replace the phoneme [l] with a close back vowel such as [u:] or [U] to imitate this process.
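L-vocalization as described above can be sketched as a simple substitution. This is only an illustration: the function name is hypothetical, the sequence in the example is a placeholder, and the sketch replaces every [l] it finds, whereas in practice the user would pick the occurrences by ear.

```python
# Close back vowels the text suggests as replacements for the Dark L.
ALLOWED_VOWELS = {"u:", "U"}


def vocalize_dark_l(phonemes, vowel="U"):
    """Return a copy of the sequence with every Dark L ([l])
    replaced by the given close back vowel. Clear L ([l0]) is kept."""
    if vowel not in ALLOWED_VOWELS:
        raise ValueError(f"expected one of {sorted(ALLOWED_VOWELS)}, got {vowel!r}")
    return [vowel if phoneme == "l" else phoneme for phoneme in phonemes]


# Hypothetical phoneme sequence for illustration.
print(vocalize_dark_l(["m", "I", "l", "k"]))  # ['m', 'I', 'U', 'k']
```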

Alternatively, the user can add a short close back vowel to improve the sound of the consonant and even to stretch it, as the consonant generally holds up better between two vowels (remember that the Dark L usually cannot stand alone in a note, with the exception of Luka's). This last tip can be helped further by using Vocaloid 3's devoiced vowels.

Vowel Replacement
The English phonetic system is the one with the largest number of vowels (pure vowels, diphthongs and rhotic vowels), many of which resemble one another to a greater or lesser degree (with the pure vowels alone, the user has some 10 vowels to work with). Knowing this, it is possible to replace a vowel with another one that has a similar sound.

The idea is to know which vowel is close enough to serve as a replacement. For example, the phonemes [{], [Q] and [V] are similar enough in sound to replace one another. It is a good idea to review the IPA vowel chart and see which vowels lie close to one another.

However, even between similar vowels there are slight differences that can change the stress of the pronunciation. For example, although [U] and [u:] are similar, the first has a looser pronunciation while the second is closer and more rounded. These differences must also be considered when the user is attempting to split a diphone. In any case, it is possible to use the Opening (OPE) parameter to alter the pronunciation of the vowels, making them closer or more open as required.
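The idea of similar vowels can be kept as a small lookup of groups. The sketch below uses only the two groupings mentioned above; the names are hypothetical, and any further groups would have to be worked out from the IPA vowel chart and checked against the voicebank.

```python
# Groups of vowels the text describes as similar to one another.
SIMILAR_VOWELS = [
    {"{", "Q", "V"},
    {"U", "u:"},
]


def alternatives(vowel, groups=SIMILAR_VOWELS):
    """Return the other vowels in the same similarity group,
    or an empty list if the vowel is not in any group."""
    for group in groups:
        if vowel in group:
            return sorted(group - {vowel})
    return []


print(alternatives("U"))  # ['u:']
```

A candidate returned this way is only a starting point, since the looser or closer quality of the replacement may still need adjusting with OPE.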

As special cases, some diphones that are realized as extended vowels can also be used for replacement. For example, the [eI] phoneme may be realized as [e:] rather than as [eɪ] or [ej], and [Q@] may be realized as [ɑ:] rather than as [ɑɹ] or [ɑə]. However, the user must be aware that this varies a lot between voicebanks and depends strongly on the dialect spoken by the voice provider.

Phonetics List
Special note: This list is based on Big Al's help file, complemented by the chart from Vocaloid-User.Net and expanded to include the IPA symbols and names. However, there were some incorrect entries in the released list. Entering some of the words provided here as examples of phoneme usage will not result in the phonemes shown in the list. In addition, the list did not indicate which particular letters each phoneme applied to; this section has underlined the relevant letters for the benefit of readers. Of the Japanese Vocaloids, only Luka will be able to use this system properly.