English Phonetics

English VOCALOIDs are VOCALOIDs that are capable of mimicking the English language much more easily than VOCALOIDs of other languages. The following is a list of phonemes needed to make an English VOCALOID sing in English.

About
English has some of the greatest dialect variation of any language in the world. As a result, pronunciation varies much more between English VOCALOIDs than between VOCALOIDs that sing in other languages.

The English language itself is made up of about 20 vowel sounds and 24 consonant sounds. English also lacks a systematic orthography, so there is no one-to-one or near one-to-one match between letters and sounds, as there is in languages such as Spanish or Japanese.


 * Example: "W" can sound like /w/ in "what" and /u:/ in "few", while "Y" can sound like /j/ in "yes" and /i/ in "play". Spelling also differs between varieties, as seen in the British and American spellings "colour"/"color".

VOCALOID and VOCALOID2 use American spelling for the lyrics. VOCALOID3 is confirmed to be capable of localisation, but it is unknown whether this will allow a choice between American and British spelling.

However, the phonetic notation does not follow this. It instead uses Received Pronunciation written in X-SAMPA, with some minor modifications where required, as is the case with the allophones.
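As an illustration of the notation, the sketch below maps a handful of standard X-SAMPA symbols to their IPA equivalents. The table is deliberately partial and does not cover VOCALOID's own modifications to X-SAMPA; the names `XSAMPA_TO_IPA` and `xsampa_to_ipa` are made up for this example.

```python
# Partial X-SAMPA -> IPA table. Only standard X-SAMPA symbols are listed;
# VOCALOID-specific symbols (such as its allophone notation) are not covered.
XSAMPA_TO_IPA = {
    "i:": "iː",
    "u:": "uː",
    "tS": "tʃ",
    "dZ": "dʒ",
    "@": "ə",
    "{": "æ",
    "V": "ʌ",
    "I": "ɪ",
    "U": "ʊ",
    "Q": "ɒ",
    "S": "ʃ",
    "Z": "ʒ",
    "T": "θ",
    "D": "ð",
    "N": "ŋ",
}


def xsampa_to_ipa(symbols):
    """Convert a list of X-SAMPA symbols to an IPA string.

    Symbols missing from the table (such as "k" or "t") pass through
    unchanged, since many X-SAMPA consonants match their IPA letters.
    """
    return "".join(XSAMPA_TO_IPA.get(symbol, symbol) for symbol in symbols)


print(xsampa_to_ipa(["T", "I", "N", "k"]))  # "think"; prints θɪŋk
```

Taking a list of symbols rather than a raw string avoids having to guess where multi-character symbols such as "i:" or "tS" begin and end.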

During the VOCALOID2 era it was also confirmed that, in contrast to Japanese voicebanks, English voicebanks needed their samples cut at a length of more than 0.5 seconds on many sounds, longer than the samples cut for their Japanese cousins. If this was not done, their vocals had a habit of cutting out when used for short notes.

English Scripts
The reclist scripts used for English VOCALOIDs (shortened to "enlist") have also been confirmed to have an impact on the way an English VOCALOID sounds. The scripts are the list of sounds a studio has to record in order to obtain all the sounds essential for successful English creation. A bad script results in more errors being present in the VOCALOID's recordings.

Reading the English reclist script can be a challenge for those not familiar with it at the time of recording, regardless of the version used. This was noted when Saki Fujita (a veteran of Japanese VOCALOID script reading) faced it for the first time, as she thought she would have to relearn script reading from scratch. The reason is that the script is very different from those used for other languages, such as the script for Japanese.

According to the developer's notes on CYBER DIVA, the VOCALOID engine itself uses a combination of both British and American phonetic sounds. As a result, certain sounds may sometimes sound off, because that particular combination would not typically be used by either a British or an American accented speaker.

Original Yamaha Script
The VOCALOID English script used prior to VOCALOID4 was confirmed to have contained errors, and thus VOCALOIDs such as YOHIOloid have incorrect pronunciations. It is important to note that, for pre-VOCALOID4 voicebanks, unexpected sounds commonly occur when certain combinations of symbols are entered into the editor.

Because of the issues with this script alone, the majority of pre-VOCALOID4 VOCALOIDs will not produce the same results as post-VOCALOID4 ones. Many also lack the schwa sound, despite the phonetic symbol still being registered by the engine, and there was also an "Aspiration Problem".

Cyber Diva Script
During the development of the CYBER DIVA vocal, a number of long-standing issues were noted and finally addressed, resulting in the base Yamaha script being improved.

A subtle difference between the old Yamaha English Dev Kit script and the Cyber Diva script is that the newer script produces less expressive tones than the older script, as it focuses on obtaining more clarity per sound.

The Cyber Diva script also fixes the "Aspiration Problem" and includes a recording of the schwa sound.

Ruby Script
Ruby also uses a new script, created by Syo. Once again, the new script was created because of the errors contained within the previous YAMAHA script, and Ruby shows improvements over VOCALOIDs like YOHIOloid. This script was written with a focus on the American accent.

Part of the reason for Ruby having a different script than CYBER DIVA is that the improved script used for CYBER DIVA hadn't been shared at that time.

The aspiration issue is also fixed in this script.

Dex/Daina Script
In June 2015, Syo also revealed he had created another English script for Zero-G Limited's two American VOCALOIDs, Dex and Daina, which was similar to Ruby's but had different lyrics.

This script also focuses on the American accent.

Cyber Songman Script
CYBER SONGMAN was recorded with a brand new phonetic script developed personally by the lead developer of the project, Michael Wilson. This new script is an update of CYBER DIVA's and, according to Wilson, tests proved that it was easier to read and pronounce, which increases the clarity of the pronunciation while maintaining a natural, expressive sound.

Notes on Accents
Despite the general belief that singers lose their accents when they sing, this is not the case, and an accent can be heard even in singing vocals. Many are led to believe otherwise because there are several methods of training singers to disguise or otherwise hide their natural accents; they may even adopt an accent that isn't their own for singing. Singing also uses different muscles from speech, resulting in differences in air pressure and in the way the throat moves.

English is not alone in having problems with accent, as other languages may suffer from the same problem, but accent issues have proven difficult to avoid in English VOCALOIDs. Even the first two VOCALOIDs in English, LEON and LOLA, were noted for their distinctly "British" accent. An accent has been known to either aid or add difficulty to the use of synthesizing software, and VOCALOID is no stranger to this effect. English VOCALOIDs have ended up with the most variation in how they sound out of all the languages so far offered for the VOCALOID software.

The impact of dialect and accent on English VOCALOIDs can result in noticeable variation in certain sounds, particularly the diphthongs and rhotic vowels. Users who are not aware of the potential difficulty of accents may overlook odd pronunciations that need to be adjusted for better results. This is even more true of voicebanks with non-native accents, as the voice provider may have pronunciation issues with a non-native language.

In some instances, producers may be found to have adjusted VSQ and VSQx files so heavily to make them work for one particular English VOCALOID that the files become "VOCALOID specific" and do not work particularly well on other English VOCALOIDs without further adjustments. Cases like this are rare in languages such as Japanese, though not unknown, and many VSQ and VSQx files will work without too much adjusting.

British-English Accented
British-English accented VOCALOIDs are VOCALOIDs whose provider is known to be of "British" nationality. As Great Britain is the main origin of English, British-English VOCALOIDs sing in a native English accent. Originally, they were the standard English accent type used to develop the English engine. British accented VOCALOIDs mostly came from Zero-G, who worked solely with British artists to collect their vocal samples.

Note: The term 'British' applies to anyone from England, Scotland, Wales and Northern Ireland, and therefore the accent can vary greatly overall. The British Isles have the greatest variation of English accents in the world per square mile of land. (For more information see Wikipedia.)


 * LEON
 * LOLA - (Note: though she is regarded as having a "British" accent, Lola's accent reverts to her provider's natural Caribbean accent when not singing in ideal Soul music conditions.)
 * SONiKA
 * OLIVER
 * AVANNA

American-English Accented
American accented VOCALOIDs have providers who came from the United States of America and are therefore native speakers of the English language. The most noticeable difference from the British accented voicebanks is in the rhotic vowels. This is because British dialects are usually non-rhotic, while rhotic dialects of English are predominant in North America. (For more information see Wikipedia.)


 * BIG AL
 * CYBER DIVA
 * RUBY
 * DEX
 * DAINA
 * CYBER SONGMAN

Due to the user base's preference for this accent, PowerFX have since confirmed that YOHIOloid's vocal was made to have an American-sounding ring to it. Hatsune Miku English was also made by Crypton Future Media to match the American way of speaking.

Australian Accented
Australian accents are the normal English accents of individuals from Australia. This accent is normally very distinct from all other English accents, with features not found in other English dialects. (For more information see Wikipedia.)


 * Sweet ANN - Her provider "Jody" supposedly came from Australia.

South-African Accented
South African accents are accents belonging to individuals from South Africa. English was not a native language of Africa and was introduced during the colonisation of African countries by English colonists, resulting in the English language becoming widely used in South Africa itself as the general lingua franca between regions. Variation in the impact of native languages on English results in a large variation in the strength and tone of the accent, though in general most South African accents closely resemble Southern English accents. (For more information see Wikipedia.)


 * MIRIAM

Japanese-English Accented
Japanese-English accented VOCALOIDs are produced by those who came from Japan. Their voice providers have Japanese as their native language, but were used to produce English voicebanks. The Japanese-English accent is therefore a non-native English accent, showing significant and noticeable differences from the native English accents. As studios have released more such voicebanks, common traits have become clearly identifiable among these vocals.

The major issue seen with Japanese accents is that they often struggle to distinguish some sounds. This usually happens because the providers and the producing studios/companies are not familiar with these foreign sounds. Among the most common issues are:


 * Lack of distinction and stress in vowel sounds. The vowels are usually either too tense or too lax, as the speaker tends to approximate each vowel sound to the Japanese 5-vowel system.
 * Lack of distinction between the liquid consonants (R and L). The most famous case is Luka's English pronunciation of the words "Road Roller", which risks coming out sounding like "roe rorora".
 * Distortion of some sounds toward similar Japanese sounds. For example, the [f] phoneme may be pronounced as a voiceless bilabial fricative instead of the voiceless labiodental fricative it should be.

These traits depend on the provider's proficiency in English and on the experience of the studio/company with the language. Despite this, Japanese-accented English VOCALOIDs are still a better option for mimicking the English language than a purely Japanese voicebank, thanks to the wide array of phonemes and work-arounds available from the English phonetic system.


 * Megurine Luka (Yū Asakawa is competent in speaking English)
 * Hatsune Miku (Saki Fujita did not speak English at all prior to the voicebank's recording)
 * Kaito (Naoto Fūga has an unknown level of English)
 * Meiko (Meiko Haigō has an unknown level of English)
 * Megpoid (Megumi Nakajima has a good level of English)
 * Kagamine Rin/Len (Asami Shimoda has been taking English lessons)
 * Macne Nana (Haruna Ikezawa is fluent in English)
 * Fukase (Satoshi Fukase has a good level of English)

Korean-English Accented
Korean-English accented VOCALOIDs are produced by those who come from South Korea. As there is only one unreleased VOCALOID voicebank with this accent, details cannot be released.

SeeU's Korean voicebank is a special case, as it was given English phonemes to mimic the language to a certain degree. However, this feature was left largely incomplete due to deadline issues, and it does not produce results of sufficient quality to comment on.


 * SeeU - An English Voicebank was set for production but is currently on hiatus as of Feb 2013.

Misc.

 * Prima - Accent unconfirmed
 * Tonio - Accent unconfirmed

Custom Dictionaries
More information on dictionaries can be found on Phoneme List.

English VOCALOIDs rely greatly on the VOCALOID editor dictionary due to the language's lack of a systematic orthography. Custom dictionaries can take advantage of the large array of English sounds found within VOCALOID to improve the way the vocals sound, either by using different combinations of sounds or by making an accent/dialect appear by default. This is not isolated to English vocals, but has been known to impact them greatly at times.

Be aware that the language is full of examples of homonyms that take the form of homographs (a word that has the same spelling as another word but has a different sound and a different meaning, such as "bow", "minute" and "tear") or homophones (a word that has the same sound as another word but is spelled differently and has a different meaning, such as "pair"/"pear" or "bare"/"bear") or both. VOCALOID's dictionary has limitations that make such words difficult to record within it; at times users may simply have little choice but to write the word phonetically rather than lyrically.

Note that if a user creates lyrics via phonetic entry rather than written text, they will not have to consider dictionaries at all.
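The homograph limitation can be pictured with a small sketch, assuming a dictionary that stores a single pronunciation per spelling. The `DICTIONARY` contents and the `phonemes_for` function are hypothetical, and the phoneme strings are illustrative X-SAMPA-style transcriptions rather than entries taken from any VOCALOID dictionary.

```python
# Hypothetical word -> phoneme dictionary with one reading per spelling.
# The transcriptions are illustrative and not copied from a VOCALOID dictionary.
DICTIONARY = {
    "pair": "p e@",
    "pear": "p e@",  # homophone of "pair": different spelling, same sound
    "tear": "t I@",  # only one reading fits here; this is the "crying" sense
}


def phonemes_for(lyric, override=None):
    """Return the phonemes for one note, or None if the word is unknown.

    A phonetic override, when given, wins over the dictionary. This mirrors
    entering the phonemes directly instead of relying on the written lyric.
    """
    if override is not None:
        return override
    return DICTIONARY.get(lyric.lower())


print(phonemes_for("tear"))                   # the dictionary's single reading
print(phonemes_for("tear", override="t e@"))  # the "rip" sense, entered phonetically
```

Homophones cause no trouble in a lookup like this, since two spellings can share one reading. Homographs are the hard case, because one spelling has to stand for two different readings.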

Megurine Luka
With the initial release of Megurine Luka, Crypton released a custom dictionary for Luka which could be downloaded from their site. This dictionary included support of Japanese characters and the names of other Crypton VOCALOIDs.

Post VOCALOID3
VOCALOID3 English vocals were given a new dictionary. This was said to "improve" the way English Vocaloids sounded.

Megpoid English
Internet Co., Ltd. provided a custom dictionary for Gumi's Megpoid English vocal. This was done to avoid certain problematic combinations that were known to affect the vocal. Without it, Gumi naturally produces errors, such as skipped sounds or incorrect sound combinations.

Avanna
NeutrinoP made a note that Avanna has her own dictionary. This was created to make room for large arrays of accents.

CYBER DIVA
CYBER DIVA was created with a new script for VOCALOIDs. With this script, YAMAHA created a new custom dictionary for the voicebank with new words that weren't available before and more natural pronunciations.

Ruby
Syo confirmed that Ruby knew over 5,900 words, including the 300 most common words. 100 of these words were randomly chosen. Ruby was also set up to pronounce some words such as "fire" and "hour" in one syllable.

Syo's Twitter account lists many of Ruby's dictionary word adaptations and added words.

CYBER SONGMAN
CYBER SONGMAN's dictionary was an update of his counterpart's. It also makes use of his extra phonemes [4] and [@l]. While [4] was given to various third-party VOCALOIDs, the latter is currently exclusive to SONGMAN.

Phonetic System's Characteristics
There are 52 phonetic pronunciations that make up the English VOCALOID library. These phonetic inputs draw on an estimated 2500 samples per pitch. According to development notes on Megpoid English, there were over 4,000 phonetic connections for that particular vocal alone; a similar number is therefore likely for all English VOCALOIDs.

Vowels
The English phonetic system includes 3 types of vowels: monophthongs, diphthongs and R-colored vowels. Being the nucleus of the syllable, the vowels can be encoded alone.

Monophthongs | Diphthongs | R-Colored Vowels

The diphthongs and rhotic vowels tend to cause problems when the user attempts to manually extend them across 2 or more notes.

To work around this, the English voicebanks allow words to be split into syllables across the notes using the hyphen symbol "-" within the lyrics.


 * Example: Remember split.png

Extending a syllable across several notes requires a combination of hyphen '-' and slash '/' within the lyrics to state how many notes it will last.


 * Example: Sound extend V2.png

In VOCALOID2, using the hyphen/slash is obligatory for effectively dividing words across the notes, unless the user prefers to take the risk of working around this manually using phoneme replacement.

In VOCALOID3, the task is easier, as the [-] phoneme extends any kind of vowel it follows. The hyphen/slash still works; however, it simply adds the [-] phoneme where required.

 * Example: Sound extend V3.png
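The VOCALOID3 behaviour described above can be pictured as a list with one token per note, where every extra note a syllable is held over receives "-". This is a minimal sketch only: the function name `spread_syllables` and the token layout are assumptions made for illustration, not the editor's actual interface or file format.

```python
def spread_syllables(syllables, notes_per_syllable):
    """Assign one lyric token to each note.

    Each syllable takes its first note. Every extra note it is held over
    receives the "-" token, following the description of the [-] phoneme
    extending the vowel before it.
    """
    if len(syllables) != len(notes_per_syllable):
        raise ValueError("need one note count per syllable")
    tokens = []
    for syllable, count in zip(syllables, notes_per_syllable):
        if count < 1:
            raise ValueError("each syllable needs at least one note")
        tokens.append(syllable)
        tokens.extend(["-"] * (count - 1))
    return tokens


# "remember" split as re / mem / ber, with the final syllable held over 3 notes.
print(spread_syllables(["re", "mem", "ber"], [1, 1, 3]))
# prints ['re', 'mem', 'ber', '-', '-']
```

The syllable boundaries are supplied by the caller, in the same way that the user supplies the hyphens in the lyrics.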

Consonants
The phonetic system also includes 31 consonant phonemes. Of the English consonants, only the plosives and the liquids have their allophones as separate phonemes; these are required for achieving correct stress and pronunciation of words.

Allophones
Plosives and their allophones | Liquids

Phonetic List
Special note: This list is based on Big Al's help file, complemented with the chart from VOCALOID-User.Net and expanded to include the IPA symbols and names. However, there were some incorrect entries within the released list. Entering some of the words provided here as examples of phoneme usage will not result in the phonemes that the list shows for them. In addition, the list did not indicate which particular letters the phoneme applied to; this section has underlined the relevant letters for the benefit of readers.

Additional phonetics
The following is a list of additional complementary phonemes available within some of the English VOCALOIDs. Most of them are allophones, and it is possible to use the voicebank without ever touching this set of data. However, using them within a song can improve the pronunciation and the VOCALOID's ability to sound more colloquial. In most cases, the data has to be entered manually through the note properties selection.

Phoneme Replacement
Due to the large array of allophones and similar sounding phonemes available in the English language, there is great flexibility for replacing phonemes. This has many applications, such as altering the emphasis or stress of a word, correcting a strange pronunciation found in a voicebank, or altering the accent or general pronunciation of a particular VOCALOID.

Together with some auxiliary phonemes, this allows a great diversity of combinations and possibilities to experiment with. However, the user must consider that results may vary between voicebanks due to individual differences such as accent, pronunciation and sample quality. It is best to take these tips as a guide and experiment for yourself.
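In its simplest form, phoneme replacement is a symbol-for-symbol swap over a note's phoneme sequence, which can be sketched as follows. The function `replace_phonemes`, the transcription of "about" and the [@] to [V] swap are all hypothetical examples chosen for illustration; they are not replacements recommended by this article, and results on a real voicebank may differ.

```python
def replace_phonemes(phonemes, replacements):
    """Return a new list with each phoneme swapped according to `replacements`.

    Phonemes without an entry in `replacements` are kept as they are, and
    the input list is left unmodified.
    """
    return [replacements.get(phoneme, phoneme) for phoneme in phonemes]


# Illustrative X-SAMPA-style transcription of "about".
about = ["@", "b", "aU", "t"]

# Hypothetical swap: try [V] in place of the schwa [@].
print(replace_phonemes(about, {"@": "V"}))  # prints ['V', 'b', 'aU', 't']
```

Keeping the replacements in a separate table makes it easy to maintain a different table per voicebank, which suits the point that results vary between voicebanks.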

Consonant Replacement | Vowel Replacement | Diphthong/Rhotic Vowel Replacement