Phoneme List

The phonetic system forms the basis of speech playback in the Vocaloid software. Symbols used in the phoneme system are based on X-SAMPA.

Using the Phonetic System
Note: The following applies to the Vocaloid 2 system. While both programs work in a similar fashion, some things may not apply to Vocaloid or may work differently from Vocaloid 2.

Note: Until Vocaloid 3 is released, this page cannot be updated to cover the upcoming Korean, Chinese and Spanish languages for Vocaloid, but will be updated as information is found.

Editing the phonemes
To create and edit phonemes, a user must right-click on a note and select "Note Properties". Here they can edit a phoneme and add additional effects through the "Note Expression Property" and the "Vibrato Property" windows. As a shortcut, the user can press the Alt and down arrow keys at the same time to edit the phonetic data directly. This allows the user to use the Tab key to skip to the next note and Shift+Tab to skip back to the previous one.

Because some phonemes are written with more than one character, such as u: (for English) or ts (for Japanese), multiple phonemes need to be written with a space between them. If the user does not take care of this, the synthesizer will interpret all the characters as a single symbol, which will go unrecognized and produce no sound. Capitalization also affects phonemes, because some symbols are differentiated only by case (for example, Z and z are different phonemes and do not produce the same result).
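The spacing and case rules above can be illustrated with a small sketch. This is not the Vocaloid engine, just a demonstration of why a run-together or wrongly capitalized entry goes unrecognized; the phoneme inventory below is a small hypothetical subset of X-SAMPA symbols chosen for the example.

```python
# Hypothetical, tiny subset of X-SAMPA symbols for demonstration only.
KNOWN_PHONEMES = {"u:", "ts", "S", "s", "Z", "z", "a", "e", "k"}

def parse_phoneme_field(text):
    """Split a phoneme entry on spaces and flag any unknown symbols."""
    tokens = text.split()
    unknown = [t for t in tokens if t not in KNOWN_PHONEMES]
    return tokens, unknown

# Correctly spaced: each token is recognized.
print(parse_phoneme_field("ts a"))   # (['ts', 'a'], [])
# Run together: "tsa" is read as one unrecognized symbol -> no sound.
print(parse_phoneme_field("tsa"))    # (['tsa'], ['tsa'])
# Case matters: Z and z are distinct phonemes.
print(parse_phoneme_field("Z a") == parse_phoneme_field("z a"))  # False
```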

The Recording Process
The samples are gathered by having the voice provider read out a script in various keys while being recorded. The recording is then transferred into a library from which the Vocaloids pull their results. The libraries consist of various sounds recorded and separated for use with the software.

For Japanese, the script is much simpler, with each phonetic sample successfully divided across the notes with little trouble. This renders each note fairly precise.

However, for English Vocaloids, the phonetic data has to be separated by cutting sections out of the recorded samples, because some sounds simply cannot be gathered unless they are sung as part of a word. This makes separating sounds for the English Vocaloids much harder to do. As such, Japanese Vocaloids are often more precise than English ones in their diphonetic sounds.

Constructing Words
The phonetic system works by taking diphonetic and triphonetic samples from a sample library and reassembling them according to how a word would be phonetically pronounced. The word "example" would seem to be made up of the components "ex", "am" and "ple"; however, to build this word phonetically, the data needed is "ɪg", "zɑːm" and "pəl". Japanese and English Vocaloids alike use the same method of arrangement for the phonetic library, with English Vocaloids requiring more diphonetic and triphonetic samples than Japanese ones altogether.
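The reassembly step can be sketched as follows: a word's phoneme sequence is broken into the phoneme-pair (diphone) units that a concatenative synthesizer would pull from its sample library. This is an illustration of the general technique, not Vocaloid's actual internals; "Sil" marking silence at the word boundaries is an assumption for the example, and the phoneme spelling is an X-SAMPA-like rendering of the article's "example".

```python
def to_diphones(phonemes):
    """Decompose a phoneme sequence into diphone (pair) units,
    padding with silence ("Sil") at the word boundaries."""
    seq = ["Sil"] + phonemes + ["Sil"]
    return [(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]

# Roughly [ɪg zɑːm pəl] in an X-SAMPA-like form.
word = ["I", "g", "z", "A:", "m", "p", "@", "l"]
for unit in to_diphones(word):
    print(unit)   # ('Sil', 'I'), ('I', 'g'), ('g', 'z'), ... ('l', 'Sil')
```

Note how an n-phoneme word needs n+1 diphone samples, which is one reason English, with its larger phoneme inventory, demands so many more recorded samples than Japanese.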

Due to the software's musical nature, monophonetic and polyphonetic samples may also need to be considered where closer vocal pitching and pronunciation are required. The user, however, has access only to pronunciation at the phonetic level; the finer levels of vocal speech adjustment cannot currently be accessed. Many of the samples needed are also not shared between the Japanese and English languages, and only a handful of phonetic sounds are identical, or near identical, to each other.

A Vocaloid's dictionary will attempt to match the correct phonemes to the word the user enters, although not all words can be found in the dictionary. If a user allows the program to auto-find phonemes and it encounters a word that it simply cannot identify, it will write it as the phoneme u: (for English) or a (for Japanese) by default.
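The lookup-with-fallback behaviour described above can be sketched like this. The tiny dictionary and its phoneme strings are taken from the "bung"/"bangle" example later in this section; everything else is an invented illustration, not Vocaloid's real dictionary format.

```python
# Invented mini-dictionary; phoneme strings follow the article's examples.
DICTIONARY = {"bung": "bh V N", "bangle": "bh { N g V l"}

def lookup(word, language="english"):
    """Return a word's phonemes, falling back to the engine's
    default phoneme (u: for English, a for Japanese) when unknown."""
    default = "u:" if language == "english" else "a"
    return DICTIONARY.get(word.lower(), default)

print(lookup("bung"))     # bh V N
print(lookup("bungle"))   # u:  (unknown word falls back to the default)
```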

If a user knows how words are articulated, they can infer how to write a word that isn't in the dictionary (e.g. knowing that "bung" is represented as [bh V N] and "bangle" is written as [bh { N g V l], you can infer that "bungle" has to be written as [bh V N g V l]). In addition, if the user manually enters a phoneme that the Vocaloid simply does not have in its voicebank, there will be no sound at all when the Vocaloid is played back.

Please note that not all Vocaloids have the same phonemes, such as the breathing phonemes [br1]–[br5]. There are also some phonemes that are found only in one language, so not all of the Japanese and English Vocaloids will share the same phonemes. Also, while a Vocaloid's help guide will list the alphabet of the language, it may not include additional notes.

Due to how the sound is articulated by the synthesizer, phoneme sounds are affected by the adjacent phonemes (a phenomenon that also occurs in natural speech). For that reason, phoneme sounds do not always produce the same results; they may sound different, or weaker or stronger, depending on the preceding or following phoneme sound. To make a consonant sound stronger than the following vowel, raising the consonant sound's Brightness, Breathiness or Dynamics will often work on some level. Another alternative is to replace the affected phoneme (or the one adjacent to it) with an allophone, an approximant, or just a similar-sounding phoneme.

Using One Language To Create Another
A user can use the phoneme system to create languages from scratch, so long as it is within the Vocaloid's capabilities. Due to the differences between the two phonetic systems and between the individual voicebanks, there are some considerations the user must be aware of when attempting to make a Vocaloid sing a language it isn't intended for; it is a difficult task that may take hours of trial and error.

Regardless, if the user is aware of the phonology of both languages (the original one for that voice, and the target language), the task becomes easier. Moreover, a user may be creative, even going so far as to invent languages of their own if they desire. Essentially, the more time a user spends getting familiar with the phoneme system, the more they can get out of the Vocaloid program.

However, some voices are easier to work with than others, or present some sort of advantage. A clear example is Sonika, who is regarded as one of the Vocaloids with the most potential to "sing in any language" due to her unique setup, or Luka, who allows the user to switch between her English and Japanese voicebanks as needed. Users' techniques often produce surprising results; however, success is greatly influenced by how much a Vocaloid's phonetic system has in common phonologically with that of the target language, without the aid of other music/audio software. For example, due to phonetic similarities, Japanese Vocaloids can achieve a good level of Spanish, and at the introduction of SeeU it was confirmed that the Korean language is capable of mimicking a decent amount of English due to the phonetic similarities between the two.

Differences and Considerations
Due to the setup of the Japanese Vocaloids, they are more limited in their use of the English language, since the phonology of the Japanese language, including phonemes, accents, tones, intonations, moras and assimilations, is very different from that of English. Because each consonant sound is always followed by an inseparable vowel and consonants do not cluster in Japanese, each of them is generally pronounced weakly and not independently, except for ん (n), sokuon and some transliterated phonemes for non-Japanese words. Because of this, some of the Japanese Vocaloids' consonant sounds contain slight vowel sounds so that they are smooth and sound right in Japanese when connected to the following vowels.

Also, even if the X-SAMPA, IPA, Latin alphabet or symbol transcriptions are the same, their actual pronunciations in Japanese and English are not always the same; for instance, the symbol S is often pronounced /ʃ/ by English Vocaloids and /ɕ/ by Japanese ones, the Japanese "a" is a low central vowel between the English "a" in "father" and the English "a" in "dad", and "r" in Japanese is not the same as either "r" or "l" in English. (See "Japanese Phonetic System" below)

In addition, the English language often puts emphasis on certain syllables of words (stress accent) while the Japanese language frequently uses pitch accents. These differences between the two languages frequently make Japanese Vocaloids retain a Japanese accent when there are no perfectly equivalent phonemes, even if users manage to make them sing in the correct language. Conversely, the same can happen to English Vocaloids, who often have English accents when they sing in other languages.

Another consideration with English Vocaloids is their regional accent. This will not affect any of the Vocaloids' overall performance or the handling of the Vocaloid engine, and they will use identical phonemes regardless. In fact, the only effect this will have is a particular stress or emphasis on certain vowels and consonants that may not be heard in another English Vocaloid, but it may make an English Vocaloid sound different from what a user expects. Examples of Vocaloids who may be affected by this include Sonika, who has a British accent, and Big Al, who has an American one; also included is Luka Megurine, who will retain a Japanese accent. One noted example of a regional accent affecting a Vocaloid's outcome is Big Al's pronunciation of vowel sounds; he can often be harder to make sing in Japanese because of it. In contrast, Japanese Vocaloids do not exhibit as much regional accent variation between them in Japanese.

Due to the individual differences between the voicebanks, some approaches obtain better results than others. Sometimes, phonemes that are not equivalent work better than equivalent ones in the target language; for example, when Miriam sings in Japanese, [v V] /vʌ/ sounds closer to the actual pronunciation of the Japanese particle は [w a] /wa/ than [w V] /wʌ/ does.

For more explanations on the differences between English and Japanese Vocaloids see Language Issues.

Techniques
Because the synthesizer articulates sound by simulating human speech, some phonological phenomena of speech also appear in the software (such as coarticulation). This allows the user to apply them to the software to increase the capabilities of the voicebanks.

One application of coarticulation is combining phonemes to achieve new articulations closer to the desired ones. An example is the alveolar flap /ɾ/: it is possible to produce it by adding another interfering alveolar phoneme, such as /d/ or /t/, just after the rhotic consonant phonemes ([r] for English, [4] for Japanese).

Another example of phonology applied to the software is the case of diphthongs. Although the Japanese Vocaloids do not have diphthong samples, since Japanese is a language of pure vowels (it has no diphthongs), it is possible to achieve diphthongs by using the phonemes [j] and [w] as semivowels of the vowel phonemes [i] and [M] respectively. This can be used on English Vocaloids too, and can be particularly useful when attempting to make them sing another language.

An additional technique is the use of short notes (around 1/64 or 1/32 in length). When the note is too short, the articulation will be incomplete and the sound will blend with the next note, producing interesting results. One application is "consonant + alveolar tap /ɾ/ + vowel" combinations when Japanese Vocaloids sing in Spanish; for example, the Spanish word crema ('cream') ['kɾe.ma] can be realized as [k e][4 d e][m a] with a short [k e]. The use of successive short notes even allows Japanese Vocaloids to achieve a rolling R.
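The semivowel workaround for diphthongs can be sketched as a simple substitution table. The mappings below are rough approximations invented for demonstration, not an official conversion chart; the idea is only that a diphthong phoneme gets rewritten as a pure vowel plus a [j] or [w] glide.

```python
# Hypothetical approximations of English diphthongs using pure
# vowels plus the semivowel glides [j] and [w]. Demonstration only.
DIPHTHONG_APPROX = {
    "aI": ["a", "j"],   # as in "eye":  a + j glide
    "aU": ["a", "w"],   # as in "now":  a + w glide
    "eI": ["e", "j"],   # as in "day":  e + j glide
}

def approximate(phonemes):
    """Replace each diphthong with its vowel + glide sequence,
    leaving other phonemes untouched."""
    out = []
    for p in phonemes:
        out.extend(DIPHTHONG_APPROX.get(p, [p]))
    return out

print(approximate(["n", "aU"]))  # ['n', 'a', 'w']
```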

Besides all the tricks available in the editor, it is possible to improve the pronunciation further during post-editing. After rendering and exporting the WAV file, the user can edit it in any DAW or sound editor. If the pronunciation of a consonant is too soft or too strong, the user can correct its volume.

Another technique that can be used on Vocaloids is phoneme slicing. This can be applied to Japanese phonemes for Japanese Vocaloids, either in the Vocaloid software itself or in the user's DAW. The length of the note is decreased or cut down until only half the pronunciation needed for spoken Japanese is heard (for example, "su" becomes "s"). However, this will affect the singing capabilities of the Vocaloid, and the notes being cut have to be much longer than normal. Although this technique may be hard for new users and results in a lack of singing smoothness, it increases the chances of getting a closer match to the intended sound. This can also be applied to English-capable Vocaloids. Additionally, software such as vocoders can be used to artificially create or transform Japanese or English phonetics into another language.

It is also possible to use a Vocaloid with a similar voice type to hide the flaws of another's phonetic mispronunciations by having the two Vocaloids sing in a duet; a classic pairing that has become acknowledged as a good example among fans is Sonika and Luka.

Flaws in the Phonetic System
Vocaloids must have the correct diphonetic sounds to avoid sounding choppy. However, the Vocaloid system will attempt to sound out all diphonetic data assigned to the phonemes used, even if a particular sound is not needed, resulting in a Vocaloid with too many sounds becoming slightly slurred.

A natural speaker may not sound out all the needed diphonetic sounds when they sing, for various reasons such as naturally slurred vocals, a localized accent, vocal disorders like stuttering, or speech impediments such as a lisp. This restriction may limit the ability of a Vocaloid to mimic the language it is intended for. For example, American English accents often involve the complete omission of the schwa vowel sound from words where it is featured. This sound is normally a prominent feature of the English language itself and is present in British English accents.

In some cases, Vocaloids like the Kagamines may have missing pronunciations. When a Vocaloid has pronunciations that it cannot sound out, it will not sing anything at all, even if the phonetic data is registered by the Vocaloid engine. The current Vocaloids also cannot recreate some languages due to their different and contrasting vocal structures. This has been pointed out in regard to Sonika's claim of "being able to speak any language". As a result of being unable to pick out sounds at the diphonetic level, sometimes spelling a word as it is spelled in that language's dictionary will not produce the correct phonetic results. This means the user will have to swap phonetic data until the pronunciation is correct. This is most noticeable with English Vocaloids and is owed to the more complex nature of the English language. On occasion, words have to be written as they sound, rather than how they are spelled.

There are also a number of known words used by English-capable Vocaloids that have more than one pronunciation due to stress accents. However, the user may be unable to obtain the correct result, since Vocaloid can currently only store one pronunciation of a word in its dictionary. Without knowing how to sound out the alternative pronunciation, these words can be a problem for non-native English speakers:
 * Wind
 * The wind blew (IPA: [ˈwɪnd])
 * You wind me up (IPA: [waɪnd])
 * Read
 * I will read the book (IPA: [riːd])
 * I read the book (IPA : [rɛd])
 * Tear 
 * You have a tear in your eye (IPA: [tɪə]; Vocaloid: [t I@])
 * The paper has a tear in it (IPA: [tɛə] ; Vocaloid: [t E@])
 * Bow
 * You must bow before royalty (IPA: [baʊ])
 * I tie a bow in my hair. (IPA: [bəʊ] or [boʊ])
 * Live
 * The show was broadcast on TV live (IPA: [laɪv])
 * I know where you live (IPA: [lɪv])
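The one-pronunciation-per-word limitation behind the list above can be sketched as follows. The dictionary entry and the manual-override parameter are invented for illustration; only the fallback default (u: for English) comes from earlier in this section, and the phoneme strings are approximate.

```python
# Invented illustration: the dictionary holds one pronunciation per
# word, so heteronyms like "read" always come back the same way; the
# other reading has to be typed in manually as an override.
DICTIONARY = {"read": "r i: d"}          # only one entry fits

def phonemes_for(word, override=None):
    """Return the stored pronunciation, a manual override,
    or the engine's default phoneme for unknown words."""
    return override if override else DICTIONARY.get(word, "u:")

print(phonemes_for("read"))              # r i: d  (present tense)
print(phonemes_for("read", "r e d"))     # r e d   (past tense, manual)
```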

Since Japanese Vocaloids do not have to blend their words like English ones, and have only around 500 diphones to use, they can produce choppier results than English Vocaloids when used for non-Japanese words, especially for very different vocal languages such as English. Additional tuning, both in and outside of the Vocaloid software, may also have to be applied. Often when slicing phonetic information ("su" becoming "s"), a small fragment of the missing phonetic sound remains (in the case of "su", the missing "u" sound), leaving behind awkward vocal sounds that lower the quality of a Vocaloid's results.


 * For an example of a Japanese Vocaloid using English, see Gumi's demo song "Fly Me to the Moon". This particular demo has led many to believe that Japanese Vocaloids are capable of better English than they are. The reality, however, is that this is good tuning, not good English, and the same flaws exist.

However, because English Vocaloids are the opposite of their Japanese cousins in this respect, they produce the opposite problems. If the user does not take this into account, they may attempt to match up every phonetic combination given to them within their dictionary where possible. To prevent this, the user may have to break up their phonetic data often enough to prevent as much of the unneeded blending as possible, avoiding whole-word construction where it is most likely to appear. In the word "example", "xam" ("zɑːm") is a case where blending must take place. However, this process may leave English Vocaloids sounding inconsistent, with smooth pronunciations coming to a sudden stop where they were cut off to prevent blending. In both cases, this is why Vocaloids of either language will end up with a Japanese or English accent according to the language their problems stem from.

Also, Vocaloids sometimes have difficulty pronouncing words. Short notes are difficult for English Vocaloids. For example, Prima and Tonio struggle with the middle section of the word "together" if that section is too short when the word is spread over several notes ("to-geth-er" becomes "to-g'-er" if "geth" has no room). Some Vocaloids' singing results may be impacted if a user does not consider this. When a Vocaloid fails to pronounce a phonetic it should be able to, there are ways around this: the user can move the phonetic data onto another track, increase the "accent" (attack) in note properties, or change the length of the note to allow the vocal room to pronounce the words. VY2 also has a weakness like this: the phonetic あ (a) followed by れ (re) becomes a げ (ge) sound, but this can be fixed by dividing the tracks or modifying the tone of the voice.

As noted in this section, due to the sheer number of things to take into account, English-capable Vocaloids can often be far more complex to use than Japanese Vocaloids because of the problems presented by the English language. Liberally interpreted, English Vocaloids have a greater language capacity than their Japanese cousins, having more vowel and clearly separated consonant sounds, and are therefore easier to make sing in other languages, although both will only be using the equivalent or quasi-equivalent phonemes available in either language's phonetic system. Japanese Vocaloids can often be far simpler to use, despite the more limited voicebank and lower-quality results for non-Japanese words. Megurine Luka users must also keep all of these problems in mind when using either of her voicebanks, as both will encounter them. However, users enjoy the chance to switch between the voicebanks where the language capabilities allow Luka to take advantage of doing so.

English Phonetic System
The following is a list of phonemes needed to make the Vocaloid sing in English.

Special note: This was the list provided by Big Al's help file; however, there were some incorrect entries within the released list. Entering some of the words provided here as examples of phoneme usage will not result in the expected phonemes that were used for the list. In addition, the list did not indicate which particular letters the phoneme applied to; the wikia has underlined the relevant letters for the benefit of readers. Of the Japanese Vocaloids, only Luka is able to use this system properly. There are 52 phonetic pronunciations which make up the English Vocaloid library; these phonetic inputs will use any set of the estimated 2,500 diphonetic samples needed for English recreation (Vocaloid uses a total of approximately 8,500 samples altogether for English).

Japanese Phonetic System
The following is a list of phonemes needed to make the Vocaloid sing in Japanese.

Special note: this Japanese phonetic list is taken from the help file of Vocaloid 2 developed by Crypton Future Media. There are 41 phonetic pronunciations which make up the Japanese Vocaloid library; these phonetic inputs will use any set of the estimated 500 diphonetic samples needed for Japanese recreation (Vocaloid uses approximately 1,300 samples for Japanese overall).

Additional notes

 * Crypton’s Vocaloids, including Kaito and Meiko, have almost the same Japanese phonetic system. To use z, Z, h\, N and N', users need to edit the phonemes rather than entering kana characters.
 * Rin/Len Kagamine Act 1 can pronounce h\ while their Act 2 cannot (comparison of consonant sounds Act 1, Act 2).


 * Vocaloids of Internet Co. Ltd., such as Gackpoid or Megpoid, mostly share the same system as Crypton’s, but they do not have z and Z sounds. As is often the case with the Japanese language, they are replaced by dz and dZ.
 * Commonly h\ sound works only in h\ e(ぇ, xe) and h\ o(ぉ, xo).
 * Japanese VOCALOID2 voicebanks can combine the a and i phonemes (e.g. w a i), but the original VOCALOID voicebanks cannot. The workaround is to simply use the y consonant (w a j).
 * N\, N or n alone tends to be pronounced as "ng". This is the basis for Japanese Vocaloids being used for South-East Asian languages.
 * N' followed by a vowel may produce odd results; however, due to its use within the Japanese language, there is no actual call for this phonetic to be followed by a vowel sound anyway.
 * However, some SEA languages have a different way of pronouncing "u", which differs from the Japanese one. Only Miku, Gackpoid and Iroha can pronounce "u" closer to the way SEA languages do.

Comparative Chart
Special note: this is based on Big Al's help file, with some information added to show English equivalent/quasi-equivalent phonemes for Japanese phonemes, along with their symbols, and to compare their actual pronunciations. Even if the symbol transcriptions are the same, the actual pronunciations in each language are often different, as each IPA entry shows. This guide is meant for users who are working to make an English/Japanese Vocaloid sing in the opposite language. However, additional work will be needed to get closer to the target language's phoneme usage.

Additional notes

 * Linguistically, the phonemes which the English and Japanese languages share in common are k, g, s, z, Z, tS, h, b, p, j and m. Both English and Japanese voicebanks also have e, S, dZ, d, N, n and w; however, these phonemes generally do not sound the same. (See the IPA in each language)
 * Since all the voicebanks have their distinctive characteristics, their phonemes do not always produce the same result especially in languages which they are not intended for.
 * The above is particularly true for Miku and Rin, who are remarked to sound excessively aged when singing in another language, even in normal configurations and at higher octaves.
 * Most of the consonants in the Japanese phonemes (and certain English phonemes) are not intended to be encoded standalone. Using them that way may sometimes result in audio distortion, clicks or sound loops.

Misc.
The following is a list of phonemes that will alter the effect of a note in a certain way.

''Special note: Not all Vocaloids share these particular effects. Sweet Ann, for instance, does not have the breathing phonemes. Some Vocaloids, such as Kaito and Meiko, have a breathing phoneme /*in/ instead. Sonika has much more capability within her voicebanks, but lacks the 5 breathing phonemes. Despite Prima having them, Tonio does not.''


 * Example of Breathing Phonetics in use
 * Example of rolling phonetic in use

Additional Help
Note that both Zero-G and PowerFX have tutorials of their own.


 * How To Make a Vocaloid Breathe Using VOCALOID: Explanation on how some of the Japanese Vocaloids sound when you use the breathing effects
 * Comparative Table of English and Japanese Phonetic Systems of Japanese and English Vocaloids, including notes on whether the Vocaloid has each phoneme. The list also includes information on how to effectively transform the quasi-equivalent phonemes in Japanese and English into the opposite language.
 * Vocaphonetic: A Japanese community site for creating and distributing Japanese dictionary data for English Vocaloids to sing better in Japanese. The dictionary data for Vocaloid and Vocaloid2 are respectively available.
 * Vocaloid Phonetic Library - a quick look up guide for Phonetics of all Vocaloids.
 * From English to Japanese - Using Tonio, these are instructions for how Japanese users can make Tonio sing in Japanese. Also shown is how close to and how much of the Japanese language Tonio can reproduce.
 * Tutorial - a tutorial showing a user making Miku sing "English" using Japanese phonemes.
 * Making Big-Al sing Japanese

Trivia

 * One of the reasons for the large length of time between English Vocaloid releases is the time consumed in recording the phonetic samples (an estimated 2,500 samples needed for English vs 500 for Japanese, per pitch). It took 25 hours (4 hours a day) to record all the Kagamine "Appends". In contrast, according to Anders, it takes anything from 1–3 weeks onwards to record a single English voicebank.
 * The more samples involved in making a synthesized voice, the harder it is to maintain quality, and the lack of smoothness of older synthesizing software voicebanks can often reflect the difficulty this presents.
 * More complex languages such as English struggle much more to maintain quality while singing due to the sheer number of samples involved.
 * This is also why older voicebanks, such as the original Vocaloid voicebanks, may be harder to use. For instance, "now" is often pronounced as "no-ow" by the English Vocaloid voicebanks. In contrast, Vocaloid 2 voicebanks have no problems with this word.
 * Some fans struggle to understand how synthesized vocals have developed over a single decade and do not understand why Vocaloid results are as they are. Microsoft Mike, Mary, Sam and Ann speaking (mature content) shows the various stages of this particular software and the progression of the vocals for the Microsoft text-to-speech voices. Vocaloid was released soon after this software was developed and is a much more advanced software package, but there are common problems shared between all synthesizing software packages.