Vocaloid Wiki
! The following is a tutorial made for VOCALOID fans by fellow VOCALOID fans. !

Language Differences[]

In phonetic terms, Japanese and Spanish share various similarities. Both are syllable-timed languages with a five-vowel system.

Four of the five vowels coincide; the only small difference is in the "u" vowel (the Spanish U is a close back rounded vowel, while the Japanese U is a close back compressed vowel). To native Spanish speakers, the Japanese U, although similar to their own, tends to carry an inherent weak B-like sound, similar to their /β/ voiced bilabial approximant, caused by the lip compression. To native Japanese speakers, the Spanish U sounds like a う pronounced with the lips protruded, in the same way as お.

The main difference in the vowels is the absence of diphthongs in Japanese, which is one of the aspects that needs care when you make a Japanese Vocaloid sing in Spanish. Spanish, in turn, lacks the marked palatalized phonemes of Japanese.

As for the consonants, although the two languages don't share the same consonant phonemes, many of them are perceived as allophones of the other language, or are at least similar enough to be used as such. The main differences are the realization of the liquid consonants (remember that Japanese has an undefined liquid consonant) and the absence of the rolling R in the Japanese phonetic system.

It is important to note that Japanese has a CSV (Consonant-Semivowel-Vowel) syllable structure that tends to favor CV syllables, so its phonetic system is ideally encoded as ["Consonant" "Vowel"] syllables/notes. For this reason, when a user makes a Japanese Vocaloid sing in another language, the voicebank may, depending on which one is used, struggle with consonants in coda position (at the end of the syllable/note) or with some consonant clusters, pronouncing them weakly or not at all (Example: GUMI & Rin ACT1). This also affects the workflow and can alter the pronunciation of some consonant combinations used to mimic the target language. This problem has been resolved in Vocaloid3 with the addition of new phonemes.

Japanese to Spanish[]



One of the things that requires the most work when a Japanese Vocaloid sings in Spanish is the diphthong. Japanese is often described as a pure-vowel language, without diphthongs or glides (unless you consider the palatalized consonants as glides). Added to the choppy vowel combinations of some voicebanks, this means the user has to work around the problem with some techniques to achieve a smooth pronunciation.

Use Glides[]
Use Blending Phonemes[]
Use Palatalized Phonemes[]

Phonologic note: Hiatus that aren't Hiatus[]

Lexically speaking, Spanish classifies its 5 vowels into two groups: the "strong vowels" (a, e, o) and the "weak vowels" (i, u). A diphthong only occurs between a strong vowel and a weak vowel, or between two weak vowels, and the weak vowel becomes a semivowel or glide.

In practice, diphthongs also occur between the "strong vowels" of Spanish. This often happens in fast speech, and the most affected vowels are generally [e] and [o], which become non-syllabic vowels. When this occurs, [e] and [o] tend to take on some characteristics of [i] and [u] respectively.

The word poeta ('poet', IPA: [poˈeta]) is a three-syllable word [po.ˈe.ta]. However, in colloquial or fast speech it is not rare for it to be realized as a two-syllable word [ˈpo̯e.ta]. When this occurs, the non-syllabic [o̯] can be unconsciously replaced by a [w̝], a raised voiced labiovelar approximant (a semivowel pronounced between a [w] and an [o̯]).

Knowing this, it is possible to apply the glides [j] and [w] in place of [e] and [o] respectively (the accent or attack of the glides will probably need adjusting to get a smooth pronunciation).
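
The substitution rule above can be written down as a small lookup. This is only a sketch in Python, not a Vocaloid feature: the names `GLIDE_FOR_VOWEL` and `glide_for` are hypothetical, and the code encodes nothing more than the rule from the text (weak vowels map to glides, and [e] and [o] map to [j] and [w] only when fast speech turns a hiatus into a diphthong).

```python
# Hypothetical lookup: the glide that can stand in for the non-syllabic
# vowel of a Spanish diphthong (or of a hiatus reduced in fast speech).
GLIDE_FOR_VOWEL = {
    "i": "j",  # weak vowel -> palatal glide
    "u": "w",  # weak vowel -> labiovelar glide
    "e": "j",  # strong vowel, only in fast speech
    "o": "w",  # strong vowel, only in fast speech
}


def glide_for(vowel, fast_speech=False):
    """Return the glide phoneme for `vowel`, or None if no glide applies."""
    if vowel in ("e", "o") and not fast_speech:
        # In careful speech [e] and [o] keep their own syllable.
        return None
    return GLIDE_FOR_VOWEL.get(vowel)


# poeta realized as two syllables: the [o] is replaced by the glide [w].
print(glide_for("o", fast_speech=True))
```

The vowel [a] never becomes a glide in this sketch, so `glide_for("a")` returns `None` in both modes.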

Liquid Consonant[]

The Japanese R (represented as [4] in the Vocaloid phonetic system) is an undefined liquid consonant with various rhotic and lateral allophones, often varying between an /ɺ/ alveolar lateral flap and a /ɽ/ retroflex flap. Native Spanish speakers perceive it as a sound intermediate between their /ɾ/ alveolar tap or "ere" (rhotic liquid consonant) and their /l/ alveolar lateral approximant or "ele" (lateral liquid consonant). Because of this, the phoneme [4] needs to be worked on to achieve a better distinction between the two Spanish liquid consonants.

For the alveolar tap, it is possible to insert an alveolar plosive ([d], [t]) between the [4] phoneme and its vowel. This makes the sound of the [4] harsher.

The Spanish word aro 'ring' ['a.ɾo] can be typed as [a][4 d o] or [a][4 t o]. If the user types the [4] alone, that is [a][4 o], the word will tend to sound closer to an L than a flapped R, making it sound like halo 'halo' ['a.lo].

For the lateral approximant, the phoneme [4] alone generally works well as a Spanish [l]. If required, it is also possible to use the Gemination Techniques to extend the phoneme [4] a bit, giving it a more lateral release.


Some Vocaloids tend to pronounce consonants at the end of a syllable weakly. If you're attempting the word algo 'something' ['al.ɰo], typing it as [a 4][g o] may produce a weak or short-sounding L. If this is the case, the user can type the word as [a][l e_0][g o].
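
The two onset encodings described above (a plosive after [4] for the tap, [4] alone for the lateral) can be sketched as a tiny helper. This is a minimal Python illustration under the assumption that a note's phonemes are written as a bracketed string; the function name `encode_liquid` is hypothetical and is not part of any Vocaloid tool.

```python
def encode_liquid(kind, vowel, plosive="d"):
    """Return the phoneme string for one note that starts with a Spanish liquid.

    kind:    "tap" for /ɾ/ (ere) or "lateral" for /l/ (ele).
    plosive: "d" or "t", inserted after [4] to harden it into a tap.
    """
    if kind == "tap":
        if plosive not in ("d", "t"):
            raise ValueError("plosive must be 'd' or 't'")
        return f"[4 {plosive} {vowel}]"
    if kind == "lateral":
        # [4] alone tends to be heard as a Spanish /l/.
        return f"[4 {vowel}]"
    raise ValueError(f"unknown liquid kind: {kind}")


# aro 'ring' vs. halo 'halo': only the encoding of the liquid differs.
aro = "[a]" + encode_liquid("tap", "o")
halo = "[a]" + encode_liquid("lateral", "o")
print(aro, halo)
```

Whether [d] or [t] gives the better tap depends on the voicebank, so both are accepted here.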

Consonant Clusters[]

As for consonant clusters, Spanish has few of them. One of the most common types is a consonant followed by a liquid consonant, which occurs at the beginning of a syllable.[1] The initial consonant can be /b/, /k/, /d/ (this one always followed by the liquid consonant /ɾ/), /f/, /g/, /p/ and /t/. Because of their nature, these clusters can be somewhat difficult and need to be worked carefully to achieve a good pronunciation.

The first alternative is to type both consonants directly in the same note. However, doing this makes the [4] sound closer to an /l/, which makes this method more suitable for clusters formed by a consonant followed by an /l/.

Example: The name Clara ['kla.ɾa] can be typed as [k 4 a][4 d a].

Another possibility is to split the cluster into two syllables, realizing the first consonant as a really short syllable containing a devoiced vowel, followed by the syllable containing the liquid consonant. Following the previous example, the name Clara can be realized as ['ka.la.ra] with a short [ka] (this trick is widely used by audiologists to correct speech problems related to consonant clusters). This can be applied to the synthesizer, but doing so tends to make the [4] harsher and closer to a /ɾ/, so this trick tends to work better for clusters formed by a consonant followed by a /ɾ/.

Example: Crema 'cream' ['kɾe.ma] can be typed as [k e][4 e][m a] with a short [k e].
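
The choice between the two cluster methods can be summarized in code. The following Python sketch assumes notes are represented as (phonemes, is_short) pairs; the function name `encode_cluster` and that representation are hypothetical, chosen only to mirror the Clara and Crema examples.

```python
def encode_cluster(onset, liquid, vowel):
    """Return (phonemes, is_short) notes for an onset + liquid + vowel syllable.

    liquid: "l" for /l/ or "r" for the tap /ɾ/.
    """
    if liquid == "l":
        # Both consonants in one note: the [4] tends to sound like /l/.
        return [(f"[{onset} 4 {vowel}]", False)]
    if liquid == "r":
        # Split the cluster: a very short note with the onset consonant,
        # then the [4] note, which tends to sound closer to a tap.
        return [(f"[{onset} {vowel}]", True), (f"[4 {vowel}]", False)]
    raise ValueError(f"unknown liquid: {liquid}")


print(encode_cluster("k", "l", "a"))  # first syllable of Clara
print(encode_cluster("k", "r", "e"))  # first syllable of crema
```

The `is_short` flag only marks which note should be kept very brief; the actual length still has to be tuned by ear.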

Rolling R[]

The alveolar trill or rolling R isn't a natural phoneme in Japanese. Despite this, native Japanese speakers occasionally produce it, realizing their own R as a trill. This phenomenon is called 'rolled tongue' (巻き舌 makijita) and is generally used to give speech a vulgar or derogatory nuance.

Various techniques, developed in parallel among Japanese and Spanish users, exist for achieving this articulation with a Japanese Vocaloid. Despite their differences, they all aim for basically the same thing: the use of successive short syllables containing the phoneme [4]. When done correctly, the successive short flaps blend into a trill, generating a rolling R.

To achieve this effect, Japanese users usually add a short note containing successive ル ru (around 3 to 5) in front of the flap they wish to turn into a trill.[2][3]


The use of short ru syllables seems to stem from the onomatopoeia ルルルル (rurururu), which is usually used to represent the rolling R. Although it works well, the phoneme [M] can give the rolling R a "u-colored" sound, which sounds odd on certain occasions. When this occurs, it is possible to replace the [M] vowel with the vowel that accompanies the trill, which can sound more natural. Example: if your rolling ra sounds weird, replace the successive ru repetitions with successive ra repetitions.

Spanish users, instead, generally use short note repetitions of the alveolar tap [4 d "vowel"] to achieve the same effect.
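
Both approaches boil down to a run of short [4] notes, which can be sketched as follows. This Python snippet is only an illustration: the function name `rolling_r`, its parameters, and the idea of returning one phoneme string per note are assumptions made here, not part of either community's workflow.

```python
def rolling_r(vowel, repetitions=4, style="japanese", filler_vowel="M"):
    """Return phoneme strings: the short repetitions, then the main note.

    style "japanese": short ru-like notes ([4 M] by default) before the flap.
    style "spanish":  short repetitions of the alveolar tap [4 d vowel].
    filler_vowel:     vowel of the short Japanese-style notes; pass the
                      trill's own vowel to avoid a "u-colored" sound.
    """
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    if style == "japanese":
        short_note = f"[4 {filler_vowel}]"
        main_note = f"[4 {vowel}]"
    elif style == "spanish":
        short_note = f"[4 d {vowel}]"
        main_note = short_note
    else:
        raise ValueError(f"unknown style: {style}")
    return [short_note] * repetitions + [main_note]


print(rolling_r("a", repetitions=3))                    # ru ru ru + ra
print(rolling_r("a", repetitions=3, filler_vowel="a"))  # ra repetitions
print(rolling_r("o", repetitions=2, style="spanish"))
```

The number of repetitions (around 3 to 5 in the Japanese technique) and the length of each short note still have to be chosen by ear.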


In the case of a trill at the end of a syllable, the vowel of the short repetitions may be heard. To avoid this, simply end the last repetition in a consonant, or silence the vowel using the Dynamics (DYN).

It is important to pay attention to words that contain a trill next to a [d] phoneme (e.g. dardo, IPA: ['därðo]). The [d] tends to blend with the trill, much as the [4] blends with the [d] in the case of the alveolar tap.

Because this technique uses short notes to get the desired effect, it is strongly affected by the tempo. For this reason, it is necessary to adjust the Velocity (VEL) and the note length to get a good pronunciation.
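
To see why tempo matters, it helps to convert note lengths to real time. The Python sketch below assumes the sequencer counts 480 ticks per quarter note; that resolution, and the function names `ticks_to_ms` and `ms_to_ticks`, are assumptions for illustration, so adjust `ppq` to whatever your editor actually uses.

```python
def ticks_to_ms(ticks, bpm, ppq=480):
    """Length in milliseconds of a note of `ticks` ticks at tempo `bpm`.

    ppq is the number of ticks per quarter note (assumed to be 480 here).
    """
    return ticks / ppq * 60000 / bpm


def ms_to_ticks(ms, bpm, ppq=480):
    """Ticks needed for a note to last `ms` milliseconds at tempo `bpm`."""
    return round(ms * bpm * ppq / 60000)


# A 30-tick flap lasts 31.25 ms at 120 BPM...
print(ticks_to_ms(30, 120))
# ...so keeping the same real length at 180 BPM takes 45 ticks.
print(ms_to_ticks(31.25, 180))
```

In other words, a trill that sounds right at one tempo will have flaps that are too fast or too slow at another unless the short notes are resized.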

Consonant Length[]

A common problem reported by Spanish users is that some consonants at the end of the syllable are too short. This is because Spanish speakers tend to lengthen certain consonants slightly at the end of syllables, particularly the nasals /n/ and /m/, the /l/, the sibilants such as /s/, and the /θ/.

To fix this, the user can add a short syllable containing the consonant to be extended plus the vowel [e]. To get rid of the [e] vowel, the user can silence it by dropping the Dynamics (DYN) to 0. If the length still isn't enough, the user can then use the Gemination Techniques used by Japanese users.
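
The fix above amounts to appending one short, silenced note. The Python sketch below assumes a note can be described by a small dictionary; the function name `extend_coda` and the keys `phonemes`, `short` and `vowel_dyn` are hypothetical and only label what the user would set by hand in the editor.

```python
def extend_coda(syllable, consonant, filler_vowel="e"):
    """Return the original note plus a short note that extends its coda.

    "vowel_dyn" is the Dynamics (DYN) value to draw over the filler vowel,
    or None to leave DYN untouched.
    """
    return [
        {"phonemes": f"[{syllable}]", "short": False, "vowel_dyn": None},
        {"phonemes": f"[{consonant} {filler_vowel}]", "short": True, "vowel_dyn": 0},
    ]


# sin 'without': the final /n/ gets an extra short [n e] with a muted [e].
notes = extend_coda("s i n", "n")
for note in notes:
    print(note)
```

If the muted [e] colors the consonant badly, a different `filler_vowel` can be passed instead.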

Working Conflictive Combinations[]

Conversion List[]

Sample | Spanish Symbol | IPA for Spanish Symbol | Equivalent / Semiequivalent Japanese Symbol | IPA for Japanese Symbol
padre | [a] | ä | [a] |
enero | [e] | | [e] |
finca, mío | [i] | i | [i] | i
foco, oído | [o] | | [o] |
musa, dúo | [u] | u | [M] (note 1) | ɯᵝ
amplio, ciudad | [j] | j | [j] (note 1) | j
huevo, buitre | [w] | w | [w] (note 1) | wᵝ
aire, muy | [I] | | |
pausa, neutro | [U] | | |
bestia, embuste, vaca, envidia | [b] | b | [b] | b
bebé, vivir, curva | [B] | β | [b] | b
chaleco | [tS] | ʧ | [S] (deaffricated) (note 2) | ɕ or ʃʲ
dedo, cuando, aldaba | [d] | | [d'] (before i) |
dedo, arder | [D] | ð | [d'] (before i) |
fase, café | [f] | f | [p\'] (before i) | ɸʲ or fʲ
gato, lengua, guerra | [g] | ɡ | [g'] (before i) |
trigo, amargo, sigue | [G] | ɣ | [g'] (before i) |
jamón, reloj, genero, México | [x] | x | [C] (before i) |
caña, quise, kilo | [k] | k | [k'] (before i) |
lana, principal | [l] | l | [4'] (before i) |
llave, pollo | [L] | ʎ | [S] (Rioplatense) | ʑ or ʒʲ; ɕ or ʃʲ
ayuno | [j\] | ʝ | [j]; [dZ]; [S] (Rioplatense) | j; ʥ; ʑ or ʒʲ; ɕ or ʃʲ
mamá, campo, invertir | [m] | m | [m] | m
nido, sin | [n] | n | [n] (at the beginning of words); [N] (before velars); [J] (before palatals) | ɲ or nʲ
ñandú, enyesar | [J] | ɲ | | ɲ or nʲ
perro, apto | [p] | p | [p'] (before i) |
caro, bravo, amor eterno | [r] | ɾ | [4 d] or [4' d] |
rumbo, carro, honra, alrededor, disruptivo, Azrael | [rr] | r | [4] (short repetitions); [4 d] or [4' d] (short repetitions) |
casa, xilófono | [s] | s | [z] or [dz] (voiced) (note 10) | z or ʣ
cerro, cima, zumo, paz | [T] | θ | [dz h] or [C dz] (note 11); [s] (Latin American); [s'] (before i) |
tuyo, traba | [t] | | [t'] (before i) |

1^ See the Diphthong section for more information.
2^ The pronunciation is closer to a /ʃ/ voiceless palato-alveolar sibilant. This phoneme replaces /ʧ/ in some dialects.
3^ The phoneme [p\] occasionally tends to sound closer to an [h] than an [f] (especially when it is followed by an [M]). When this occurs, the user can use the palatalized version of the phoneme, taking advantage of its stronger pronunciation.
Among the possibilities, the user can add the palatalized [p\'] before the standard version to strengthen its pronunciation (Example: instead of [p\ M] type [p\' p\ M]); do a short fyu (Example: [p\' M][M]); or use a devoiced vowel (Example: [p\' M_0][M]).

4^ Used to achieve a [hu]. This trick only works if the Vocaloid doesn't pronounce the combination ou [o M] in a choppy way. In Vocaloid3 the user can use a devoiced [o], or [o_0].
5^ The devoiced [e] phoneme, or [e_0], is only available in Vocaloid3. Vocaloid2 users can achieve a similar effect with a brief [4 e] note, then muting the [e] using the Dynamics (DYN) parameter. If the [e] doesn't sound good, the vowel can be replaced by the vowel of the previous or the following note.
6^ Use in case the [j "vowel"] syllable sounds too soft. When done correctly, the short [dZ i] will blend with the next [j "vowel"] syllable, generating an intermediate pronunciation.
7^ The pronunciation is stronger than that of the phoneme [J]; however, some Vocaloids struggle to combine this phoneme with a vowel other than [i], generating sound clips or a weird pronunciation.
8^ The intended effect is to blend the successive short flaps into a trill. See the Rolling R section for more information.
10^ Usually for voicing assimilation. It can also be used as a replacement when the [s] is lisped. Don't forget that [z] is usually limited to [e] and [o], so use [dz] if required.
11^ The idea is to get a breathy [dz] phoneme to achieve a pronunciation closer to the /θ/ voiceless dental fricative. Which of the two combinations works better depends on the Vocaloid's pronunciation.
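
The /n/ entry of the conversion list picks a different Japanese symbol depending on the sound that follows ([N] before velars, [J] before palatals). The Python sketch below illustrates that selection. The function name `nasal_for` and the two sets are hypothetical; the sets classify the Spanish symbols used in this list by place of articulation, and falling back to [n] everywhere else is an assumption, since the list only states [n] for the beginning of words.

```python
# Spanish symbols from the conversion list, grouped by place of articulation.
VELARS = {"k", "g", "G", "x"}
PALATALS = {"tS", "L", "j\\", "J"}


def nasal_for(next_phoneme):
    """Return the Japanese symbol for Spanish /n/ given the following phoneme.

    next_phoneme is a Spanish symbol, or None when nothing follows.
    """
    if next_phoneme in VELARS:
        return "N"  # velar nasal, as in lengua
    if next_phoneme in PALATALS:
        return "J"  # palatal nasal, as in ancho
    return "n"      # assumed default


print(nasal_for("g"), nasal_for("tS"), nasal_for("i"))
```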

Spanish to Japanese[]


Conversion List[]

Sample Hiragana / Kunrei-shiki Romaji | Japanese Symbol | IPA Symbol | Equivalent / Semiequivalent Spanish Symbol | IPA for Spanish Symbol
あ a | a | ä open central unrounded vowel | [a] |
い i | i | i close front unrounded vowel | |
う u | M | ɯᵝ close back compressed vowel | |
え e | e | mid front unrounded vowel | [e] |
お o, を o | o | mid back rounded vowel | [o] |
ka, く ku, け ke, こ ko | k | k voiceless velar plosive | [k] |
ki, きゃ kya, きゅ kyu, きぇ kye, きょ kyo | k' | | [k j]; [k i] (short note) |
ga, ぐ gu, げ ge, ご go | g | g voiced velar plosive | |
gi, ぎゃ gya, ぎゅ gyu, ぎぇ gye, ぎょ gyo | g' | | [g j]; [g i] (short note); [G j]; [G i] (short note) |
ga, ぐ gu, げ ge, ご go, ん n-n' | N | ŋ velar nasal | [n G]; [n g] |
き゜ gi, き゜ゃ gya, き゜ゅ gyu, き゜ぇ gye, き゜ょ gyo, ん n-n' | N' | ŋʲ | [n G_0 i] (short note); [G j]; [G i] |
sa, す su, せ se, そ so, すぃ si | s | s voiceless alveolar sibilant | [s] |
shi, しゃ sha, しゅ shu, しぇ she, しょ sho | S | ɕ or ʃʲ voiceless alveolo-palatal sibilant | [s j]; [s i] (short note) |
zu, ぜ ze, ぞ zo | z | z voiced alveolar sibilant | |
じゅ ju, じぇ je, じょ jo, じゃ ja, じ ji | Z | ʑ or ʒʲ voiced alveolo-palatal sibilant | |
za, ず zu, づ zu, ぜ ze, ぞ zo, じゃ ja, じ ji, じゅ ju, じぇ je, じょ jo | dz | ʣ voiced alveolar affricate | [D s] |
ji, ぢ ji, じゃ ja, じゅ ju, ぢぇ je, じょ jo | dZ | ʥ voiced alveolo-palatal affricate | [D j\] |
ta, て te, と to, とぅ tu | t | t voiceless alveolar plosive | [t] |
てぃ ti, てゅ tyu | t' | | [t j]; [t i] (short note) |
tsu, つぁ tsa, つぃ tsi, つぇ tse, つぉ tso | ts | ʦ voiceless alveolar affricate | [t s] |
chi, ちゃ cha, ちゅ chu, ちぇ che, ちょ cho | tS | ʨ voiceless alveolo-palatal affricate | [tS] |
da, どぅ du, で de, ど do | d | d voiced alveolar plosive | |
でぃ di, でゅ dyu | d' | | [d j]; [d i] (short note); [D j]; [D i] (short note) |
na, ぬ nu, ね ne, の no, ん n | n | n alveolar nasal | [n] |
ni, にゃ nya, にゅ nyu, にぇ nye, にょ nyo | J | ɲ or nʲ palatal nasal | [n j]; [n i] (short note) |
ha, へ he, ほ ho | h | h voiceless glottal fricative | [x] |
xa, ぃ xi, ぅ xu, ぇ xe, ぉ xo | h\ | ɦ voiced glottal fricative | [x] |
hi, ひゃ hya, ひゅ hyu, ひぇ hye, ひょ hyo | C | ç voiceless palatal fricative | [x j]; [x i] (short note) |
fu, ふ fwa, ふ fe, ふ fo | p\ | ɸ voiceless bilabial fricative | [f] |
ふぃ fi, ふゃ fya, ふゅ fyu, ふぇ fye, ふょ fyo | p\' | ɸʲ | [f j]; [f i] (short note) |
ba, ぶ bu, べ be, ぼ bo | b | b voiced bilabial plosive | |
bi, びゃ bya, びゅ byu, びぇ bye, びょ byo | b' | | [b j]; [b i] (short note); [B j]; [B i] (short note) |
pa, ぷ pu, ぺ pe, ぽ po | p | p voiceless bilabial plosive | [p] |
pi, ぴゃ pya, ぴゅ pyu, ぴぇ pye, ぴょ pyo | p' | | [p j]; [p i] (short note) |
ma, む mu, め me, も mo | m | m bilabial nasal | [m] |
mi, みゃ mya, みゅ myu, みぇ mye, みょ myo | m' | | [m j]; [m i] (short note) |
ya, ゆ yu, よ yo, いぇ ye | j | j palatal approximant | |
ra, る ru, れ re, ろ ro | 4 | ɽ retroflex flap | [rr] (rolled) |
ri, りゃ rya, りゅ ryu, りょ ryo | 4' | ɾʲ | [r j]; [r i] (short note); [l j]; [l i] (short note); [rr j]; [rr i] (short note) |
wa, うぃ wi, うぇ we, うぉ wo | w | w͍ or wᵝ compressed labio-velar approximant | |
ん n | N\ | ɴ uvular nasal | [n] |


Japanese to Spanish[]

Mujer contra Mujer[]


Cover Work by Giuseppe
Song by Mecano
Originally sung by Ana Torroja, Cover by GUMI

Tu Tic Tac[]


Music and Lyrics by Ankari
Sung by GUMI

Spanish to Japanese[]



Cover Work by Giuseppe
Song by Mikuru396
Originally sung by Hatsune Miku. Cover by Clara, backup by Bruno

Ponyo JP ver.[]


Cover Work by BlancaNegra
Illustration by Mydri
Originally sung by Fujioka Fujimaki and Nozomi Ōhashi. Cover sung by Clara and Bruno

A cover of the main theme of the movie Ponyo (崖の上のポニョ Gake no Ue no Ponyo).

Japanese Vocaloid achieving the Rolling R[]

マイリスダメー!/ MyList dame! (Don't My list Me)[]


Music and Lyrics by Live-P
Sung by Kagamine Rin

The R is rolled throughout the whole song; this is particularly noticeable in the "ru" and "ra" syllables.

おひめさまになりたいのッ! / Ohime-sama ni naritai no! (I want to be a princess!)[]


Music and Lyrics by OSTER project
Sung by Kagamine Rin

At the start of the song Rin says: "¡Arriba!" with a marked and prolonged rolling R.