Vocaloid Wiki
! The following is a tutorial made for VOCALOID fans by fellow VOCALOID fans. !


The Spanish language has only 5 vowel sounds and 18 consonants.[1] The language also has 29 possible allophones and 841 theoretically possible combinations, of which only 521 are needed to cover more than 99.99% of the occurrences within the language.[2]
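The figures above can be checked with a little arithmetic. The short Python sketch below assumes that a "combination" means an ordered pair of two allophones, which is an interpretation and not something the cited source states here; the variable names are only illustrative.

```python
# Figures quoted in the paragraph above.
ALLOPHONES = 29
REQUIRED_COMBINATIONS = 521

# Assumption: a "combination" is an ordered pair of two allophones,
# so the number of theoretically possible combinations is 29 * 29.
possible_combinations = ALLOPHONES ** 2

# Fraction of the theoretical combinations that actually has to be covered.
share_needed = REQUIRED_COMBINATIONS / possible_combinations

print(possible_combinations)  # 841
print(round(share_needed, 2))
```

Under that assumption, well under two thirds of the theoretical combinations are enough for near-complete coverage.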

Notes on Accent

Despite the general belief that singers completely lose their accents when they sing, this is not always the case, and an accent can still be heard in sung vocals.

However, many are led to believe this because there are several methods of training singers to disguise or otherwise hide their natural accents; they may even adopt an accent that isn't their own for singing. Examples include genres such as western or country, and Black music such as jazz or soul. Singing also uses different muscles than speech, resulting in differences in air pressure and in the way the throat moves. Genres such as opera are the most likely to make an accent appear almost entirely absent, thanks to the impact of the opera vibrato.[3][4]

VOCALOID can capture an accent quite easily. How much of it comes through depends on the recording method used with the voice provider, the type of sound being recorded per sample (accent impact varies per sample and language), and the overall number of samples that make up the voicebank (the more samples, the greater the chance of an accent slipping in).

Spanish is a language that can be impacted by accents at times. For example, the first 3 Spanish Vocaloids were all noted for their Castilian Spanish accents. However, while it is noteworthy that accents can exist in Spanish Vocaloids, this is not currently considered as problematic as it is with other languages. Generally, most Spanish speakers seem able to understand Spanish Vocaloids well enough to use them with little problem.

Spanish Vocaloids

The following is a list of Vocaloids that use Spanish.

Phonetic System's Characteristics

The system includes the 5 vowels of Spanish. Unlike the systems for other languages such as English or Korean, it doesn't include diphones for the diphthongs. Instead, it includes the respective glides or semivowels of the "weak" vowels ([i] and [u]), which allow it to perform the diphthongs when combined with the corresponding vowels.


The system includes 4 glides, which allow it to perform all the diphthongs of Spanish. There are 2 types of glides:

  • The semivowels [I] and [U], used for the falling diphthongs (vowel+glide).
  • The approximants [j] and [w], used for the rising diphthongs (glide+vowel).
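The two glide types above can be sketched as a small lookup. This is only an illustration in Python, not part of the VOCALOID Editor: the function name `diphthong_to_phonemes` is hypothetical, the input is a pair of plain (unaccented) vowel letters, and treating two weak vowels as a rising diphthong is an assumption (it fits "ciudad" and "buitre", but "muy" is entered with the falling glide instead).

```python
from typing import List

STRONG_VOWELS = {"a", "e", "o"}
# Glide symbols of the Spanish phonetic system described above.
RISING_GLIDES = {"i": "j", "u": "w"}    # glide + vowel
FALLING_GLIDES = {"i": "I", "u": "U"}   # vowel + glide
VOWELS = STRONG_VOWELS | set(RISING_GLIDES)


def diphthong_to_phonemes(first: str, second: str) -> List[str]:
    """Transcribe a two-vowel diphthong written with plain vowel letters."""
    # Weak vowel first: rising diphthong, the weak vowel becomes [j] or [w].
    # Assumption: weak + weak (e.g. "iu", "ui") is also treated as rising.
    if first in RISING_GLIDES and second in VOWELS and second != first:
        return [RISING_GLIDES[first], second]
    # Strong vowel first: falling diphthong, the weak vowel becomes [I] or [U].
    if first in STRONG_VOWELS and second in FALLING_GLIDES:
        return [first, FALLING_GLIDES[second]]
    # Two strong vowels (or a doubled vowel) do not form a diphthong.
    raise ValueError(f"'{first}{second}' is not a diphthong")


print(diphthong_to_phonemes("a", "i"))  # as in "aire"
print(diphthong_to_phonemes("u", "e"))  # as in "huevo"
```

An accented weak vowel, as in "mío" or "dúo", marks a hiatus and keeps the full vowels [i] and [u], so such pairs should not be passed to this helper.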


Weak Allophones

Lenition, or weakening, is a kind of sound change that alters consonants, making them "softer" in some way. Lenition occurs especially often intervocalically (between vowels). In this position, lenition can be seen as a type of assimilation of the consonant to the surrounding vowels, in which features of the consonant that are not present in the surrounding vowels (e.g. obstruction, voicelessness) are gradually eliminated.

In Spanish, lenition has been an important phenomenon since the evolution from Latin, and it continues to affect some consonants, particularly the voiced plosives /b/, /d/ and /g/. In intervocalic position these are realized as "softer" voiced fricative or approximant allophones.

Voiced stop | Continuant (fricative) | Approximant (spirant)
[b] voiced bilabial plosive | [β] voiced bilabial fricative | [β̞] bilabial approximant
[d̪] voiced dental plosive | [ð] voiced dental fricative | [ð̞] dental approximant
[g] voiced velar plosive | [ɣ] voiced velar fricative | [ɣ˕] velar approximant

Because of this, the Spanish phonetic system includes separate symbols for the softer allophones. These are distinguished from their standard "stronger" counterparts by an uppercase symbol ([B], [D] and [G]), which matches the respective X-SAMPA symbols for the fricatives.

The "harsher" plosives generally appear at the beginning of a word, after a nasal consonant like [m] or [n], and after a pause, while their "softer" allophones appear in all other contexts, especially between vowels.

As with the aspirated allophones of English, both versions can be interchanged without altering the meaning of the word; they vary only in the degree of stress and emphasis. Slow speech tends to favor the "harsher" plosives while fast speech tends to favor their "softer" allophones, as the former has more pauses and silences that allow a full articulation of the plosives while the latter does not.[5]
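The positional rule above can be written as a small helper. The following Python sketch is an illustration only: `choose_voiced_stop` is a hypothetical name, `previous` stands for the phoneme right before the plosive (or None at the start of a word or after a pause), and the extra case of [d] after /l/ is taken from the note in the Phonetic List.

```python
from typing import Optional

# Plosive -> lenited ("softer") symbol in the Spanish phonetic system.
LENITED = {"b": "B", "d": "D", "g": "G"}
# Nasal symbols; [N] only exists in MAIKA's additional phonemes.
NASALS = {"m", "n", "J", "N"}


def choose_voiced_stop(stop: str, previous: Optional[str]) -> str:
    """Pick the plosive or its lenited allophone from the preceding phoneme."""
    if stop not in LENITED:
        raise ValueError(f"'{stop}' is not one of b, d, g")
    # Word start, pause, or a preceding nasal: keep the "harsher" plosive.
    if previous is None or previous in NASALS:
        return stop
    # Per the Phonetic List, [d] is also kept after /l/ (as in "aldaba").
    if stop == "d" and previous == "l":
        return stop
    # Every other context, especially between vowels: use the softer symbol.
    return LENITED[stop]


print(choose_voiced_stop("d", None))  # first d of "dedo"
print(choose_voiced_stop("d", "e"))   # second d of "dedo"
```

Since both versions are interchangeable without changing the meaning, a helper like this only suggests a default; the other symbol can still be entered by hand to change the emphasis.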

Rhotic Consonants

Spanish is one of the few Indo-European languages with a clear distinction between the rhotic consonants /ɾ/, the alveolar tap (the "flapped D" of American English, known as "ere" in Spanish), and /r/, the alveolar trill (the rolling R, known as "erre" in Spanish).

The alveolar trill and the alveolar tap are in phonemic contrast word-internally between vowels but are otherwise in complementary distribution. In Spanish orthography, an intervocalic alveolar trill is marked with a double R ('rr'), while a single intervocalic R is always an alveolar tap. The Spanish phonetic system uses this orthographic notation instead of the usual X-SAMPA notation: the alveolar tap is represented as [r] and the alveolar trill as [rr] (not as [4] and [r] respectively, as they would be in X-SAMPA).
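The notation described above can be illustrated with a short Python sketch. It is not how the VOCALOID Editor works internally: `rhotic_symbols` and `TO_XSAMPA` are hypothetical names, the input is a single written word, and the list of letters that trigger a trill is an assumption based on the notes in the Phonetic List (word start, or after a nasal, /l/, /s/ or /θ/, the last one written as "z").

```python
import re
from typing import List

# Spanish phonetic system symbol -> standard X-SAMPA symbol.
TO_XSAMPA = {"r": "4", "rr": "r"}
# Assumption: letters after which a single written "r" is trilled.
# The empty string stands for the beginning of the word.
TRILL_AFTER = {"", "n", "m", "l", "s", "z"}


def rhotic_symbols(word: str) -> List[str]:
    """Return the system symbol for each rhotic of a written word, in order."""
    word = word.lower()
    symbols = []
    for match in re.finditer(r"rr|r", word):
        previous = word[match.start() - 1] if match.start() > 0 else ""
        if match.group() == "rr" or previous in TRILL_AFTER:
            symbols.append("rr")  # alveolar trill
        else:
            symbols.append("r")   # alveolar tap
    return symbols


print(rhotic_symbols("carro"))
print([TO_XSAMPA[s] for s in rhotic_symbols("alrededor")])
```

A syllable-final R, as in "amor", comes out as the tap [r] here, but either symbol can be used in that position.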


Phoneme Replacement

Spanish consonants show a clear contrast at the beginning of the syllable; at the end of the syllable (coda position), however, the contrast between some consonants is much less marked, making them prone to assimilation or merging. Knowing these processes, it's possible to replace some phonemes with the respective allophone, which changes the stress and pronunciation without altering the meaning of the word.

Voicing Assimilation

Nasal Assimilation

In syllable-final position, nasal consonants are prone to assimilate to the place of articulation of the following consonant, even across a word boundary. Knowing this, it's possible to replace a nasal consonant with another one more appropriate for its context.

  • The word Chancho ('Pig') may be input as [tS a J][tS o] instead of [tS a n][tS o] in the VOCALOID Editor, because the /n/ should be palatalized in that context under the influence of the following /tʃ/.
  • In the phrase Corazón Confundido ('Confused Heart'), it's possible to replace the [n] at the end of the first word with its velar counterpart [N] if the context allows the assimilation of the nasal consonant.
    [k o][r a][T o n][k o n][f u n][D i][D o] → [k o][r a][T o N][k o n][f u n][D i][D o]
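The two replacements above can be sketched as a lookup over a flat phoneme list. The Python below is only an illustration: `assimilate_nasals` is a hypothetical helper, the trigger sets are assumptions built from the allophone notes in the Phonetic List, and [N] is only available with MAIKA's additional phonemes. It applies the replacement everywhere it can, whereas in practice it is optional.

```python
from typing import List

# Replacement nasal -> following consonants that trigger it (assumed sets).
ASSIMILATION = {
    "J": {"tS", "L", "j\\"},     # palatal nasal before palatals
    "N": {"k", "g", "G", "x"},   # velar nasal before velars
    "m": {"p", "b", "B", "m"},   # bilabial nasal before labials
}


def assimilate_nasals(phonemes: List[str]) -> List[str]:
    """Replace each [n] with the nasal matching the next consonant's place."""
    result = list(phonemes)
    for i in range(len(result) - 1):
        if result[i] != "n":
            continue
        for nasal, triggers in ASSIMILATION.items():
            if result[i + 1] in triggers:
                result[i] = nasal
                break
    return result


phrase = "k o r a T o n k o n f u n D i D o".split()
print(" ".join(assimilate_nasals(phrase)))
```

The [n] before [f] and before [D] is left alone in this sketch, which matches the transcription of the phrase given above.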

Realization of the R

In coda (syllable-final) position the realization of the Spanish R is neutralized, which means it can be realized either as a tap or as a trill.

Phonetic List

Symbol | Classification | IPA Symbol (Name) | Sample | Notes | Related Phonemes
[a] | vowel | ä (open central unrounded vowel) | padre | |
[e] | vowel | e̞ (mid front unrounded vowel) | enero | | [i] (lowered)
[i] | vowel | i (close front unrounded vowel) | finca, mío | | [j] (glide), [I] (non-syllabic)
[o] | vowel | o̞ (mid back rounded vowel) | foco, oído | | [u] (lowered)
[u] | vowel | u (close back rounded vowel) | musa, dúo | | [w] (glide), [U] (non-syllabic)
[j] | semivowel | j (palatal approximant) | amplio, ciudad | Used in rising diphthongs (glide+vowel). | [i] (syllabic), [I] (non-syllabic), [j\] (fortited)
[w] | semivowel | w (voiced labio-velar approximant) | huevo, buitre | Used in rising diphthongs (glide+vowel). | [u] (syllabic), [U] (non-syllabic), [G] (unrounded)
[I] | semivowel | | aire, muy | Used in falling diphthongs (vowel+glide). | [i] (syllabic), [j] (glide)
[U] | semivowel | | pausa, neutro | Used in falling diphthongs (vowel+glide). | [u] (syllabic), [w] (glide)
[p] | consonant | p (voiceless bilabial plosive) | perro, apto | | [b] (voiced)
[t] | consonant | t̪ (voiceless dental plosive) | tuyo, traba | | [d] (voiced)
[k] | consonant | k (voiceless velar plosive) | caña, quise, kilo | | [g] (voiced)
[b] | consonant | b (voiced bilabial plosive) | bestia, embuste, vaca, envidia | At the beginning of a word, after a pause, or after a nasal consonant. | [p] (voiceless), [B] (lenited)
[B] | consonant | β~β̞ (bilabial spirant) | bebé, obtuso, vivir, curva | Lenited /b/. In the middle of a word, in all the cases where [b] isn't used. | [b] (fortited)
[d] | consonant | d̪ (voiced dental plosive) | dedo, cuando, aldaba | At the beginning of a word, after a pause, after a nasal consonant, or after /l/. | [t] (voiceless), [D] (lenited)
[D] | consonant | ð~ð̞ (dental spirant) | dedo, arder, admirar | Lenited /d/. In the middle of a word, in all the cases where [d] isn't used. | [d] (fortited)
[g] | consonant | ɡ (voiced velar plosive) | gato, lengua, guerra | At the beginning of a word, after a pause, or after a nasal consonant. | [k] (voiceless), [G] (lenited)
[G] | consonant | ɣ~ɣ˕ or ɰ (velar spirant) | trigo, amargo, sigue | Lenited /g/. In the middle of a word, in all the cases where [g] isn't used. | [g] (fortited), [w] (rounded)
[tS] | consonant | ʧ (voiceless postalveolar affricate) | chancho | | [t] (deaffricated)
[f] | consonant | f (voiceless labiodental fricative) | fase, café | |
[T] | consonant | θ (voiceless dental fricative) | cerro, cima, zumo, paz | | [D] (voiced), [s] (seseo or th-alveolarization), [t] (th-stopping), [f] (th-fronting)
[s] | consonant | s (voiceless alveolar sibilant) | casa, xilófono | | [T] (ceceo; dentalized or lisped)
[x] | consonant | x (voiceless velar fricative) | jamón, reloj, genero, México | |
[m] | consonant | m (bilabial nasal) | mamá, campo, invertir | Also an allophone of /n/ in front of labial consonants. | [n] (delabialized)
[n] | consonant | n (alveolar nasal) | nido, sin | Contains various allophones: /n/ at the beginning of a word or after a pause; /ɲ/ or /nʲ/ before palatals such as /ʎ/, /ʝ/ or /ʧ/; /ŋ/ before velars such as /x/, /k/, /g/ or /ɣ/; /n̪/ before dentals such as /d̪/, /ð/ or /t̪/. | [J] (palatalized), [m] (labialized)
[J] | consonant | ɲ (palatal nasal) | ñandú, enyesar | Also an allophone of /n/ in front of palatals such as /ʎ/, /ʝ/ or /ʧ/. | [n] (depalatalized)
[l] | consonant | l (alveolar lateral approximant) | lana, principal | |
[r] | consonant | ɾ (alveolar tap) | caro, bravo, Amor eterno | | [rr] (trilled)
[rr] | consonant | r (alveolar trill) | rumbo, carro, honra, alrededor, disruptivo, Azrael | At the beginning of a word or after a nasal consonant, /l/, /s/ or /θ/. Between vowels only if specified by a double R. | [r] (lenited)
[L] | consonant | ʎ (palatal lateral approximant) | llave, pollo | | [j\] (yeísmo)
[j\] | consonant | ʝ (voiced palatal fricative) | ayuno | | [L] (lleísmo), [j] (lenited)

Additional Phonetics

The following is a list of additional phonemes available for MAIKA. Although this phonetic expansion is intended mainly for Catalan, Voctro Labs suggested that with her added phonemes she would be able to achieve a decent imitation of other languages like English, Portuguese and Japanese, although they cautioned that she would not sound like a native speaker.

Aside from its potential to imitate other languages, it's important to point out that this phonetic extension can also be used to complement the Spanish language, as many of the additional sounds are allophones or variants that exist in other dialects or varieties of Spanish.

Symbol | Classification | IPA Symbol (Name) | Sample | Notes | Related Phonemes
[@] | vowel | ə (schwa) | amb (CAT), the (ENG) | Reduced vowel. | [a] (fronted)
[E] | vowel | ɛ (open-mid front unrounded vowel) | mel (CAT), egg (ENG) | It may be considered a more open and lax counterpart of /e/. | [e] (tense)
[I0] | vowel | ɪ (near-close near-front unrounded vowel) | it (ENG) | English KIT vowel. It may be considered a more open and lax counterpart of /i/. | [i] (tense)
[Q] | vowel | ɒ (open back rounded vowel) | soc (CAT), lot (ENG) | It may be considered a more rounded and back counterpart of /a/. | [a] (open, centralized), [O] (closed)
[O] | vowel | ɔ (open-mid back rounded vowel) | iode (CAT), taught (ENG) | It may be considered a more open and lax counterpart of /o/. | [o] (tense), [Q] (open)
[r\] | consonant | ɹ (alveolar approximant) | red (ENG) | English R. | [r] (approximant)
[L0] | consonant | l̠ʲ, ʎ̟ or ȴ (alveolo-palatal lateral approximant) | ull (CAT) | A more lateralized variant of /ʎ/. | [L]
[N] | consonant | ŋ (velar nasal) | sang (CAT), king (ENG) | |
[ts] | consonant | ʦ (voiceless alveolar affricate) | potser (CAT), metsu (JPN) | | [dz] (voiced)
[dz] | consonant | ʣ (voiced alveolar affricate) | metzines (CAT), tsudzuku (JPN) | | [ts] (voiceless)
[dZ] | consonant | ʤ, ʥ (voiced postalveolar affricate) | metge (CAT), jeans (ENG), jishin (JPN) | Allophone of /ʝ/ and /ʎ/ in some dialects. | [tS] (voiceless), [j\], [L] (allophone)
[S] | consonant | ʃ, ɕ (voiceless postalveolar sibilant) | caixa (CAT), share (ENG), shio (JPN) | Deaffricated variation of /tʃ/ in some dialects. Allophone of /ʝ/ and /ʎ/ in Rioplatense dialects. | [tS] (affricated), [Z] (voiced), [j\], [L] (allophone)
[z] | consonant | z (voiced alveolar sibilant) | onze (CAT), zoo (ENG) | | [s] (voiceless)
[Z] | consonant | ʒ, ʑ (voiced postalveolar sibilant) | ajut (CAT), vision (ENG), kaji (JPN) | Allophone of /ʝ/ and /ʎ/ in Rioplatense dialects. | [S] (voiceless), [j\], [L] (allophone)
[v] | consonant | v (voiced labiodental fricative) | viu (CAT), vote (ENG) | | [f] (voiceless), [B] (bilabial)
[h] | consonant | h (voiceless glottal fricative) | hot (ENG) | Allophone of /s/ or /x/ in some dialects (debuccalization). |



Continued Development

Spanish is currently the least popular language of VOCALOID, with no known new developments since 2016 and no new releases since 2013. Only two projects are known to have been mentioned. The first was mentioned in June 2016 by Wat from Crypton Future Media, Inc., who stated that they were interested in producing a Spanish and English project in the future, though no details of this project were given.[6] VocaTone also expressed an interest in producing an English and Spanish VOCALOID in 2016.[7]

Two of its 3 voicebanks, Bruno and Clara, have repeatedly fallen into last place in popularity and usage among all VOCALOIDs, though its third voicebank, MAIKA, has at times been on par with the more popular English voicebanks and the less popular Japanese voicebanks.

As a result, the overall state of development for this language is currently unknown, and no voicebanks had been announced to be in development as of 2019. It is the only language VOCALOID provides that has stagnated, and this has greatly impacted the development of Spanish as a language within VOCALOID, as well as its voicebanks.

It offers the second-smallest selection of vocals, beating Korean VOCALOID by just 1 voicebank release.

See also

Conversion Lists
Interwiki articles


  1. link
  2. Jordi Bonada and Xavier Serra - Pompeu Fabra University, Music Technology Group, Ocata, 1, 08003 Barcelona, Spain: Synthesis of the Singing Voice by Performance Sampling and Spectral Models
  3. Explanation for accents in singing and also a lack of
  4. "Why do British singers lose their accents?" http://www.todayifoundout.com/index.php/2013/08/why-british-singers-lose-their-accent-when-singing/
  5. link
  6. https://www.excelsior.com.mx/hacker/2016/06/06/1097104
  7. https://vocatoneofficial.tumblr.com/post/145077501357/itd-be-cool-to-hear-a-dual-voicebank-engloid-and

External links