Vocaloid Wiki
The following is a tutorial made for VOCALOID fans by fellow VOCALOID fans.

About

The Spanish language has only 5 vowel sounds and 18 consonants.[1] It also has 29 possible allophones and 841 theoretically possible combinations, of which only 521 are needed to cover more than 99.99% of the occurrences within the language.[2]
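As a quick check of these figures: 841 is exactly 29 × 29, which suggests that the theoretical combinations are all ordered pairs of the 29 allophones. That reading is an assumption made here for illustration, not something the cited source is quoted as saying. A minimal sketch of the arithmetic:

```python
# Assumption: the "841 combinations" are ordered pairs of the 29 allophones.
ALLOPHONES = 29
NEEDED_PAIRS = 521  # figure quoted in the text

theoretical_pairs = ALLOPHONES ** 2
share_needed = NEEDED_PAIRS / theoretical_pairs

print(theoretical_pairs)          # 841
print(round(share_needed * 100))  # 62
```

Under that assumption, roughly 62% of the theoretical pairs account for nearly all real usage.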

Notes on Accent

Despite the general belief that singers completely lose their accents when they sing, this is not the case in every instance, and an accent can still be heard in sung vocals.

The reason many are led to believe this is that there are several methods of training singers to disguise or otherwise hide their natural accents; they may even adopt an accent that isn't their own for singing. Examples include genres such as western or country, and Black music such as jazz or soul. Singing also uses different muscles from speech, resulting in differences in air pressure and in the way the throat moves. Genres such as opera are the most likely to make an accent appear almost entirely absent, thanks to the impact of the opera vibrato.[3][4]

VOCALOID can capture any form of accent quite easily at times. Whether it does depends on the recording method used with the voice provider, the type of sound being recorded per sample (accent impact varies per sample and language), and the overall number of samples that make up the voicebank (the more samples, the more chance of an accent slipping in).

Spanish is a language that can be impacted by accents at times. Because it is the most spoken language in more than 20 territories across the world, Old Spanish (especially Andalusian Spanish) mixed with the local indigenous languages and morphed into multiple distinct dialects depending on region and culture. However, that does not mean all of them are represented on the VOCALOID engine: all of the officially released or cancelled Spanish VOCALOIDs have very distinct Castilian Spanish accents, and only MAIKA contains a phonetic library diverse enough to mimic a more generalized-sounding Spanish. While it is noteworthy that accents can exist in Spanish VOCALOIDs, this is not currently considered as problematic as it is with other languages. Most non-European Spanish dialects (especially Latin American ones) have only minor phonological differences from Castilian Spanish, the biggest being seseo and yeísmo, so most Spanish speakers seem able to understand Spanish VOCALOIDs well enough to use them with little problem.

Spanish Vocaloids

The following is a list of VOCALOIDs that use Spanish.

Characteristics

Vowels

The base VOCALOID script for Spanish contains all of its 5 basic vowels, [a], [e], [i], [o], and [u].

Glides

Spanish makes a very deliberate distinction between its "open" and "close" vowels. Open vowels ([a], [e], [o]) are considered "strong" and cannot "share" a syllable with each other, while close vowels ([i], [u]) are considered "weak" and can be paired with a "strong" vowel in one syllable, as long as the syllable stress doesn't fall on the "weak" one. Their pronunciation depends on their placement: they are considered semi-consonants if they come at the start of the syllable ([j], [w]) and semi-vowels if they come at the end ([i̯], [u̯]). In contrast to other languages such as English or Korean, VOCALOID doesn't encode Spanish glides as diphthongs. Instead, the system includes two pairs of phonemes that are used according to where the reduction falls in the syllable.

  • The approximants [j] and [w] are used for raising diphthongs (glide+vowel, [j]/[w]).
  • The semivowels [I] and [U] are used for falling diphthongs (vowel+glide, [i̯]/[u̯]).
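The two rules above can be sketched as a small lookup. The helper name `glide_symbol` and the "before"/"after" labels are hypothetical, introduced only for illustration; the symbols and sample words come from this page.

```python
# Hypothetical helper: pick the VOCALOID symbol for a weak vowel in a diphthong.
GLIDE_SYMBOLS = {
    # (weak vowel, position relative to the strong vowel) -> symbol
    ("i", "before"): "j",  # raising diphthong, as in "amplio"
    ("u", "before"): "w",  # raising diphthong, as in "huevo"
    ("i", "after"): "I",   # falling diphthong, as in "aire"
    ("u", "after"): "U",   # falling diphthong, as in "pausa"
}


def glide_symbol(weak_vowel: str, position: str) -> str:
    """Return the glide symbol for an unstressed weak vowel next to a strong vowel."""
    try:
        return GLIDE_SYMBOLS[(weak_vowel, position)]
    except KeyError:
        raise ValueError(f"no glide for {weak_vowel!r} / {position!r}") from None


print(glide_symbol("i", "after"))   # I
print(glide_symbol("u", "before"))  # w
```

If the stress falls on the weak vowel (as in "mío" or "dúo"), no glide applies and the plain vowel [i] or [u] is used instead.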

Consonants

Sonorization

Lenition is a kind of phonetic alteration that affects the pronunciation of consonants, making them softer. Lenition occurs especially often intervocalically (between vowels). In this position, lenition can be seen as a type of assimilation of the consonant to the surrounding vowels, in which features of the consonant that are not present in the surrounding vowels (e.g. obstruction, voicelessness) are gradually eliminated.

In Spanish, lenition is mainly observed in the voiced plosives [b], [d] and [g]. They are directly affected by sonorization, described as a change in voicing, approximation, and vocalization, through which they morph into [β], [ð], and [ɣ] in most instances. [b], [d] and [g] are pronounced as voiced stops only after a pause or a nasal consonant and, in the case of [d], after a lateral consonant.

Voiced stop | Continuant (fricative) | Approximant (spirant)
[b] voiced bilabial plosive | [β] voiced bilabial fricative | [β̞] bilabial approximant
[d̪] voiced dental plosive | [ð] voiced dental fricative | [ð̞] dental approximant
[g] voiced velar plosive | [ɣ] voiced velar fricative | [ɣ˕] velar approximant

Because of this, VOCALOID's phonetic system includes individual phonemes for their softer allophones, using the standard SAMPA notation of [B], [D], and [G].

As with English's aspirated allophones, both versions can be interchanged without altering the overall meaning of the word, varying only in the degree of stress and emphasis. Slow singing tends to favor the "harsher" plosives, while fast singing tends to favor their "softer" allophones, as the former contain the pause required for the realization of the plosive while the latter do not.[5]
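The distribution described in this section can be sketched as a pass over a flat phoneme list. This is only an illustration, not how the VOCALOID editor itself converts text: the function name `lenite` is hypothetical, and the list is assumed to be one phrase sung without internal pauses, so only its start counts as "after a pause".

```python
NASALS = {"m", "n", "J", "N"}
LENITED = {"b": "B", "d": "D", "g": "G"}


def lenite(phonemes):
    """Swap [b], [d], [g] for [B], [D], [G] except after a pause,
    after a nasal, or (for [d] only) after [l]."""
    out = []
    prev = None  # None stands for the pause before the phrase starts
    for p in phonemes:
        if p in LENITED:
            keep_stop = (
                prev is None
                or prev in NASALS
                or (p == "d" and prev == "l")
            )
            out.append(p if keep_stop else LENITED[p])
        else:
            out.append(p)
        prev = p
    return out


print(lenite(["d", "e", "d", "o"]))            # ['d', 'e', 'D', 'o']
print(lenite(["a", "l", "d", "a", "b", "a"]))  # ['a', 'l', 'd', 'a', 'B', 'a']
```

Because both versions are interchangeable, the output is best treated as a starting point: for slow or emphatic passages the plain stops can be restored by hand.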

Alveolar Consonants

The Spanish language is one of the few Indo-European languages that has a clear distinction between the alveolar tap [ɾ] (known as the "flapped D" in American English and as "ere" in Spanish) and the alveolar trill [r] (known as the "rolled R" in American English and as "erre" in Spanish). These are represented by the standard SAMPA notation of [r] for the tap and [rr] for the trill.

The alveolar trill and the alveolar tap are in phonemic contrast word-internally between vowels but are otherwise in complementary distribution. In Spanish orthography, an intervocalic alveolar trill is written with a double R ('rr'), while a single intervocalic R is always an alveolar tap. The Spanish phonetic system follows this orthographic convention rather than the usual X-SAMPA notation: the alveolar tap is represented as [r] and the alveolar trill as [rr], not as [4] and [r] respectively, as X-SAMPA would have them.
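As a rough illustration, the choice between [r] and [rr] can be derived from spelling. The sketch below is a simplification under stated assumptions: `r_symbols` is a hypothetical helper, it works on lowercase spelling rather than on real phonemes, it treats the letters n, l, s and z as the contexts that trigger a trill after a consonant (following the notes for [rr] in the Phonetic List), and it defaults every other single R to the tap.

```python
TRILL_AFTER = {"n", "l", "s", "z"}  # assumption: letters standing for nasal, /l/, /s/, /θ/


def r_symbols(word):
    """Return "r" (tap) or "rr" (trill) for each R sound in a lowercase word."""
    symbols = []
    i = 0
    while i < len(word):
        if word[i] != "r":
            i += 1
        elif word.startswith("rr", i):
            symbols.append("rr")  # written double R between vowels
            i += 2
        elif i == 0 or word[i - 1] in TRILL_AFTER:
            symbols.append("rr")  # word-initial, or after n, l, s, z
            i += 1
        else:
            symbols.append("r")   # tap everywhere else
            i += 1
    return symbols


print(r_symbols("caro"))       # ['r']
print(r_symbols("carro"))      # ['rr']
print(r_symbols("alrededor"))  # ['rr', 'r']
```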

Techniques

Phonetic Replacement

The VOCALOID model for word to phoneme conversion is quite literal, causing some subtle phonetic alterations to be missed. These can be manually edited by the user to improve pronunciation in some cases, or to have an "alternate take" of the sung word.

Spanish shows a marked contrast between consonants at the beginning of the syllable; at the end of the syllable (coda position), however, the contrast between some consonants is much weaker, making them prone to assimilation or merging. Knowing these processes, it's possible to replace some phonemes with their respective allophones, changing the stress and pronunciation without altering the meaning of the word.

Voicing Assimilation

Nasal Assimilation

In syllable-final position, the nasal consonants are prone to assimilate to the place of articulation of the following consonant, even across a word boundary. Knowing this, it's possible to replace a nasal consonant with another one more appropriate for the context of said phoneme.

Examples:
  • For the word Chancho ('Pig'), it may be input as [tS a J][tS o] instead of [tS a n][tS o] in the VOCALOID Editor, because the /n/ should be palatalized in that context due to the influence of the following /tʃ/.
  • In the phrase Corazón Confundido ('Confused Heart'), it's possible to replace the [n] at the end of the first word with its velar counterpart [N] if the context allows the assimilation of the nasal consonant.
    [k o][r a][T o n][k o n][f u n][D i][D o] → [k o][r a][T o N][k o n][f u n][D i][D o]
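The replacements in these examples can be sketched as a small function over a list of syllables. This is a hedged illustration rather than a feature of the VOCALOID Editor: `assimilate_nasals` and the mapping table are hypothetical, the grouping of consonants follows the allophone notes for [n] in the Phonetic List, and [N] is only available on voicebanks that have the additional phonemes listed for MAIKA.

```python
# Assumed mapping: first phoneme of the next syllable -> nasal to use in the coda.
ASSIMILATED_NASAL = {
    "p": "m", "b": "m", "B": "m",            # labials
    "tS": "J", "L": "J", "j\\": "J",         # palatals
    "k": "N", "g": "N", "G": "N", "x": "N",  # velars
}


def assimilate_nasals(syllables):
    """Replace a syllable-final [n] according to the onset of the next syllable."""
    result = [list(s) for s in syllables]  # copy so the input stays untouched
    for current, following in zip(result, result[1:]):
        if current and following and current[-1] == "n":
            current[-1] = ASSIMILATED_NASAL.get(following[0], "n")
    return result


chancho = [["tS", "a", "n"], ["tS", "o"]]
print(assimilate_nasals(chancho))  # [['tS', 'a', 'J'], ['tS', 'o']]
```

Consonants missing from the table, such as [f] or [D], leave the [n] unchanged, which is why only the first nasal changes in the Corazón Confundido example.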

Realization of the R

In coda (syllable-final) position, the realization of the Spanish R is neutralized, meaning it can be realized as either a flap or a trill.
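Since both realizations are acceptable in this position, a user can prepare both takes of a syllable and pick whichever sounds better. A minimal sketch, with `coda_r_variants` as a hypothetical helper name:

```python
def coda_r_variants(syllable):
    """Return both takes of a syllable that ends in an R sound.

    Syllables that do not end in [r] or [rr] are returned unchanged.
    """
    if syllable and syllable[-1] in ("r", "rr"):
        stem = list(syllable[:-1])
        return [stem + ["r"], stem + ["rr"]]
    return [list(syllable)]


# Last syllable of "amor": [m o r]
print(coda_r_variants(["m", "o", "r"]))  # [['m', 'o', 'r'], ['m', 'o', 'rr']]
```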

Seseo

Yeísmo/Sheísmo/Zheísmo

"S" aspiration

Consonant deletion

Neutralization

Phonetic List

Symbol | Classification | IPA's Symbol / Name | Sample | Notes | Related Phonemes
[a] | vowel | ä open central unrounded vowel | padre | |
[e] | vowel | e̞ mid front unrounded vowel | enero | | [i] (lowered)
[i] | vowel | i close front unrounded vowel | finca, mío | | [j] (glide); [I] (non-syllabic)
[o] | vowel | o̞ mid back rounded vowel | foco, oído | | [u] (lowered)
[u] | vowel | u close back rounded vowel | musa, dúo | | [w] (glide); [U] (non-syllabic)
[j] | semivowel | j palatal approximant | amplio, ciudad | Used in raising diphthongs (glide+vowel). | [i] (syllabic); [I] (non-syllabic); [j\] (fortited)
[w] | semivowel | w voiced labio-velar approximant | huevo, buitre | Used in raising diphthongs (glide+vowel). | [u] (syllabic); [U] (non-syllabic); [G] (unrounded)
[I] | semivowel | i̯ | aire, muy | Used in falling diphthongs (vowel+glide). Doesn't follow standard SAMPA notation (should be [j]). | [i] (syllabic); [j] (glide)
[U] | semivowel | u̯ | pausa, neutro | Used in falling diphthongs (vowel+glide). Doesn't follow standard SAMPA notation (should be [w]). | [u] (syllabic); [w] (glide)
[p] | consonant | p voiceless bilabial plosive | perro, apto | | [b] (voiced)
[t] | consonant | t̪ voiceless dental plosive | tuyo, traba | | [d] (voiced)
[k] | consonant | k voiceless velar plosive | caña, quise, kilo | | [g] (voiced)
[b] | consonant | b voiced bilabial plosive | bestia, embuste, vaca, envidia | At the beginning of a word, after a pause, or after a nasal consonant. | [p] (voiceless); [B] (lenited)
[B] | consonant | β~β̞ bilabial spirant | bebé, obtuso, vivir, curva | Lenited /b/. In the middle of a word, in all cases where /b/ isn't used. | [b] (fortited)
[d] | consonant | d̪ voiced dental plosive | dedo, cuando, aldaba | At the beginning of a word, after a pause, after a nasal consonant, or after /l/. | [t] (voiceless); [D] (lenited)
[D] | consonant | ð~ð̞ dental spirant | dedo, arder, admirar | Lenited /d/. In the middle of a word, in all cases where /d/ isn't used. | [d] (fortited)
[g] | consonant | ɡ voiced velar plosive | gato, lengua, guerra | At the beginning of a word, after a pause, or after a nasal consonant. | [k] (voiceless); [G] (lenited)
[G] | consonant | ɣ~ɣ˕ or ɰ velar spirant | trigo, amargo, sigue | Lenited /g/. In the middle of a word, in all cases where /g/ isn't used. | [g] (fortited); [w] (rounded)
[tS] | consonant | ʧ voiceless postalveolar affricate | chancho | | [t] (deaffricated)
[f] | consonant | f voiceless labiodental fricative | fase, café | |
[T] | consonant | θ voiceless dental fricative | cerro, cima, zumo, paz | | [D] (voiced); [s] (seseo or th-alveolarization); [t] (th-stopping); [f] (th-fronting)
[s] | consonant | s voiceless alveolar sibilant | casa, xilófono | | [T] (ceceo; dentalized or lisped)
[x] | consonant | x voiceless velar fricative | jamón, reloj, genero, México | |
[m] | consonant | m bilabial nasal | mamá, campo, invertir | Also an allophone of /n/ in front of labial consonants. | [n] (delabialized)
[n] | consonant | n alveolar nasal | nido, sin | Contains various allophones: /n/ at the beginning of a word or after a pause; /ɲ/ or /nʲ/ before palatals such as /ʎ/, /ʝ/ or /ʧ/; /ŋ/ before velars such as /x/, /k/, /g/ or /ɣ/; /n̪/ before dentals such as /d̪/, /ð/ or /t̪/. | [J] (palatalized); [m] (labialized)
[J] | consonant | ɲ palatal nasal | ñandú, enyesar | Also an allophone of /n/ in front of palatals such as /ʎ/, /ʝ/ or /ʧ/. | [n] (depalatalized)
[l] | consonant | l alveolar lateral approximant | lana, principal | |
[r] | consonant | ɾ alveolar tap | caro, bravo, Amor eterno | | [rr] (trilled)
[rr] | consonant | r alveolar trill | rumbo, carro, honra, alrededor, disruptivo, Azrael | At the beginning of a word or after a nasal consonant, /l/, /s/ or /θ/. Between vowels only if specified by a double R. | [r] (lenited)
[L] | consonant | ʎ palatal lateral approximant | llave, pollo | | [j\] (yeísmo); [j]
[j\] | consonant | ʝ voiced palatal fricative | ayuno | Doesn't follow standard SAMPA notation (should be [jj]). | [L] (lleísmo); [j] (lenited)

Additional Phonetics

The following is a list of additional phonemes available for MAIKA. Although this phonetic expansion is intended mainly for Catalan, Voctro Labs suggested that with her added phonemes she would be able to achieve a decent imitation of other languages like English, Portuguese and Japanese, although they said that she would not sound like a native speaker.

Aside from its potential for imitating other languages, it's important to point out that this phonetic extension can also be used to complement the Spanish language, as many of the additional sounds are allophones that were originally missing or are used in other dialects.

Symbol | Classification | IPA's Symbol / Name | Sample | Notes | Related Phonemes
[@] | vowel | ə schwa | amb (CAT), the (ENG) | Reduced vowel. | [a] (fronted)
[E] | vowel | ɛ open-mid front unrounded vowel | mel (CAT), egg (ENG) | It may be considered a more open and lax counterpart of /e/. | [e] (tense)
[I0] | vowel | ɪ near-close near-front unrounded vowel | it (ENG) | English KIT vowel. It may be considered a more open and lax counterpart of /i/. | [i] (tense)
[Q] | vowel | ɒ open back rounded vowel | soc (CAT), lot (ENG) | It may be considered a more rounded and back counterpart of /a/. | [a] (open, centralized); [O] (closed)
[O] | vowel | ɔ open-mid back rounded vowel | iode (CAT), taught (ENG) | It may be considered a more open and lax counterpart of /o/. | [o] (tense); [Q] (open)
[r\] | consonant | ɹ alveolar approximant | red (ENG) | English R. | [r] (approximant); [w]
[L0] | consonant | l̠ʲ, ʎ̟ or ȴ alveolo-palatal lateral approximant | ull (CAT) | A more lateralized variant of /ʎ/. | [L]
[N] | consonant | ŋ velar nasal | sang (CAT), king (ENG) | Not considered under the standard SAMPA notation, but vital for certain allophones (mainly the [n g], [n k], and [n x] clusters). | [n]
[ts] | consonant | ʦ voiceless alveolar affricate | potser (CAT), metsu (JPN) | | [dz] (voiced)
[dz] | consonant | ʣ voiced alveolar affricate | metzines (CAT), tsudzuku (JPN) | | [ts] (voiceless)
[dZ] | consonant | ʤ, ʥ voiced postalveolar affricate | metge (CAT), jeans (ENG), jishin (JPN) | Allophone of /ʝ/ and /ʎ/ in some dialects. | [tS] (voiceless); [j\], [L] (allophone)
[S] | consonant | ʃ, ɕ voiceless postalveolar sibilant | caixa (CAT), share (ENG), shio (JPN) | Deaffricated variation of /tʃ/ in some dialects. Allophone of /ʝ/ and /ʎ/ in Rioplatense dialects. Used for loanwords from English and other languages. | [tS] (affricated); [Z] (voiced); [j\], [L] (allophone)
[z] | consonant | z voiced alveolar sibilant | onze (CAT), zoo (ENG) | | [s] (voiceless)
[Z] | consonant | ʒ, ʑ voiced postalveolar sibilant | ajut (CAT), vision (ENG), kaji (JPN) | Allophone of /ʝ/ and /ʎ/ in Rioplatense dialects. | [S] (voiceless); [j\], [L] (allophone)
[v] | consonant | v voiced labiodental fricative | viu (CAT), vote (ENG) | | [f] (voiceless); [B] (bilabial)
[h] | consonant | h voiceless glottal fricative | hot (ENG) | Allophone of /s/ or /x/ in some dialects (debuccalization). | [s]; [x]

Continued Development

The Spanish language is currently the least popular language of VOCALOID, seeing no known new developments since 2016 and no new releases since 2013. Only two mentioned projects are known. The first was mentioned by Wat from Crypton Future Media, Inc. in June 2016, who stated they were interested in producing a Spanish and English project in the future, though details of this project were not given.[6] VocaTone had also expressed an interest in producing an English and Spanish VOCALOID in 2016.[7]

Two of its 3 voicebanks, Bruno and Clara, have repeatedly fallen into last place in terms of popularity and usage among all VOCALOIDs, though its third voicebank, MAIKA, has at times been on par with the more popular English voicebanks and the less popular Japanese voicebanks.

The overall state of development for this language is currently unknown, and no voicebanks have been announced to be in development as of 2024. Along with Korean, Spanish is one of the few languages provided by VOCALOID that is not supported by the VOCALOID6 multilingual feature.

It offers the second-smallest selection of vocals, beating Korean VOCALOID by just 1 voicebank release.

See also

Conversion Lists
Interwiki articles

References
