🛠 This subject is a work in progress.
Please bear with us while improvements are being made, and assume good faith until the edits are complete.
The following is a tutorial made for VOCALOID fans by fellow VOCALOID fans.

Because Vocaloid is a musical tool that attempts to rebuild the speech and expressive elements of a singing voice from a library of recorded samples, phonology and phonetics are important elements behind it. Sooner or later, the user is bound to run into them. For that reason, users should be aware that some basic phonological and phonetic knowledge may be required to use the software.

This article is a simplified attempt to help the user understand the phonological and phonetic terms and definitions that may be encountered.

About Linguistics

Linguistics is the scientific study of human language. This cognitive science has various branches that focus on different aspects of language. Among these, phonetics and phonology are the branches that study language at its most basic level: the production of the sounds that compose it. Although the two are similar and overlap in various areas and definitions, their focus is different.

Phonetics studies the sounds that constitute language, including their production by the speaker (articulatory phonetics), their acoustic properties and transmission (acoustic phonetics), and their reception and perception by the listener (auditory phonetics). Its unit is the phone.

Phonology instead studies the behavior of speech sounds and their systematic organization in each language. Its unit is the phoneme.

The differences between the two branches become more evident when you analyze their respective basic units. The phone

meanwhile the phoneme

the allophone

Phonetic Alphabets

In an attempt to classify the speech sounds, several phonetic alphabets and scripts have been created. This section discusses the ones most relevant to VOCALOID users.

The International Phonetic Alphabet (abbreviated as IPA) is a phonetic alphabet created by the International Phonetic Association in an attempt to standardize the representation of the sounds of spoken language. This alphabet uses symbols based on the Latin alphabet and is, arguably, the most widely used phonetic alphabet.

Another important phonetic alphabet is the Extended Speech Assessment Methods Phonetic Alphabet, also known as X-SAMPA. It is an extension of SAMPA, another computer-readable phonetic script based on the IPA that uses ASCII characters. SAMPA was created to work around the inability of text encodings to represent IPA symbols before the creation and widespread adoption of Unicode, which, unlike ASCII, supports the IPA's characters. X-SAMPA is still used to input phonemes easily with the common characters of a keyboard and, in the case of Vocaloid, the symbols of the Phonetic System are based on this transcription.
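To make the relation between the two alphabets more concrete, here is a minimal sketch in Python that converts a few X-SAMPA symbols into their IPA equivalents. The table is only a small illustrative subset of X-SAMPA, and it is not VOCALOID's own phonetic symbol set. The names `XSAMPA_TO_IPA` and `xsampa_to_ipa` are made up for this example.

```python
# Illustrative subset of X-SAMPA symbols and their IPA equivalents.
# This is NOT the full X-SAMPA table and NOT VOCALOID's own symbol list.
XSAMPA_TO_IPA = {
    "S": "ʃ",  # as in "ship"
    "Z": "ʒ",  # as in "vision"
    "T": "θ",  # as in "thin"
    "D": "ð",  # as in "this"
    "N": "ŋ",  # as in "sing"
    "@": "ə",  # schwa
    "{": "æ",  # as in "cat"
    "I": "ɪ",  # as in "bit"
    "U": "ʊ",  # as in "book"
    "E": "ɛ",  # as in "bed"
    "O": "ɔ",  # as in "thought"
}


def xsampa_to_ipa(symbols: str) -> str:
    """Convert space-separated X-SAMPA symbols to an IPA string.

    Assumption for this sketch: any symbol missing from the table
    (plain lowercase letters such as "f" or "k") is passed through unchanged.
    """
    return "".join(XSAMPA_TO_IPA.get(s, s) for s in symbols.split())


print(xsampa_to_ipa("f I S"))  # the word "fish"
print(xsampa_to_ipa("T I N"))  # the word "thing"
```

Every key in the table is a plain ASCII character that can be typed on a common keyboard, which is the whole point of X-SAMPA.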

Vocal Tract & Anatomy of Speech

Classification of the Speech Sounds


Consonants are the speech sounds that are produced through a complete or partial closure of the vocal tract. Phonologically, a consonant usually constitutes the margins of the syllable: the onset (the beginning of the syllable) and the coda (the ending of the syllable), although in some instances it can act as the nucleus of the syllable on its own (syllabic consonant). Consonants can be classified by their features. Usually phonation, place of articulation and manner of articulation are the features used to describe a consonant.


Phonation refers to whether the consonant is produced with vibration of the vocal cords or not. When the vocal cords vibrate fully, the consonant is voiced; when they do not, the consonant is voiceless.

In general terms, obstruents are prototypically voiceless, though voiced obstruents are common. This contrasts with sonorants, which are prototypically voiced and only rarely voiceless.

Place of Articulation

The place of articulation of a consonant refers to the point of contact where the obstruction or closure happens in the vocal tract, and which speech organs, or articulators, are involved.

Among the places of articulation found in the 5 languages available for the VOCALOID software are:

  • Bilabial: Bilabial consonants are the speech sounds that are articulated with both lips.
  • Labiodental: Labiodental consonants are the sounds that are produced with the lower lip against the upper teeth.
  • Dental: Dental consonants are the sounds that are articulated with the tongue and the upper teeth.
  • Alveolar: Alveolar consonants are the sounds that are articulated with the tongue against or close to the upper alveolar ridge.
  • Post-alveolar: Postalveolar consonants are the sounds that are articulated with the tongue and the back of the alveolar ridge, before the hard palate. These kinds of sounds have interesting properties, both acoustically and phonologically.
    • Palato-alveolar
    • Retroflex
    • Alveolo-palatal
  • Palatal: articulated with the tongue and the hard palate.
  • Velar: articulated with the back of the tongue (dorsum) and the soft palate, also known as the velum.
  • Uvular: articulated with the back of the tongue and the uvula.
  • Glottal: produced with the glottis (the opening between the vocal folds) as the articulator.
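As a rough illustration of how phonation and place of articulation work together as features, the following Python sketch stores a handful of common consonants with both features and looks up the voiced or voiceless partner of a sound. The inventory is a hand-picked example, not the phoneme list of any VOCALOID voicebank, and it is chosen so that each place holds at most one voiceless/voiced pair. The names `Consonant`, `by_place` and `voicing_partner` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Consonant:
    ipa: str      # IPA symbol
    voiced: bool  # phonation: True if the vocal cords vibrate
    place: str    # place of articulation


# Hand-picked example inventory (an assumption of this sketch).
CONSONANTS = [
    Consonant("p", False, "bilabial"),
    Consonant("b", True, "bilabial"),
    Consonant("f", False, "labiodental"),
    Consonant("v", True, "labiodental"),
    Consonant("t", False, "alveolar"),
    Consonant("d", True, "alveolar"),
    Consonant("k", False, "velar"),
    Consonant("g", True, "velar"),
    Consonant("h", False, "glottal"),
]


def by_place(place: str) -> List[str]:
    """List the example consonants articulated at the given place."""
    return [c.ipa for c in CONSONANTS if c.place == place]


def voicing_partner(ipa: str) -> Optional[str]:
    """Return the consonant at the same place with the opposite phonation.

    Returns None when the sound is unknown or has no partner in the inventory.
    """
    source = next((c for c in CONSONANTS if c.ipa == ipa), None)
    if source is None:
        return None
    for other in CONSONANTS:
        if other.place == source.place and other.voiced != source.voiced:
            return other.ipa
    return None


print(by_place("bilabial"))
print(voicing_partner("t"))
print(voicing_partner("h"))
```

A fuller description would also need the manner of articulation, since several sounds with different manners can share the same place and phonation.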

Manner of Articulation

The manner of articulation refers to how the air escapes from the vocal tract, and to the configuration and proximity of the articulators.

The consonants can be classified into two major groups: the obstruents and the sonorants.

The obstruents are the speech sounds that are produced by blocking and disturbing the airflow in the vocal tract. The obstruction can be total or partial, depending on the degree of stricture or closure of the articulators.

  • Stops: Stop consonants, also known as oral occlusives or plosives, are the speech sounds produced by a complete closure or occlusion of the articulators that stops the airflow. The name plosive refers to the plosion, the burst of air produced when the closure is released.
  • Affricate: Affricates are the speech sounds that begin as a stop (complete closure) and release as a fricative (frication). Affricates can be tricky to classify, as they need to be differentiated from stop-fricative consonant pairs.
  • Fricative: Fricatives are the speech sounds that are produced by forcing air through a constricted passage between the articulators. The reduced space disturbs the airflow, producing a turbulent flow and an audible friction referred to as frication.
  • Sibilant: Sibilants are a kind of fricative or affricate consonant produced by bringing the tip, or blade, of the tongue near the roof of the mouth and pushing air past the tongue, producing a hissing or hushing sound. Acoustically, sibilants are characterized by their sharp and strident sound. In writing, sibilants are usually associated with the letters S, Z and C.

The sonorants or resonants are the speech sounds that are produced with a continuous, non-turbulent airflow in the vocal tract:

  • Nasal: Nasal stops, or simply nasals, are sounds that are produced with a complete occlusion or closure of the oral cavity and the lowering of the velum, which redirects the airflow through the nasal cavity. This is what gives this kind of consonant its particular acoustic and resonance properties. Orthographically, these sounds are usually represented by the letters N, M and their derivatives. It is also important to mention that these sounds are prone to assimilation processes, such as nasalizing adjacent vowels or acquiring the place of articulation of an adjacent consonant.
  • Approximant: Approximants are produced with little constriction of the vocal tract. The articulators approach each other enough to shape the sound, but not so much that they disturb the airflow in the vocal tract. The consonants in this group include:
    • Liquids: Liquid consonants are a class of consonants consisting of the lateral consonants (L-like) and the rhotic consonants (R-like).
      • Laterals: Lateral consonants are the sounds produced with the tongue blocking the oral cavity transversally, which redirects the airflow through the sides of the obstruction, hence the name.
      • Rhotics: Rhotic consonants, or "R-like" sounds, are liquid consonants that are traditionally represented orthographically by symbols derived from the Greek letter rho (Ρ ρ) and its Latin equivalent, the letter R. Phonetically there is little correlation among them. Instead, their similarities seem to be phonological.
    • Semivowels: 
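The split of the manners of articulation into obstruents and sonorants can be sketched as a simple lookup. This Python example only encodes the grouping described in the lists above; the label strings and the function names `major_class` and `prototypically_voiced` are made up for illustration.

```python
# Manners of articulation grouped as described in the lists above.
# The label strings are assumptions of this sketch, not a standard.
OBSTRUENT_MANNERS = {"stop", "affricate", "fricative"}
SONORANT_MANNERS = {"nasal", "approximant", "lateral", "rhotic", "semivowel"}


def major_class(manner: str) -> str:
    """Return 'obstruent' or 'sonorant' for a manner of articulation."""
    if manner in OBSTRUENT_MANNERS:
        return "obstruent"
    if manner in SONORANT_MANNERS:
        return "sonorant"
    raise ValueError(f"unknown manner of articulation: {manner!r}")


def prototypically_voiced(manner: str) -> bool:
    """Sonorants are prototypically voiced, obstruents prototypically voiceless.

    This is only the default tendency: voiced obstruents are common.
    """
    return major_class(manner) == "sonorant"


for manner in ("stop", "fricative", "nasal", "lateral"):
    print(manner, major_class(manner), prototypically_voiced(manner))
```

Sibilants are left out of the lookup on purpose: they are a subset of the fricatives and affricates rather than a separate manner.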

This group is important as



  • Close vowel
  • Near-close vowel
  • Close-mid vowel
  • Mid vowel
  • Open-mid vowel
  • Near-open vowel
  • Open vowel


  • Front vowel
  • Near-front vowel
  • Central vowel
  • Near-back vowel
  • Back vowel


  • Rounded
  • Unrounded

  • The vowel system tends to be symmetrical
  • The front vowels (left side of the vowel diagram) tend to be unrounded while the back vowels (right side of the diagram) tend to be rounded
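The three vowel features listed above (height, backness and roundedness) and the rounding tendency can be sketched in Python as follows. The five-vowel set is a generic example rather than the vowel inventory of a particular VOCALOID language, and the names `Vowel` and `follows_rounding_tendency` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vowel:
    ipa: str       # IPA symbol
    height: str    # e.g. "close", "close-mid", "open"
    backness: str  # e.g. "front", "central", "back"
    rounded: bool  # True if the lips are rounded


# A generic five-vowel system (an assumption of this sketch).
VOWELS = [
    Vowel("i", "close", "front", False),
    Vowel("e", "close-mid", "front", False),
    Vowel("a", "open", "front", False),
    Vowel("o", "close-mid", "back", True),
    Vowel("u", "close", "back", True),
]


def follows_rounding_tendency(vowel: Vowel) -> bool:
    """Front vowels tend to be unrounded, back vowels tend to be rounded."""
    if vowel.backness == "front":
        return not vowel.rounded
    if vowel.backness == "back":
        return vowel.rounded
    return True  # the tendency says nothing about other positions


# Two vowels that go against the tendency.
front_rounded = Vowel("y", "close", "front", True)
back_unrounded = Vowel("ɯ", "close", "back", False)

for v in VOWELS + [front_rounded, back_unrounded]:
    print(v.ipa, follows_rounding_tendency(v))
```

The last two vowels exist in real languages, which is why the text speaks of a tendency and not of a rule.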

Properties of the Speech Sounds



Formant 0: Fundamental Frequency (Pitch)

First and Second formants: vowel color and recognition. The first one is related to

Third formant: Rhoticity

5, 6 and onwards: Voice Timbre
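Since the fundamental frequency is what a listener hears as pitch, it can be computed directly from a note. The sketch below assumes 12-tone equal temperament, a reference tuning of A4 = 440 Hz, and MIDI note numbers (where A4 is note 69). None of these assumptions come from this tutorial, and the function name `midi_to_f0` is made up for the example.

```python
A4_MIDI = 69   # MIDI note number of A4 (assumed reference note)
A4_HZ = 440.0  # assumed reference tuning in hertz


def midi_to_f0(note: int) -> float:
    """Fundamental frequency in Hz of a MIDI note in equal temperament.

    Each semitone multiplies the frequency by 2 ** (1 / 12),
    so twelve semitones (one octave) double it.
    """
    return A4_HZ * 2 ** ((note - A4_MIDI) / 12)


for note in (57, 60, 69, 81):
    print(note, round(midi_to_f0(note), 2))
```

This only covers the fundamental frequency. The higher formants depend on the shape of the vocal tract and cannot be derived from the note alone.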

Source-Filter Model

Phonological Concepts

Singing voice & Speaking voice differences

Singer's Formant

Formant Tuning & Vowel Modification