User:Adept-eX/Sandbox

<!--

=Phonology and Phonetics for Vocaloid users=

Vocaloid is a musical tool that attempts to rebuild the speech and expressive elements of a singing voice from a set of recorded diphones, so phonology and phonetics are important elements behind it, and sooner or later the user is bound to run into them. For that reason, users should be aware that some basic phonological and phonetic knowledge may be required to use the software.

This article is a simplified attempt to help the user understand the phonological and phonetic terms and definitions that may be encountered.

Obstruent
Obstruents are a kind of consonant sound produced by total or partial obstruction of the airflow in the vocal tract.

Obstruents can be voiced or unvoiced, meaning they can be produced with or without vibration of the vocal cords, respectively. The IPA chart usually groups consonants in voiceless-voiced pairs. Because of the vibration of the vocal cords, voiced obstruents have a characteristic buzzing when compared against their voiceless counterparts (Examples). In colloquial speech it isn't unusual to replace voiceless consonants with their voiced counterparts when they occur in a voiced context (for example, between vowels) or to change the emphasis of the speech. Moreover, in some languages like Korean or Chinese there isn't a clear distinction between the voiced-voiceless pairs; they are considered allophones and their phonation is determined just by the context.

This group of consonants includes the plosives (also known as stops or occlusives because of the total stop of the airflow), the affricates (which combine the articulation of plosives and fricatives) and the fricatives (characterized by partial obstruction of the airflow). Among the affricates and fricatives it's possible to find the group of sibilant consonants.

It's possible to group the obstruents by their voicing and place of articulation.

Sonorant
Sonorants are speech sounds produced without turbulence or obstruction of the airflow. The group is diverse, including vowels, semivowels, approximants, liquids (rhotics and laterals) and nasals. Although the definitions vary by author or source, sonorants share a series of traits, such as being able to act as a syllable nucleus and being modally voiced (they are rarely unvoiced).

A feature in Vocaloid3 is the addition of devoiced variants of the sonorants. They are marked by adding the suffix _0 to the phoneme, which corresponds to the X-SAMPA representation of the voiceless diacritic. Because the sonorant group is diverse, the devoiced variants differ for each language available for Vocaloid.

Nasals
A nasal consonant is a consonant where the airflow is directed through the nose. The term is generally used to refer to the nasal stops, the most common kind of nasal consonant and the only one found in the languages available for Vocaloid.

Nasal stops are known for their strong tendency toward assimilation processes. They assimilate to the place of articulation of the following consonant, so it is quite common to find several allophones of the nasal consonants in most languages. Similarly, they can cause assimilation of the preceding vowels, inducing their nasalization.

Approximants
Approximants are speech sounds that involve the articulators approaching each other but not narrowly enough or with enough articulatory precision to create turbulent airflow. Therefore, approximants fall between fricatives, which do produce a turbulent airstream, and vowels, which produce no turbulence. This class of sounds is varied and includes lateral approximants (L-related, see liquid consonants section further ahead), non-lateral approximants, and the semivowels or glides.

To distinguish the different sound quality of the Korean velar approximant from that of the Spanish one, the article will use the classification proposed by Eugenio Martínez-Celdrán.

Semivowel Approximants
The semivowels or glides are a kind of consonant that is phonetically similar to a vowel but acts as a syllable boundary rather than as the nucleus of a syllable. In simple terms, they sound like a vowel but behave like a consonant.

Some linguists prefer to call them semi-consonants to differentiate them from the non-syllabic vowels (which are also semivowels and are important elements of the diphthong), while other linguists consider both to be the same. The distinction isn't clear and depends mainly on the grammatical rules of each language.

Looking further into the relation between vowels and semivowels, each semivowel has a corresponding vowel counterpart. Both have practically the same sound, and the semivowel can be considered the non-syllabic counterpart of the vowel.

Spirant Approximants
Like any sonorant, the spirant approximants don't produce obstruction or turbulence of the airflow, which makes them similar to the vowels. However, in terms of sound, they are closer to the fricatives.

Like the semivowels, each spirant approximant can be related to a corresponding fricative. In some languages the stability of some fricatives is low, and they shift to other articulations.

Liquid Consonants
The liquids are a class of consonants that groups the lateral and rhotic consonants. Both kinds share a series of characteristics: they often have the greatest freedom to occur in consonant clusters, they can be prolonged (or shortened) in the same manner as a vowel, and, like the nasals, they can even act as a syllable nucleus. Their name comes from their often being described as having a "fluid" sound.

European languages usually have 2 liquid consonants, one lateral (usually related to the L) and one rhotic (usually related to the R), while the languages of Asia in general have only one liquid, with little distinction between laterals and rhotics.

Laterals
A lateral consonant is an L-like consonant in which the airstream proceeds along the sides of the tongue but is blocked by the tongue from going through the middle of the mouth. Associated with the letter L, the laterals include taps, approximants, fricatives, affricates and clicks; the first two are the most common in the Vocaloid phonetic system.

Rhotics
The rhotics, tremulants or R-like sounds are a group of liquid consonants associated with the letter R and the Greek letter rho (hence the name). Phonetically speaking, the rhotics have little in common (the class is diverse, with little articulatory relation between its members). Instead, the rhotics seem to have similar phonological functions and share some phonological features (like the lowered third formant) across different languages.

Besides the rhotic consonants, it is possible to find rhotic vowels. These vowels are characterized by a certain R-like tone (produced by the low frequency of their third formant) and are represented as diphones in Vocaloid's English Phonetic System. It's important to point out that the R-colored vowels may differ strongly between voicebanks, being bound to the differences between rhotic and non-rhotic accents.

Vowels
Vowels are the sounds that, phonetically speaking, are pronounced with an open vocal tract so that there is no build-up of air pressure at any point above the glottis and that, phonologically speaking, are the nucleus or peak of syllables, whereas consonants form the onset and (in languages that have them) the coda.

Vowel quality is determined by 3 articulatory features, which shape the sound of each vowel and differentiate vowels from one another. They are:
 * height (vertical dimension) : Vowel height is named for the vertical position of the tongue relative to either the roof of the mouth or the aperture of the jaw. In close vowels (or high vowels) the tongue is positioned high in the mouth and/or the jaw is more closed, whereas in open vowels (or low vowels) the tongue is positioned low in the mouth and/or the jaw is more open. The IPA recognizes 7 degrees of height for vowels (4 cardinal degrees and 3 intermediate).
 * Example: [u] and [i] are close (high) vowels whereas [a] is an open (low) vowel.
 * backness (horizontal dimension) : Vowel backness is named for the position of the tongue during the articulation of a vowel relative to the back of the mouth. In front vowels the tongue is positioned forward in the mouth, whereas in back vowels it is positioned towards the back of the mouth. The IPA recognizes 5 degrees of backness for vowels (3 cardinal and 2 intermediate).
 * Example: The vowel [u] is a back vowel, while the vowel [i] is a front vowel.
 * roundedness (lip position) : Roundedness refers to the shape of the lips when a vowel is pronounced, whether rounded or not. Three kinds of labialization, or shapes the lips can take, can be identified: rounded (exolabial, the lips protrude and their inner surface is visible), compressed (endolabial, also rounded, but with the lips drawn inwards), and spread (unrounded, the lips are relaxed without any rounding). The most common articulations for vowels are the rounded and the unrounded ones. The compressed articulation is less common and appears in few languages, Japanese being one of them.

Pure Vowels
Also known as monophthongs or stable vowels, these are vowel sounds whose quality doesn't change over time, staying stable as one vowel in the syllable.

Semivowels
Basically non-syllabic vowels. They're important elements of the diphthong. In Vocaloid they are generally part of the diphthongs and diaphonemes; however, the Spanish Phonetic System represents them as separate phonemes instead, used mainly to form falling diphthongs.

Diphthong and Diaphonemes
Diphthongs are vowel sounds that glide from one quality to another. Diphthongs and diaphonemes are usually ambiguous vowel sequences, where the distinction is determined by the lexical and linguistic rules of each language.

Phonetically speaking, the diphthong joins two vowels in a single unit. Forming it requires a vowel and a semivowel (which can be an approximant consonant or a non-syllabic vowel); the first has greater prominence, being the syllable nucleus, whereas the other has lesser prominence, acting as a border of the nucleus. Depending on the order of the pair, the diphthong can be classified as:

 * Rising Diphthongs : those that shift from lesser prominence to greater prominence (the semivowel comes first and is followed by the vowel).
 * Falling Diphthongs : those that shift from greater prominence to lesser prominence (the vowel comes first and is followed by the semivowel).

In Vocaloid, the languages that feature diphthongs and diaphonemes are English and Korean. In English most are falling diphthongs, whereas in Korean they are rising diphthongs.

Place of Articulation
The place of articulation (also point of articulation) of a consonant is the point of contact where an obstruction occurs in the vocal tract. It involves an active articulator and a passive location (typically some part of the tongue and some part of the roof of the mouth, respectively).

Palatals
Palatal consonants are consonants articulated with the body of the tongue raised against the hard palate (the middle part of the roof of the mouth). Consonants with the tip of the tongue curled back against the palate are called retroflex.

The palatals are usually related to the vowel [i], its glide [j] and, to a lesser degree, other front vowels such as [e] and [ɪ].

Palatalization
In linguistics, palatalization may refer to two different processes by which a sound, usually a consonant, comes to be produced with the tongue in a position in the mouth near the palate.

As a phonetic term, it refers to a secondary articulation of consonants by which the body of the tongue is raised toward the hard palate and the alveolar ridge during the articulation of the consonant. Such consonants are said to be phonetically palatalized, and in the International Phonetic Alphabet they are indicated by a superscript ʲ, as with [tʲ] for a palatalized [t]. The palatalized consonant is pronounced as if followed very closely by a [j] sound, which corresponds to the secondary articulation.

The second definition refers to a common assimilatory process, or the result of such a process, which usually involves front vowels (like [i] and [e]), the glide [j] or another palatal consonant causing nearby phones to shift towards (though not necessarily reaching) the palatal articulatory position or positions closer to the front of the mouth. This shifts the consonant into a palatalized consonant or, if the palatalization is drastic enough, changes the primary articulation completely, turning the consonant into a palatal consonant.

Because palatalization is an important process in Japanese and Korean,

Labials
A group of consonants produced with one or both lips. There are two kinds of labial consonants, the bilabials and the labiodentals. They can be related to the letters B, P (bilabials) and V, F (labiodentals).

If we classify them by how they sound, we can identify two major groups, the B-like ones and the F-like ones.

Labialization
Labialization is a kind of secondary articulation that involves lip movement (generally rounding) while the consonant is pronounced.

It may also refer to an assimilation process in which the consonant acquires a labial articulation. This is quite common with velars under the influence of a following close back vowel or a labial consonant (becoming labialized velars and thus acquiring a w-like sound) and with nasals followed by a bilabial consonant (generally acquiring a more m-like sound).

Dental
A dental consonant is a consonant articulated with the tongue against the upper teeth. Although this is the formal definition, most dental consonants are actually denti-alveolar consonants. This means they are articulated with a flat tongue against the alveolar ridge and upper teeth.

In most languages, the denti-alveolar consonants can be related to the letters T and D in terms of writing and sound.

Alveolar
The alveolar consonants are consonants articulated with the tongue against or close to the upper alveolar ridge (so named because it contains the alveoli, the sockets of the upper teeth).

This group is diverse and includes the hissing sibilants as well as various liquid consonants (both rhotic and lateral).

The alveolar plosives (voiced and voiceless) are similar to their respective denti-alveolar consonants, and are thus T-D related. The sound of the other alveolars depends on the group they belong to (hissing sibilants, laterals or rhotics).

Post-Alveolar
The post-alveolar consonants are the ones that are articulated with the tongue touching or near the back of the alveolar ridge.

The group includes practically all the hushing sibilants (sh-like) and to a certain degree overlaps with or is related to the palatals, having varying degrees of palatalization and being able to induce palatalization of adjacent phonemes as an assimilatory process. Among the post-alveolar sibilants it is possible to find the retroflex (non-palatalized), the palato-alveolar (slightly palatalized) and the alveolo-palatal sibilants (strongly palatalized, basically palatalized post-alveolars).

Among the non-sibilant post-alveolar consonants you can find the group of retroflex consonants. Occasionally the palatalized alveolar consonants such as [nʲ], [lʲ] and [tʲ] are also included. However, this generally isn't the case, as these phonemes are usually considered variants of the palatals [ɲ], [ʎ] and [c] respectively (acoustically there isn't much difference between them).

Velars
Velars are consonants articulated with the tongue's dorsum against the soft palate, the back part of the roof of the mouth, also known as velum.

The velars are characterized by a G-like or K-like sound and in most languages are associated with those letters.

They may be labialized, meaning they can be pronounced with lip rounding. In that case they acquire a w-like tone (this is often triggered by close back vowels and labial consonants).

Labiovelars and labiovelarization
The labiovelars are a kind of doubly articulated consonant. In a similar fashion to the palatals, the labiovelars are associated with the back rounded vowels (such as [ʊ] or [o]), mainly with [u] and its glide [w].

In most languages [w] is a labialized velar approximant, articulated with rounding of the lips. However, in some languages like Japanese, [w] is actually a true labiovelar approximant, articulated with compression of the lips. Although both are similar, having an oo-like tone, the lip compression gives the Japanese one a certain b-like tone.

Like palatalization, labiovelarization can refer to a secondary articulation of consonants that involves simultaneous velarization and lip movement, giving a w-like tone to the modified consonant. It can also refer to an assimilation process triggered by labial and labiovelar consonants.

As mentioned previously, in the case of the velar consonants the process is just a simple labialization, because they already possess the velar articulation.

Problems in the classification of approximants Eugenio Martínez-Celdrán

What is phonological Symmetry

What is phonological universal

Phoneme and Allophone, Introduction

Phonological

Generally the l


 * Vowel systems tend to be symmetrical.
 * Back vowels tend to be rounded, front vowels tend to be unrounded.

As a consequence of this, the five-vowel system is the most common one in the world's languages, generally having a close front unrounded vowel ([i] or similar), an open vowel ([a] or similar), a close back rounded vowel ([u] or similar), and two vowels intermediate between these three. These are generally a mid front unrounded vowel like [e], placed between the [i]-like and the [a]-like vowels, and a mid back rounded vowel like [o], placed between the [a]-like and the [u]-like vowels.
 * The minimal vowel system includes at least 3 vowels: /i a u/ (that is, a close front vowel, an open central vowel and a close back vowel). It's said that all known languages have at least these 3 vowels or similar variations of them.

In the case of the Consonants.

=Tips and tricks for Vocaloid=

Short Notes
Using short notes can lead to interesting results. When a note is too short the sound samples are compressed, which causes them to blend and opens up interesting possibilities.

It's important to point out that, because these tricks use short notes, they are strongly affected by the tempo. If the current tempo doesn't allow delicate tuning, the user can use Voctro-Lab's free job plugin DOubleDur, which doubles the tempo (the music plays at double rate) and the note length at the same time; this essentially doesn't affect the notes' duration but makes it easier to work with short notes.

It's also important to note that the user may need to adjust not only the note length but also the accent and Velocity (VEL) to achieve a satisfactory effect.

These tricks can also be combined with other techniques shown in this tutorial (like the use of glides/approximants or of the devoiced sonorants) to further improve the desired effect.
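
The arithmetic behind doubling both values can be checked with a small sketch. It is only an illustration: the resolution of 480 ticks per beat and the function names are assumptions made for this example, not values taken from the editor or the plugin.

```python
# Illustrative sketch only. TICKS_PER_BEAT is an assumed resolution,
# and these helper names are hypothetical (not part of any Vocaloid API).
TICKS_PER_BEAT = 480


def note_seconds(length_ticks, tempo_bpm):
    """Return how long a note lasts in seconds."""
    beats = length_ticks / TICKS_PER_BEAT
    return beats * 60.0 / tempo_bpm


def double_tempo_and_length(length_ticks, tempo_bpm):
    """Double both the note length (in ticks) and the tempo."""
    return length_ticks * 2, tempo_bpm * 2


original = note_seconds(30, 120.0)
ticks, bpm = double_tempo_and_length(30, 120.0)
doubled = note_seconds(ticks, bpm)
print(original, doubled)
```

The note lasts the same time in both cases, but after doubling it spans twice as many ticks, which leaves more room on the grid for fine adjustments.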

Correcting the vowel rendering
Occasionally the rendering of a vowel after some consonants is awkward, leading to a weird pronunciation or stress of the vowel. This can be fixed by splitting the note: make a short note with the consonant and the vowel, followed by a note with the same vowel.

[k a] sounds awkward

[k a][a] the second vowel will be rendered correctly

For languages with a more complex vowel system (like English and Korean) it is possible to combine this with "similar vowels blending" to produce a pronunciation closer to an intended vowel or to change the stress or emphasis.

Blend phonemes
As mentioned earlier, when a note is too short the sound samples of each phoneme are compressed, which makes it possible to blend them and, if the phonemes are manipulated correctly, to imitate phonological processes such as co-articulation and assimilation.

Trill
A trill is basically a succession of several flaps, which leads to the vibratory quality characteristic of this kind of speech sound (for this reason they're also known as tremulants). Knowing this, it is easy to infer that several flaps (or similar phonemes) can be combined into a trill.

There are various options to achieve the effect. Among the possibilities are:

 * make several short syllables/notes in succession: rrrra > [4 a][4 a][4 a][a] (trill)
 * put all the flap syllables into a single note and then compress the note: rrrra > [4 a 4 a 4 a][a] (trill)

Any vowel can be used for the trill, although it is usually best to use the same vowel as (or a vowel similar to) the one that follows the trill.
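
The two trill layouts described above can be written out as a small sketch. The helper names and the list-of-lists note representation are assumptions made for illustration; they are not part of the editor. The flap symbol [4] is the one used in the examples above.

```python
# Illustrative sketch: each note is a list of phoneme symbols.
# All function names here are hypothetical helpers.

def trill_as_separate_notes(vowel, flaps=3, flap="4"):
    """One short note per flap syllable, then the sustained vowel."""
    notes = [[flap, vowel] for _ in range(flaps)]
    notes.append([vowel])
    return notes


def trill_as_single_note(vowel, flaps=3, flap="4"):
    """All flap syllables packed into one note, then the sustained vowel."""
    packed = []
    for _ in range(flaps):
        packed.extend([flap, vowel])
    return [packed, [vowel]]


def render(notes):
    """Format notes in the bracket notation used in this article."""
    return "".join("[" + " ".join(note) + "]" for note in notes)


print(render(trill_as_separate_notes("a")))  # [4 a][4 a][4 a][a]
print(render(trill_as_single_note("a")))     # [4 a 4 a 4 a][a]
```

The note lengths are left out on purpose, since the right values depend on the tempo and the voicebank.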

Coarticulation and Assimilation
It's possible to generate coarticulated consonants, more specifically consonants with secondary articulation. A secondary articulation is made by superimposing a glide or an approximant-like articulation on top of another constriction elsewhere in the vocal tract.

If the user is aware that the palatal consonants and the palatalization process are related to the vowel [i] and its glide [j], and that the (labio)velarization process is related to the vowel [u] and its glide [w] (see the phonological tips), it is easy to infer that putting the respective vowel after the consonant and shortening the note approximates the corresponding secondary articulation.

Examples: Imitating a voiceless alveolo-palatal fricative with an English voicebank. SH in English is [S], a voiceless palato-alveolar fricative; in Japanese and Korean the SH is [S\] or [S_j], a voiceless alveolo-palatal fricative, which is basically a (more) palatalized [S].

Knowing this, one can infer that a good approximation is to add a short [i:].

[S i:][a]-[w ]

Imitating the Spanish Ñ, or palatal nasal

[]

The secondary articulation is heard as a soft/short glide that follows the consonant.

Knowing this along the fact the is possible

Fix awkward phoneme transitions
Occasionally, when the user enters a phonetic combination that is allowed by the Editor but is uncommon or doesn't exist at all in the language used, the sample can sound awkward because no satisfactory phoneme combination exists. As the samples are compressed when the note is short enough, it's possible to add a more natural phoneme between the intended combination to act as a transition.

Examples: The palatalized version of the velar nasal, [N'], doesn't work well with a vowel other than [i]. Considering this, it is possible to add an [i]:

[N' a] > [N' i][a]

Some Japanese voicebanks have an awkward pronunciation or some sound artifacts for the combination [s i]. This is probably because most Japanese speakers don't usually pronounce this combination as such: they palatalize the S (under the influence of the [i]), turning it into a SH.

Knowing that the closest vowel to [i] is [e], it's possible to use this vowel to connect the [s] with the [i] in an attempt to produce a smoother and more natural pronunciation.

[s i] > [s e][i]


 * Note: in the case of Japanese voicebanks, this trick only works if the vowel~vowel transition is smooth enough. Newer voicebanks don't suffer much from this; however, older voicebanks can show some issues.
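
The transition trick shown in the examples above follows one pattern: a short note holding the consonant plus a bridge vowel, followed by a note with the intended vowel. A minimal sketch of that pattern is given below; the function names and the list-based note representation are assumptions made for illustration only.

```python
# Illustrative sketch: each note is a list of phoneme symbols.
# insert_transition and render are hypothetical helper names.

def insert_transition(consonant, vowel, bridge):
    """Split [consonant vowel] into a short bridge note plus the vowel note."""
    return [[consonant, bridge], [vowel]]


def render(notes):
    """Format notes in the bracket notation used in this article."""
    return "".join("[" + " ".join(note) + "]" for note in notes)


print(render(insert_transition("N'", "a", "i")))  # [N' i][a]
print(render(insert_transition("s", "i", "e")))   # [s e][i]
```

Using the intended vowel itself as the bridge gives the note split used to correct vowel rendering, for example [k a][a].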

Consonant Clusters
Short notes can be used to work with consonant clusters, being especially useful for Stop-Liquid clusters. This can be used to achieve clusters in languages that don't naturally have them (like Japanese and Korean) or to improve the pronunciation in languages that already support them (like English and Spanish).

For this, the user must divide the cluster into two syllables, putting an epenthetic vowel between the stop and the liquid (it is best to use the same vowel that follows the cluster), and then make the first syllable as short as possible. The idea is

[][]

The result works even better if the user uses a devoiced vowel as the epenthetic vowel. This even allows the method to be used for more complex clusters, like the English sibilant-stop cluster, as the vowel is practically mute.
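
The cluster split can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the symbols [k] and [4] are chosen just as an example of a stop and a liquid, and the devoiced vowel is written with the _0 suffix described earlier in this article; whether a given voicebank actually provides that devoiced variant has to be checked by the user.

```python
# Illustrative sketch: each note is a list of phoneme symbols.
# split_cluster and render are hypothetical helper names.

def split_cluster(stop, liquid, vowel, devoiced=False):
    """Split stop+liquid+vowel into two syllables with an epenthetic vowel.

    The epenthetic vowel copies the vowel that follows the cluster.
    With devoiced=True it gets the _0 suffix (assumed to be available).
    """
    epenthetic = vowel + "_0" if devoiced else vowel
    return [[stop, epenthetic], [liquid, vowel]]


def render(notes):
    """Format notes in the bracket notation used in this article."""
    return "".join("[" + " ".join(note) + "]" for note in notes)


print(render(split_cluster("k", "4", "a")))                 # [k a][4 a]
print(render(split_cluster("k", "4", "a", devoiced=True)))  # [k a_0][4 a]
```

The first note of each pair is the one that should be made as short as possible in the editor.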

-->

=Character Templates=