Vocaloid Wiki

Much of the controversy around VOCALOID is aimed at individual vocals. Some of it is a product of myth or bias, and a Vocaloid can be misjudged by large audiences if a misconception or opinion is taken too heavily as fact.


The quality of a Vocaloid is complex to determine and is affected by a number of factors, each of which can be regarded or disregarded at a user's pleasure.

What is a "High Quality" result?

It is important to remember that VOCALOID itself was designed to be used by professionals; as a result, even the lowest quality vocal is considered by synthesizer standards to be a high quality (HQ) product. For Macne Nana it was noted that she was able to meet a 24bit/96kHz quality mark, but this required editing the samples. This has been one of the few statistics given on the quality of VOCALOIDs overall. Different factors still come into play at different points, and many small variables play a role. The engine itself, for example, can have an impact on the vocal, and not all vocalists are suitable for the engine, as Gackt was warned when he made his first recordings.[1]

The following are a few (but not all) examples of possible contributors to quality:

  • Style: For example, Prima was built for opera, and when used to sing in that genre she will excel over a Vocaloid not built with this style in mind. This means Prima produces, for her purpose, a high quality result. Using her for soul, rock and other genres far from the opera style would produce a much lower quality vocal, as she does not have the capabilities required for those genres. Niche vocals commonly have this same problem: they are often regarded as some of the highest quality vocals and will outclass most other Vocaloids attempting the same vocal style, but they have limited effectiveness outside of that genre. Other examples of such vocals are Sachiko, who is best suited for Enka, and Yuzuki Yukari, who is better for slow singing styles and has issues with faster ones.
  • Vocalist: professional vocalists often produce higher quality results than voice actors (with exceptions). For example, Gackt would be able to give samples with more confidence than an amateur singer such as Haruna Ikezawa. This is due to issues such as their training or experience singing, which affect their capability to match pitch, tone or sound.
  • Version: The original VOCALOID engine is more limited than later versions of the software. Its vocal range capabilities were smaller and it had a larger amount of digital noise. The original KAITO vocal is of lower technical quality than the KAITO V3 package. The same is true of Megpoid and V3 Megpoid - Native, wherein the "Native" V3 update is much higher quality than the original Megpoid vocal. Updates to the engine add new functions and often improve existing ones. Older vocals may not have access to them, even if they are imported into the newer engine. This is not to say that a vocal does not change once imported into a newer engine; VOCALOID2 vocals imported into VOCALOID3 displayed a large quality jump, though some, such as Sonika, experienced errors they did not possess before.
  • Script: prior to VOCALOID4, problems with English vocals were reported. This is owed to the fact that the standard English script, produced in the pre-VOCALOID period, contained errors. This was the script sold with every Vocaloid Dev Kit up until VOCALOID3; therefore every English vocal produced using this script was sold with certain errors built in. Ruby and Cyber Diva were each created using a different script, both intended to improve on the standard English script offered by Yamaha with the Dev Kit.
  • Pitch: pitch layering also adds to the quality of the product by giving a strong voice to build the vocal upon. The layers of pitching are divided into at least two known types: stationary and articulation. The number of pitches per vocal varies; Maika has six pitch layers in total, whereas other Vocaloids have only two. The way the pitches affect the performance and quality of a voicebank is related to the synthesis engine itself, and to the mathematical interpolation and extrapolation processes used to reconstitute the full range of the voice. In theory, with more pitch layers, the voicebank should sound more like its voice provider, although this also increases the file size and computational resources needed for the voicebank. Studios may also choose not to add layers because doing so can extend development time and cost more to produce, as the vocalist has to be in the studio longer to record and the workload increases. Most English Vocaloids pre-VOCALOID4 were confirmed to have only two layers of articulation; the total layers of pitching for Yohioloid was confirmed to be 3.5. In addition to the number of layers, it is important to note that a Vocaloid will end up sounding off pitch if just one of these layers is incorrect, in the same way that an incorrect input for just one variable can skew the whole result of a regression model. A known example is Lily's VOCALOID2 vocal, which had one of her pitch layers too high. As a result, Lily's VOCALOID2 voice was entirely off pitch. This was later corrected with V3 Lily.[2]
  • Phonetic data: in addition to which sounds are recorded via the script, which phonetics are constructed for the voicebank is also a factor. Phonetic data is recorded when the vocalist reads their respective language's script, but some advanced samples may have to be constructed by merging two or more samples together. Unlike pitch, these do not have an impact on the entirety of the voice, but they do have an impact on constructed words. The constructed samples are normally the "diaphonetic" and "triphonetic" samples. Prior to VOCALOID3, only diaphonetic data was included in vocals and the engine did not factor in triphonetic sounds. These were introduced in VOCALOID3 and resulted in higher quality voicebanks, because the vocal results were smoothed out. This became a major factor in why VOCALOID3 vocals had far higher quality than VOCALOID2 ones. As noted by Internet Co., Ltd., however, an increased amount of them does not necessarily result in a higher quality vocal. Triphonetic data does not always require new recordings, and it is possible to construct it from previous recordings.
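
The point made under Pitch, that one incorrect layer can pull the voice off pitch, can be illustrated with a toy model. The sketch below does not describe how the VOCALOID engine works internally. It assumes, purely for illustration, that a voicebank is a list of pitch layers, each with a tuning error in cents, and that the error is linearly interpolated between neighbouring layers. The names `Layer` and `tuning_error_at` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical layer: (recorded MIDI note, tuning error in cents).
Layer = Tuple[float, float]


def tuning_error_at(note: float, layers: List[Layer]) -> float:
    """Linearly interpolate the tuning error (in cents) at `note`.

    Notes outside the recorded range reuse the nearest layer's error,
    a flat extrapolation chosen only to keep the sketch short.
    """
    layers = sorted(layers)
    pitches = [p for p, _ in layers]
    if note <= pitches[0]:
        return layers[0][1]
    if note >= pitches[-1]:
        return layers[-1][1]
    i = bisect_right(pitches, note)
    (p0, e0), (p1, e1) = layers[i - 1], layers[i]
    t = (note - p0) / (p1 - p0)
    return e0 + t * (e1 - e0)


# Assumed example data: three layers, all in tune versus one layer 40 cents sharp.
good = [(48.0, 0.0), (60.0, 0.0), (72.0, 0.0)]
bad = [(48.0, 0.0), (60.0, 40.0), (72.0, 0.0)]

for note in (48, 54, 60, 66, 72):
    print(note, tuning_error_at(note, good), tuning_error_at(note, bad))
```

In this toy model every note between the sharp layer and its neighbours inherits part of the error, so a single bad layer affects far more than the notes it was recorded at.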

Not a Quality issue?

It is important to note that certain aspects are not a result of the Vocaloid's quality. Unfortunately, quality myths spread among fans or users and have been known to cost a Vocaloid sales. These include:

  • No matter how good the Vocaloid is, if the user doesn't know how to use the vocal well, the results will be low quality. Likewise, an extremely skilled user can make a Vocaloid sound higher quality than it is. Therefore it is not always easy to judge a Vocaloid based only on songs that producers have made in the past. This has been a common mistake made by fans listening to VOCALOID music. This is especially relevant for the vocals mentioned in Style, as these most commonly produce low quality results when they are used incorrectly.
  • If the user tries to make the Vocaloid do something it wasn't built for, this reduces the quality of the vocal track. This is mentioned in Style, but also applies to things that are outside VOCALOID's domain, such as basic speech or forcing a Vocaloid to sing in a language it wasn't built for (particularly if there is a large phonetic difference between the two).
    • In addition, limits such as an optimum tempo or range do not stop a producer from venturing outside them. For example, Utatane Piko is not a lower pitched male vocal, and his lower ranges are considered lower quality. Yet there are some who like this lower result despite the quality loss, even when other Vocaloids such as VY2 could do the same song better. This is attempted by producers of all skill levels.
  • In addition, even though the user may enjoy the way a Vocaloid sounds, this does not always mean the Vocaloid is high in quality. Appeal falls purely down to an individual's opinion or taste; many enjoy the robotic sound of low technical quality, while others can easily fall subject to cognitive bias. The Kagamine Vocaloids are a notable example of VOCALOID2 vocals with quality issues. These vary from the poor "Act1" package to the high quality "Append". Despite these problems, they remain very popular, with many fans who enjoy their quirks.
  • See also "Languages".

How important is it?

Regardless of all these factors, a lack of technical "quality" does not always mean a Vocaloid is less useful or reliable than its more technically advanced competitors. For example, comparing the VOCALOID2 vocals Hatsune Miku and Gumi's Megpoid, Gumi is actually the higher quality singer and outclasses Miku. However, Miku gained a reputation for her ease of use despite being one of the lowest quality VOCALOID2 vocals. She could be morphed in a variety of ways to sound different and was the most adaptable vocal until SF-A2 miki was released. Gumi, however, had a reputation for resisting change and was a hard vocal to break into other tones, with her quality being impacted when a user did manage to do so. She was also harder to use. Thus, despite Gumi being the better singer, VOCALOID2 users often stuck with Miku because Miku was more capable of singing different genres, while Gumi struggled to produce results outside of her vocal's speciality areas.

In addition, compatibility with the engine version can also render a vocal able to excel in unexpected ways. Despite its lower quality, the original Kaito product was very flexible within the VOCALOID engine, allowing it to excel despite being lower quality than KAITO V3.[3]

Quality, either way, is just one of many factors to take into account and may not be the most important of all. It is very easy to fall into the trap of using "quality" as a form of confirmation bias to label voicebanks as completely good or bad, even though "good" voicebanks can have negative traits and "bad" ones positive traits. For example, when the first two members of the VY series, VY1 and VY2, appeared, they saw much praise for their quality over all other VOCALOIDs in VOCALOID2. This at times became a focal point for fans, regardless of the other traits of other Vocaloids or the flaws that still existed within the two voicebanks. It is not uncommon for users discussing quality to pick the best songs in support of or against particular Vocaloids to show which voicebanks are better or worse than others.

While not Vocaloid, a classic example of lower quality vocals succeeding over higher quality ones is the software Chipspeech. Chipspeech is considered a HQ product, but its vocals are based on dated pre-2000s technology which produces a vintage sound, imitating lower quality sound compared to many modern vocal synthesizers. These vocals are often still sought after despite being LQ, because their robotic slurs, crunchy sounds and broken phonematic constructions give unique results that music producers can use to recreate effects, atmosphere and other needed sounds; booming sounds and static pronunciations are easy to produce. If one compared Otto Mozer to a Vocaloid vocal like Kaito, one might feel uncomfortable listening to Otto Mozer simply because of his deep, booming and otherwise inhuman vocal. But this may be just the sound someone wants for a menacing or unsettling song, as Otto Mozer can produce this particular sound without extra work, whereas Kaito may need serious manipulation to achieve the same effect. The vocals are provided as a tool, a means to an end, but not every tool fits every end.


There is an ongoing debate regarding the realism of VOCALOID vocals. As mentioned previously in Singing vocal clones, one of the earliest controversies associated with VOCALOID was their potential to replace voice providers. Companies such as Crypton Future Media have deliberately avoided realistic recreations of their voice providers for this reason. "Realism" has become a subject of discussion among producers and fans alike.

Comparison between a VOCALOID vocal and a VOCALOID2 one from Crypton Future Media made using sound engineering tools. This shows the improvement of sound between the two engine versions overall

The main points of contention with "realism" result from subjectivity - each listener judges realism with different criteria in mind, leading to differences in opinion. However, developers and their sound engineers often have tools which allow a much more delicate comparison of vocals, so the listener is fully capable of being incorrect with their assumptions on realism because of cognitive bias, just as they are fully capable of being incorrect with every other assumption that can be made on VOCALOID. As with "Quality" when talking about the effects of realism, it is tempting to use examples to prove which voicebanks are capable of being realistic and which ones are not and show songs that prove one way or another. However, completed songs do not reflect the true realism, or lack of, in the actual voicebanks themselves.

The realism of a VOCALOID is purely based on how well it achieves "the uncanny valley" effect. All VOCALOIDs are considered "fake" since they are the result of a machine; they are not truly "realistic" in the sense that their output was not sung by a real human being. However, due to the uncanny valley effect, it is possible to trick the human ear into believing that something fake is, in fact, real. Therefore, if a listener says that a VOCALOID singing result is "realistic", they have been successfully tricked by the uncanny valley effect and cannot tell the difference between a real vocal and the VOCALOID vocal. In this case, VOCALOID achieves what it aims to achieve by providing an alternative to having an actual singer.

Recreating the Human Voice

This is divided between three major factors: the ability to speak words, the ability to express emotions and the ability to create natural variation per singer, all of which sit on top of the VOCALOID engine's own ability to accurately create the illusion of a real singer and mimic the human vocal cords.

Word Formation

One of the simplest aspects to understand is how VOCALOIDs mimic the formation of words. Words are important to human beings for sharing information and can be used to tell stories or convey messages. Thus VOCALOID needs to be able to form words that humans can understand. VOCALOID is overall fairly successful at forming words, though some VOCALOIDs are better than others.

As mentioned in Languages, it has often been incorrectly claimed by fans that there are separate engines for English and Japanese. The idea that certain languages are less realistic is quite controversial among fans, as it has been used to dismiss large areas of VOCALOID as "less realistic"; the reasons behind this are highlighted in Languages. In reality, all languages are built upon a shared version of the VOCALOID engine/API, and given the right circumstances, it is just as capable of producing the same amount of realism in all languages it can recreate. Deficiencies in realism are typically caused by the assets used with the engine and the resources they use. The VOCALOID engine/VOCALOID API is not different for each language.

Less precise languages such as English require the blending of sounds, and a by-product of this is clarity loss. The VOCALOID API can struggle to manage these larger voicebanks, and older versions of the VOCALOID engine and API, such as VOCALOID and VOCALOID2, are often worse.

In the case of English, on top of this, it is almost impossible to get precise sounds as English offers far fewer of them. Clarity of sound is often treated as a measure of how realistic a vocal is, though this is subjective, as real singers also vary in clarity depending on a number of factors. An example of this is Debbie Harry, a singer known for her lack of clarity.

However, the inability to clearly hear a VOCALOID can contribute to the inability to tell how realistic it is. A lack of precision in sounds can also be seen as a failure to recreate the language to a sufficient degree for an actual speaker to understand. In turn, this impacts the perception of how realistic that VOCALOID is compared to a real singer who can speak more clearly and precisely. This is harder on languages like English, as their large number of samples makes it harder to check every combination; they are therefore especially prone to this problem.[4][5] This contributes to why there is some controversy in regards to realism and languages among producers and fans.

A change in how a database is set up can also improve languages, as was demonstrated by VY1 and later repeated with Cyber Diva, without the need to alter the VOCALOID API itself. An entire language's ability to sound realistic can therefore jump with a single development, and realism can be lost or gained depending on what type of new development has occurred. For example, until VOCALOID4, Japanese VOCALOIDs were preferably recorded for ease of use in music; from VOCALOID4 onwards, the traits of the singer were considered more important. From that point on, VOCALOIDs had a better chance of closely mimicking their providers; however, the cost was that they became harder to use.[6]

Finally, one of the main issues with VOCALOID is that compared to a real human, VOCALOIDs cannot learn new sounds. While manipulation of sounds to create new words is possible, as users have often used voicebanks for languages they were not made for such as Japanese for English voicebanks, if a sound is not present then it will never exist in that version. VOCALOIDs simply cannot invent new sounds and often rely on updates to fix problems, such as the case for Hatsune Miku V3 English and Hatsune Miku V4 English. This also impacts existing sounds equally and a bad sample within the voicebank database can be hard to repair without knowledge of how to repair it.

Expressing Emotions

Emotions add impact to words and give them meaning, allowing for more accurate communication; to put it simply, laughing while saying "Look, there is a lion!" would convey a completely different meaning to shouting in panic "Look, there is a lion!". Emotions are therefore important for empathy and add layers of expression to an otherwise mundane verbal statement, and it becomes especially important in music.

Synthesizers generally struggle to express emotion, with this issue not being foreign to Vocaloid either.[7] While they can accurately match tone when it comes to note matching, what they lack is the ability to put emphasis on a word to create an emotional response. This is important to conveying an emotion or generally mood setting within communication, particularly when it comes to music making. A sad song of despair sung by a singer who sounds like they're about to cry is going to convey sorrow more than a song where the singer sounds angry or happy while singing said song.

Even now, Vocaloid struggles to express emotions. This has been a continued shortfall since the early days of the software, and synthesizers in general have for decades been unable to produce an engine that can replace the singer. CeVIO, UTAU and Synthesizer V cannot achieve this either. "Queen of the Night", from Mozart's opera The Magic Flute, was made in 1984 by Yves Potard and Xavier Rodet using the CHANT synthesizer. This contained no tangible words at all, yet was considered at the time the most realistic portrayal of the human voice.[8] This is largely due to the expression the synthetic vocal displayed within the song, being close to the expression of the human voice. Before this, synthesizing software didn't even come close to sounding realistic, as seen with Chipspeech vocals.

To understand the shortcomings of VOCALOID, and in turn all synthesizers, a producer has to understand that they stem from how these synthesizers are built. The most realistic method of producing a vocal currently is via a "formant" method, which involves using samples gained from a singing source to "build" a vocal using a synthesizer engine. The formant method requires recording samples for use within a voice databank library; in theory, the more layers are added, the more accurate the voice is, provided all layers are accurately recorded. Famously, Lily had pitching issues due to one of her layers being too high pitched. It is possible to produce a voicebank with just a single layer; realism is then not 100% certain, though the voicebank is highly flexible.

For Vocaloid:

  • Each voicebank is based on 2 or more "stationary layers" of sound which are recorded. These are the vocal's basic sounds and keys; they give the Vocaloid its main traits and sound qualities and act as the stable core of the vocal itself. Essentially, they make "Hatsune Miku" sound like "Hatsune Miku", "Oliver" sound like "Oliver", and so on.
  • A second set of layers is created by the transitions between samples. These are referred to as "articulation layers"; they give the Vocaloid its ability to say words and, secondarily, to express emotions as a result. In other words, articulation is the impact of singing a song and going from one note to another, up and down the scales. These layers often control the way the vocal behaves, such as Yukari's expressive tone or Cul's ability to climb from lower notes to higher ones.

Vocaloid uses mathematics as a basis and creates a "model" of the vocal that uses all the layers to create the final singing result of each Vocaloid's vocal. This is the vocal that the user hears when using the software. The layers still influence the way each voicebank behaves. An expressive bass will cause the final voice to be looser and more expressive in the lower ranges. A soft treble range will create weak but expressive high ranges. Combined together, the vocal produced from the samples will be an extremely expressive and loose sounding voicebank. Even if the treble range is solid, the bass layers will influence the upper ranges, causing looser sounds, while the solid treble range will impact the bass and make it lose some of its influence, though bass and treble in this case will still differ significantly compared to those of a voicebank whose layers are very similar in traits. However, despite the traits changing, the overall working model VOCALOID produces is mostly monotone, as VOCALOID takes all the layers to create a single overall voice tone. Some vocals are impacted significantly more than others, such as Gumi's "Native" voicebank.

Some Vocaloids have been able to get around the expression of emotion by simply adding more voicebanks to a release. A change of expression can lead to a more realistic result, mimicking a human changing expression in a song.

  • For example, at higher notes, Gackpoid V4 can change to the "Power" vocal, making the voice bolder and more vivid sounding.
  • At lower or sadder notes, the "Whisper" vocal can be used to make Camui Gackpo sound meeker.

This simple act of switching from one voicebank to another gives a better result, simply because each one is a different "voice" built by the engine and has its own rules thanks to having different stationary and articulation layers. It acts differently to the previously used voicebank, and that alone is enough to give the final singing result a more expressive quality and a voice that is far less monotone. So using "Native", "Power" and "Whisper" together will give Gackpo a much more realistic result than using any of those 3 vocals alone.

Even with the added voicebanks, the full effect is still not considered on par with what an actual human can do to express emotion in music. Simply put, humans can adapt on a whim and form their own ideas on how to better express a sound or word without losing a vital empathy response from the listener. Despite this, some of the realism lost through reduced clarity when forming words is regained through expression thanks to softer vocals, as seen with Yuzuki Yukari. She can have more expression, though the success of this will depend on the smoothness of sample transitioning.

VOCALOID4 users also have access to XSY, which can also aid in expression. Going back to Gackpo's "Native", "Power" and "Whisper" vocals, it is possible to XSY "Native" and "Power" to create a new "Native-Power" tone. This can act as a bridge between the two vocals, allowing for a less noticeable switch between voicebanks mid-song, which was the intention of the XSY function. It is also possible to use a touch of "Power" here and there to slightly improve the "Native" vocal when "Power" is not fully required in the song. However, XSY results themselves are not realistic and are lower in quality, which means they must be used with care.

Despite these tools and advantages, users often simply use the same voicebank for an entire song. This is a valid option for multiple-voicebank releases, but can be a wasted chance to add expression. It is usually simply because they enjoy the way that voicebank sounds. For example, out of the 7 voicebanks for Japanese Hatsune Miku, only "Original" and "Dark" are popular and thus see great use. "Light" and "Vivid" were dropped due to their lack of use, and while "Sweet", "Solid" and "Soft" are more commonly used, they still do not see extensive use compared to "Original" and "Dark". This is just because "Original" and "Dark" are the voicebanks users prefer, and thus they will write entire songs with them.

For the time being, this impacts the VOCALOID engine considerably and is considered one of the most major flaws in the realism VOCALOID can produce. It is one of the major reasons VOCALOID is said to be unable to replace the need for a real singer.

The best example of this was displayed with I=Fantasy, SeeU's demo song. It became apparent very quickly that a real singer was in use because of the layers of sound and tone variation the song displayed.

Natural Variation

Another, more subtle way of creating realism is tone variation: the capture of the organic traits, flaws and strengths of the human voice. This includes the vocalist's success at singing adequately, as well as their failures to meet absolute perfection; these are the vocal traits that make a singer stand out from other singers, even those of the same vocal type. Without natural variation, every Vocaloid voicebank would sound identical and the number of voicebanks required would be greatly reduced; it is because of the variations between individual vocalists that so many voicebanks can be produced.

Before going further, it is important to remember that there are always restrictions on how much variation there is, which is why real vocalists and VOCALOIDs can sound like one another. A typical adult male is only capable of a range between 85 and 180 Hz, while a female has a 165 to 255 Hz range. Children have a vocal range that varies with age until puberty, but it generally falls within the 250 Hz to 300 Hz range or a higher bracket.[9] While the male voicebox is generally bigger than a female's or a child's, the human voice is based on only variations of a single design; similarities are very possible for this reason, and they are why impressionists can do what they do.
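
The overlap between the ranges quoted above can be made concrete with a short sketch. The figures are the ones given in the text; the table name `TYPICAL_RANGES_HZ` and the function `groups_for` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Tuple

# Typical ranges in Hz, copied from the figures quoted in the text.
TYPICAL_RANGES_HZ: Dict[str, Tuple[float, float]] = {
    "adult male": (85.0, 180.0),
    "adult female": (165.0, 255.0),
    "child": (250.0, 300.0),
}


def groups_for(frequency_hz: float) -> List[str]:
    """Return every group whose typical range contains the given frequency."""
    return [
        name
        for name, (low, high) in TYPICAL_RANGES_HZ.items()
        if low <= frequency_hz <= high
    ]


print(groups_for(120.0))  # inside one range only
print(groups_for(170.0))  # inside the male/female overlap
```

A frequency such as 170 Hz falls inside two of the quoted ranges at once, which is one simple way to see why voices of different types can still sound alike.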

It is the subtle differences between individuals that separate vocalists, such as the way a certain vowel is pronounced or the tone taken on a sound. The impact is seen with identical twins among real singers: though they mostly overlap in vocal range, they have individual differences in their abilities, which can be reflected in their performances. While they often sound more similar when they are young, they can vary greatly as they age.[10] Due to the human brain's capacity to learn from different experiences, and the impact of general health and diet, even identical twins can end up with different speech patterns despite having a similar vocal range. The amount of air the lungs can take, along with the throat, mouth, lips and teeth, contributes to many small differences between human voices. Even an accent changes the way an individual sounds. These are "vocal traits"; they can be captured by VOCALOID during the recording process and become the "traits" of the voicebank. As a result, two VOCALOIDs of a similar vocal nature may sound alike, but will not behave the same.

Other subtle variations can happen in the way a person talks. In real life, it is almost impossible for a speaker to say "ma" the exact same way every time, so the amount of sound variation in samples helps VOCALOIDs mimic a real vocalist much more closely, drawing close to "the uncanny valley" effect.

In 2010, western fans claimed that larger languages such as English or Spanish can be less realistic than smaller ones such as Japanese. However, in reality they can actually end up being more realistic than the smaller ones. The added realism is simply because of how much variation can occur across the samples. For example, in English the diaphonetic sound of "ma" varies according to what came before or after it, and whether it is at the beginning of a word or its end. An example of this is "Mathematics" (written in IPA symbols as "maθ(ə)ˈmatɪks"), which contains two instances of "ma": one at the beginning and one in the middle. For VOCALOID, recreating this word needs both variations or it cannot correctly sound out "Mathematics".

In smaller languages such as Japanese, sounds are much more limited in their variations. Japanese voicebanks have 1/5 the number of sounds that English ones have, and this is reflected in the number of samples they need and have to offer. They are much more prone to repetition overall. There simply is no need for that many sample variations of "ma" to be recorded; the by-product is less variation and in turn less realism. "Ma" may remain identical at times regardless of whether it is at the start, middle or end of a word, or of what came before or after it.
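
The idea that one sound may need several variants depending on its neighbours can be sketched by listing the distinct contexts in which a unit occurs. The segmentation of "mathematics" below is hand-made and simplified for illustration; it is not real voicebank data, and `contexts_of` is a hypothetical helper. The sketch only counts contexts; whether each context receives its own recording is a voicebank design decision.

```python
from typing import List, Set, Tuple

# (unit before, unit after); "#" marks a word boundary.
Context = Tuple[str, str]


def contexts_of(unit: str, words: List[List[str]]) -> Set[Context]:
    """Collect every distinct (previous, next) context in which `unit` occurs."""
    found: Set[Context] = set()
    for units in words:
        padded = ["#"] + units + ["#"]
        for i in range(1, len(padded) - 1):
            if padded[i] == unit:
                found.add((padded[i - 1], padded[i + 1]))
    return found


# Assumed, simplified segmentation of "mathematics".
english = [["ma", "th", "e", "ma", "t", "i", "cs"]]

variants = contexts_of("ma", english)
print(len(variants), sorted(variants))
```

Here "ma" appears once at the start of the word and once in the middle, giving two different contexts. A language model in which "ma" sounds the same regardless of its neighbours would need only one sample for the same two occurrences.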

With Vocaloids there is a further factor, as each voicebank has its own samples. Each voicebank's sample set adds natural variation to a Vocaloid. While it was confirmed with LUMi that samples are not always unique, and samples can be reused to fill in blanks in production, overall each Vocaloid has a unique set of samples. So while Lily sounds the same every time she sings C#4, she should not sound identical to Big Al singing C#4. However, the fact remains that if a sample is repeated several times in the same way, the result will always be the same for that particular VOCALOID, as already explained.

These natural variations are important for Vocaloid to capture, as the traits of a vocalist can make a vocal very distinct. Failure to capture these traits results in a less realistic vocal.

Realism as a goal?

Realism is also not always the sought after result and songs like Secret, Sad Machine and Appetite of a People-Pleaser are examples of songs wherein realism isn't the goal of the vocal as the VOCALOIDs don't sound like they usually do.

Realism is also not always a favoured trait of a VOCALOID for producers, as seen with Prima. Despite being one of VOCALOID2's more realistic vocals, she can be harder to use because she is locked into an opera style. So a producer may favour using an easier English vocal over her for a song despite it possibly being less realistic. This was also seen with Hatsune Miku's VOCALOID2 vocal: it was able to compete with the more realistic vocals of late VOCALOID2 such as VY1, despite one of its drawbacks being that it was less realistic than VY1.

Producers using the vocal synthesizers that existed before 2000 did not have access to more realistic vocals, yet they still made music with them, such as the song "Stakker Humanoid", which took samples based on the TSI S14001A vocal synthesizer chip. One of the reasons the Chipspeech software can still find its appeal is that producers seek unique sounds for music. This is also demonstrated by some UTAU voicebanks that record sounds such as animals rather than human vocals. So while VOCALOID strives to achieve realism,[11] it is not always what producers want. When speaking about Chipspeech, Plogue noted that a mono-layered vocal was used for flexibility and a huge vocal range in the case of several vocals.[12] Multiple layers improve realism, but limit flexibility and range. As shown previously by the "Otto Mozer versus Kaito" example in "Quality", VOCALOID has issues with providing certain unique and completely inorganic sounds due to its attempt to mimic realism.

Sometimes VOCALOID's attempt to be realistic can be its hindrance.

However, some parts of Vocaloid can never be considered examples of true realism, simply because the software uses mathematical calculations to make something work and therefore at times improvises results. For example, Cross-Synthesis by default does not produce a realistic result, simply because the vocal it produces is entirely made up: it was not produced naturally by the provider (or providers, when crossing between XSY groups) and exists only within Vocaloid. Likewise, VOCALOID5 introduced "Styles" and "colours" to VOCALOID, allowing Vocaloids to be adapted more successfully to certain genres of music. But in doing so, varying levels of realism are achieved, especially in some extreme cases.

Voice acting/Character Portrayals[]

There are two main approaches to VOCALOID voicebanks: natural and voice acted. Across every version of the VOCALOID engine and every language, the amount of realism naturally varies between voicebanks, even within a single language and within a single voicebank. While the engine attempts to sound realistic, it is not always certain that a voicebank is even trying to be realistic.

Natural Singing Vocal[]

This is a voicebank that is sold containing the traits of the vocalist, and is particularly important for professional producers or those desiring a "genuine" result. The title of first Vocaloid of this type is jointly held by LEON and LOLA, who were sold as "Soul Singers" and were aimed at giving a realistic 'black singer' result. The natural singing style is the approach to VOCALOID most likely to produce an "uncanny valley" effect, as it is harder to tell that the singer is not a real singer.

This "naturally realistic" result shares some of the problems mentioned in "Quality" as a cause for concern. For example, if a layer of sound is even slightly off pitch, it impacts the quality and realism of the vocal, making a VOCALOID that is intended to sound like its provider fail to do so. Poor quality also often leads to a dip in realism. Those buying a VOCALOID of a particular singer may be disappointed if the results don't sound like the singer, but this hasn't prevented VOCALOIDs that don't always sound like their provider from selling either. While it is arguable how well each example achieves its goal, the aim of these VOCALOIDs is to sell a producer a VOCALOID with professional or realistic results to work with for projects.

Results are mixed in every example of natural singing VOCALOIDs. Tonio is capable of sounding distinctly like his provider, leading to a notably realistic result. For comparison, the sample package "Classic vocal", also released by Zero-G, offers a chance to hear the provider in his raw singing state. However, at the same time, while the traits of the provider are clearly heard within the software, there are many technical issues with the voicebank library which can degrade the quality and impact the realism of the result. This leads to Tonio being comparable to one of the most realistic or one of the least realistic VOCALOID2 vocals, depending on the example of usage.

These vocals are at times some of the hardest to produce, because there is higher pressure to get the vocal correct and sounding just like the singer. Another issue, exemplified by both Tonio and his partner Prima, is that some singing styles sound "odd" when mixed with electronic vocal effects. A further issue is that the pair are fixed within their style and can have problems breaking out of it, an issue that also impacts Sachiko and a few other Vocaloids of this style. Thus, vocals based on a natural singing result are not always that useful.

Being naturally realistic does not prevent additional vocals, as exemplified by V3 Megpoid or KAITO, but it can limit the number of ways a voice can be expressed. Gumi's "Falsetto" vocal was too difficult to match against her normal voice, so it could not be released due to how unbalanced it was. Instead, Kokone was produced with natural Falsetto results.

While voice actors often have no issues providing their vocals to a voicebank, professional singers can be reluctant to provide theirs.

Voice Acting[]

The merits of voice acting versus recording the provider's natural singing are also a common subject of debate on realism. While VOCALOID attempts to recreate the human voice realistically, the studios recording these vocalists do not always take the realistic approach. The first ever release of a voice acted VOCALOID was Hatsune Miku, whose popularity was not hindered by her lack of realism at all. The idea of Hatsune Miku was to give producers, whether amateur, professional or general hobbyist, a vocalist who was ready to sing with a cute-sounding young female vocal. The resulting choice of direction became popular and Miku became an instant success; thus voice acting is at times not considered a drawback, despite usually producing a less realistic result. The provider does not have to be a talented singer so long as they can mimic a particular style of singing.

Voice acting results in a singer who is hired purely for their vocal performance and ability to sound a particular way, even if this is not natural for them. For example, Kagamine Len is male, but was voiced by voice actress Asami Shimoda, who had the ability to do both a female (Kagamine Rin) and a male (Kagamine Len) vocal range. However, there can be flaws to voice acting. The problem is that, for any particular style of song, the results are often not a perfect match compared to using an actual voice provider who specializes in that style of singing.

Because a female vocalist was used for Len, one of the notable traits of both his Act1 and Kagamine Rin & Len V4 English voicebanks is that at times they produce results that sound notably "feminine" instead of "masculine". It was also noted that only true male vocals were referenced during the production of VY2. As both Len and Ryuto were voiced by females, they could not produce satisfactory masculine results at all and were not taken into account for VY2's production. Biology plays a role in how the sexes sound in adulthood, as well as in childhood. In examples such as Kagamine Len's, wherein a vocalist represents the opposite gender, the difference is caused by the effects of the voice breaking during puberty in young boys. At times Len simply fails to match the masculine vocals of either a pre-puberty or a post-puberty male.

Similarly, Otomachi Una could be said to have shortcomings in the portrayal of her character, since she was voiced by an adult but her character is 11 years old. In comparison to Oliver or Kaai Yuki, both of whom had child vocal providers, Una can sometimes fail to match the softness that comes with a child vocalist.

The cause for concern in such cases is simply that, when attempting realistic tuning, the results are often not authentic or genuine.

Voice actors are able to give a wide variety of different tones, and if a provider is required to voice multiple voicebanks, then voice acting may be the best option. While a singer can give a better singing performance, being a great singer does not always bring the ability to morph one's voice like a trained professional voice actor can. However, the majority of voice actors have talent in singing, as they know how to change their vocals to fit different situations and have an understanding of microphones.[13] Regardless, the ability to morph their vocal can be useful in producing a variety of expressions for recording, as demonstrated by Hatsune Miku Append, which through voice acting provided multiple different tones that Hatsune Miku could use alongside her normal vocals. Voice acting allows for very different approaches to one vocal, compared to natural-toned extensions.

Singing roles can still be quite different from talking roles, even for actors used to acting in normal speech. Otori Kohaku's provider, Asuka Kakumoto, had to work out for the first time just how "Unity-chan" sounded. So even though Otori herself was already a voice acted performance, the singing was a brand new voice for the character, and it required some working out how she would sound while singing. For this reason it is not certain how a voice acted performance will turn out, especially if the provider is not a singer. However, the popularity of VOCALOID for its ability to open up opportunities for roles has over time brought in a large interest from providers who are voice actors, and Kizuna Akari's provider, Madoka Yonezawa, has noted that software such as VOCALOID has opened doors for voice acted roles that did not exist previously.

Voice acting has proven to be a problem when it comes to multilingual providers, especially if they are not used to the language. As seen with Hatsune Miku V3 English, maintaining the same tone as the Japanese voicebank is difficult to achieve. Another issue is that the voice can sound forced or unnatural. The reason for the struggle is that a change of language brings a change of tone, as languages do not all work the same way and languages like English require stressed sounds. This can impact the tone of a vocal, and for voice acting it can be hard to maintain a voice equal to that of the native language. If the provider is adept at languages, then they can easily produce a voicebank for any language they speak, as demonstrated by Macne Nana. Though an accent can be an issue, it tends to sound far less unnatural or forced.

Although voice acting may have shortcomings compared to natural singing, some may consider the supposed flaws to be characteristic and even defining for a VOCALOID. Otomachi Una, for example, had her voicebank provided by an adult so it can be difficult to produce natural child-like singing compared to voicebanks like Kaai Yuki who used an actual child vocal provider. However, these unnatural quirks are not necessarily a bad thing and can be considered characteristic of Otomachi Una.

Other information[]

VOCALOID still attempts to make all voicebanks realistic, and a voice acted result, while not "true", is not false to the point that it does not sound remotely human. As mentioned in "realism", the definition of realism is subjective and varies from person to person. One person can listen to Prima without knowing what she is or being able to tell she is a synthetic vocal, but another will pick up on her faults, find the voice "odd" and wonder if a tool such as auto-tuning software was used, whether or not they know what VOCALOID is. Yet Prima is classified as a "natural singing" vocal and produces authentic opera results. Another can listen to Miku and feel she is more realistic than she is, because they are not able to pick up the digital engine noise of VOCALOID, or because they don't know how unrealistic she actually is. This is again thanks to the "uncanny valley" effect and how well each VOCALOID is able to trick the human ear.

In addition, some voicebanks fit into both categories, because a vocalist does not have to alter their vocal in any way to voice act and can remain natural. In this sense, the two styles can overlap at times, and a VOCALOID can be both an example of voice acting and natural sounding. This is exemplified by Camui Gackpo, who is regarded as a "Character Voice" role of his provider Gackt; the two are not the same singer, and Gackpo simply has the traits of his provider.

Part of this is because of the idea that the "voice" you are buying is ultimately that of Camui Gackpo, not "Gackt" himself. This idea was also shared by Unity-Chan, which has two avatars, Otori Kohaku and AKAZA. AKAZA was created because, when Unity-Chan was recorded as the singing voice of Otori Kohaku, the two were so different that the provider herself noted it was as if there were two different singers entirely. Thus, it is important to be aware that not all VOCALOIDs and their voicebanks are unnatural just because they are voice acted, and the two different approaches do have overlaps in ideas.

There are also a few examples of VOCALOIDs that use both approaches: IA (natural tone) and IA ROCKS (voice acted), Tone Rion V4 (voice acted) and Yumemi Nemu (natural tone), Galaco "Red" (natural tone) and "blue" (voice acted), Fukase "Normal" (natural tone) and "Soft" (voice acted), and Yuzuki Yukari V4's "Jun" (natural tone) and her two extra vocals "Onn" and "Lin" (both examples of voice acting).


One of the oldest controversies in Vocaloid stems from the language a Vocaloid sings in, and has come about partly as a result of how well Vocaloids can be understood. This is combined with reports of a significant number of fans adoring or disliking Vocaloids based on language. This can be because they feel certain languages have more appeal, or because they are or are not native speakers. The result is that entire sections of the Vocaloid library of voicebanks have been ignored due to their language. Some of it is linked to issues expressed in VOCALOID and politics, while other cases have independent reasons.

Myths based on technical matters often display a form of cognitive bias, with people drawing conclusions about entire languages from what they have come to believe, without confirming whether it is true. That is, they hear a result they like or don't like, and decide based on their preferences whether entire sections of VOCALOID are good or bad. When they hear something that confirms this belief, they then end up being subject to forms of confirmation bias.

As mentioned previously in "Quality", this has spawned many myths. The story of why these myths exist is not short. The scale has been quite wide; it has been witnessed by editors at the Vocaloid wiki since 2009, as well as reported by fans outside of the wiki. So some of the following comes from the first-hand experience of fans and editors from sites such as the Vocaloid Otaku forums, YouTube, etc.

One thing to note before going further is that the phenomenon all vocal synthesizers rely on to let a listener recognise a language is based on the human ability to recognise patterns and make connections, a process known as "Apophenia".[14] This can lead to a trickery of the senses known as "Pareidolia"; the type Vocaloid relies upon is "Audio Pareidolia".

English Vocaloid[]

The issue with languages began in VOCALOID2. During the VOCALOID2 era, the franchise became popular due to Hatsune Miku's fame. Vocaloid became associated with Japanese culture and, due to the overwhelming popularity of Miku, the focus shifted away from the earlier English Vocaloids, which had seen an overall successful run, their only failure being their impact in America. Although more vocals had been released for the English version at the time of her release, and they had done well for themselves, Miku was a record-breaking vocal, selling an unprecedented number of units. This was never beaten by future Vocaloids from Japan, but the next few releases benefited from her popularity. Her success also resulted in the VOCALOID vocals Meiko and Kaito being "forgotten".

By the time fans became aware of there being other Vocaloids besides Miku, Rin and Len, the popularity of those three already dominated the community. Though Meiko and Kaito were able to find their place with the growing Japanese fanbase during 2008,[15] it was also discovered that there was an entire category of Vocaloids most Japanese fans didn't know existed, namely the English Vocaloids, which they dubbed "the Engloids". However, only a few were able to use the English vocals due to the language barrier, which prevented high-level usage in Japan, and "Engloids" became a selective interest among fans. Even in the West, most English-speaking fans were not aware that English-speaking Vocaloids existed, due to Miku's overwhelming presence, until roughly a year later in 2009, when Megurine Luka became the first Japanese vocal with an English voicebank. This brought English Vocaloids to attention both in Japan and overseas.

When the English vocals finally began to receive interest again in 2009, there was a mixed reaction. Since Western fans had never been able to "hear" what Vocaloid's pronunciation sounded like due to the language barrier, there was some shock at the unnatural nuances of English Vocaloids. Often they were branded as "low quality" compared to the Japanese Vocaloids, despite most fans not being bilingual or able to compare the two languages, and therefore lacking the knowledge to know whether this was true. The majority of the VOCALOID and VOCALOID2 English vocals were of standard quality or better, with only two voicebanks as of 2009 being in a position to be called "Low Quality": SONiKA and Megurine Luka English. For example, Sweet Ann and Prima produced high quality singing results and were voiced by professional singers in the music industry, whereas Hatsune Miku and Kagamine Rin and Len were voiced by voice actresses, who were amateur singers at best. The early Japanese VOCALOID2 vocals were often lower quality than the later VOCALOID2 vocals that came after 2009, and vocals such as Gackpoid or Megpoid were noted for their background nuances.[16]

Not all the myths about English Vocaloids were actually untrue. It was known that the English Vocaloid script contained errors[17] that the Japanese Vocaloid script did not; the Japanese script had also been corrected several times, as far back as Hatsune Miku's development.[8] English vocals, however, were developed with an incorrect script sold by Yamaha with the English language Dev Kit, which was not corrected until 2010, and the correction was not made available immediately. English is also a far more complex language to synthesise, making it much more difficult for programmers to produce natural-sounding results, and the sheer size of the voicebank (5x the size of a Japanese one) means quality checks are often harder to make and take longer, since every possible combination has to be examined.

Japanese fans of English Vocaloids expressed different opinions. For example, Tonio was praised for having a "beautiful" voice by Japanese-speaking fans,[18] whereas some Westerners felt his voice was "ugly" due to its deepness. Japanese producers did not hear the same clarity issues with voices such as Big Al or Sweet Ann and felt both were fairly clear. At the same time, some Western fans felt that all English Vocaloids were unclear. Some of this was a backlash from the problems of Sonika, and fans within some communities had the habit of using her voicebank to represent all English Vocaloid voicebanks, even though it was not the standard of quality at the time. However, the prevalence of subtitles in Japanese Vocaloid songs was a result of Japanese speakers having trouble understanding the Vocaloids in their native language, and this is still cited as the reason many music videos have subtitles repeating the lyrics of the song. Years later, when Kizuna Akari was released, it was confirmed that VOCALOID produced weak consonants in general,[19] and consonants are often a large contributor to the clarity of synthesized vocals. All languages struggled with some form of clarity issue as a result of this.

In the case of Megurine Luka's VOCALOID2 voice, Luka English went on sale with sounds missing and a limited range of words she could form,[20] so there were issues with the English voice, and it was low quality even compared to some other English vocals. A myth that spawned at the time among fans was that the Japanese Luka vocal was better at producing English results than Luka English, although this was phonetically untrue. The missing sounds mostly impacted the precision of the language and did not limit her ability to sing in English. A lot of the myth had to do with the slightly sharper-sounding results of the Japanese vocal. Since Japanese has far less need for blending, the Japanese vocal appeared clear against her smoother, softer English vocal with its missing sounds, even though using the Japanese voicebank for English brought problems of its own and the sounds produced by the Japanese vocal were neither as smooth nor as precise as many considered them to be.

This is not isolated to just English Vocaloids and has since extended to the other non-Japanese Vocaloids. However, the English version was the most criticized amongst English fans. The English language itself also has a reputation for being one of the most difficult to recreate in synthesizers due to its lack of distinction per sound, resulting in a great deal of variation and complexity.[21] For instance, it is possible to find a voice unintelligible for its accent alone if the listener is not used to it. A number of factors will be at play in how successfully a Vocaloid achieves this effect.

The Addition of new languages in VOCALOID3[]

As other Vocaloid languages entered the market with VOCALOID3, users began to hear some of the problems that plagued Japanese and English repeated in other languages, as simple factors such as the VOCALOID engine noise were heard across all versions. The attitude towards non-Japanese Vocaloids improved thanks to the introduction of 3 new languages. The new languages in turn also improved the attitude towards English vocals: Vocaloids like Oliver and Avanna gained popularity, and many Japanese vocals like Gumi and Miku were given English voicebanks. By VOCALOID4, the approach many fans had towards non-Japanese Vocaloids was much more positive overall. However, there is still a great reluctance towards Vocaloids in non-Japanese languages.

A large majority of Western fans who are interested in Vocaloid are also fans of manga and anime, so the Japanese Vocaloids often appeal more to them. As a result, any Vocaloid not from Japan risks being subjected to bias, regardless of what language they sing in, by the fans who came into Vocaloid from anime and manga. Vocaloids of other languages have been forced to adopt more anime-esque designs to appeal to these fans, such as the Zero-G and PowerFX Vocaloids, or even Bruno and Clara due to the initial reactions to them. However, this can be counterproductive, as the most common buyers of Vocaloid are producers such as EDM musicians, to whom such designs aren't necessarily appealing.

Bias towards languages is also brought up in regards to Spanish vocals. Since you can get a "decent" level of Spanish from Japanese vocals in contrast to making them sing English, some Spanish fans did not see the point of Spanish vocals being produced. In contrast to the structural gap between English and Japanese, using Japanese vocals for Spanish is much easier, due to similarities in basic word construction. However, using a Spanish voicebank for Spanish produces better results than using a Japanese one.

The reactions are not always straightforward and can be politically based (see again VOCALOID and politics). Part of the bias towards Luo Tianyi is that, because she uses Japanese technology instead of Chinese, some reportedly feel she is not supportive of China; so despite her Chinese origins, she received criticism because of it.[22] SeeU was also partly a victim of politics, this time between Korea and Japan. Despite this, it was reported in December 2018 by VOCALOID LINK that the Chinese Vocaloids were as popular overall as the Japanese ones, though not all had seen instant success.[23]

One of the issues with languages is that there can be a tendency to treat or think of each Vocaloid language as though it had a separate engine of its own. At times you will see references to the "Spanish Vocaloid version" of the engine, for example. This gives the impression that each language has its own separate engine. In reality, they all use a shared engine; both voicebanks and interface are separate aspects of Vocaloid that are built on top of that engine. The engine looks up each individual Vocaloid with its own details and then pulls data from its plug-in details.
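
The shared-engine arrangement described above can be pictured with a rough sketch. This is only an illustration and not Yamaha's actual design: the names `Voicebank`, `SharedEngine`, `register` and `render`, the bank names `bank_ja` and `bank_es`, and the phoneme symbols are all hypothetical and were chosen purely for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voicebank:
    """Hypothetical stand-in for a voicebank plug-in and its details."""
    name: str
    language: str
    phonemes: frozenset


class SharedEngine:
    """A single engine serving every language (illustrative only)."""

    def __init__(self):
        self._voicebanks = {}

    def register(self, voicebank):
        # Voicebanks plug into the one engine; no per-language engine exists.
        self._voicebanks[voicebank.name] = voicebank

    def render(self, name, phonemes):
        # Look up the individual voicebank, then pull data from its details.
        bank = self._voicebanks[name]
        missing = [p for p in phonemes if p not in bank.phonemes]
        if missing:
            raise ValueError(f"{name} ({bank.language}) lacks phonemes: {missing}")
        return [f"{bank.name}:{p}" for p in phonemes]


engine = SharedEngine()
engine.register(Voicebank("bank_ja", "Japanese", frozenset({"k", "a"})))
engine.register(Voicebank("bank_es", "Spanish", frozenset({"r", "a"})))
print(engine.render("bank_ja", ["k", "a"]))
```

In this sketch the same `SharedEngine` instance handles both the "Japanese" and the "Spanish" bank; what differs is only the data each registered voicebank carries.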

There is also the fact that there is no real unified version of each language: while all Vocaloids are created from a Yamaha Dev Kit, studios tweak things to their own needs. Even within VOCALOID2, there were known to be several versions of the Japanese script: the original script, Hatsune Miku's, Internet Co.'s and the VY series'. So in this case there were at least 4 versions of "Japanese" in VOCALOID2. The Kagamine Rin & Len act2 and VY1 vocals also saw their "Vocaloid library database plug-in" (shortened to "voicebank") overhauled for improvements, adding further to this. Each change made impacts the Vocaloid and its ability to recreate its intended language in a different way, and some changes are for the better or worse for different reasons. These changes are not always shared among Vocaloid studios, and each studio has its own workings that it uses for creating a language. For this reason, voicebanks are in reality not an entirely unified development, and only major overall developments, such as the addition of triphones in VOCALOID3, bring any universal change within a language. So there can be dozens of variants of a language, yet habit leads all of them to be grouped together regardless of their differences.

For this reason, a change to the engine can impact all Vocaloids. To create a different engine for each language would not be cost effective, and Vocaloid is designed so that its various elements, such as the user interface, can be replaced without this having greater implications for the overall engine.

Sound capabilities[]

Studios are not without fault and have sometimes made claims about their products which, while often true, are not always as good as they seem.

One VOCALOID that was subject to controversial claims was SeeU. In the early VOCALOID3 demos, SBS Artech claimed you could use SeeU's voicebank to create English, even though it was set up for Korean and had no support for English. They even went so far as to label her a "trilingual" VOCALOID, even though with only two voicebanks she was bilingual. Upon inspection of the voicebank, some producers discovered that SeeU had several English phonemes that were not needed for Korean, and SBS were again heavily criticised for both the claim and the inclusion of phonetics SeeU didn't need. Inclusion of unneeded phonetics is nothing new, as every English VOCALOID2 after Prima's release (with the exception of Luka) was given the rolling "r" phonetic data because Prima couldn't do opera without it. Despite this, it led to jokes about SeeU being made for "Konglish" rather than "Korean" and, notably, to criticism of the focus on her English capabilities over the Japanese capabilities she was given thanks to an additional Japanese voicebank.

Eventually SBS Artech addressed the language issue by confirming that they would make an English voicebank, and stated that the reason for the past claims was that they had wanted to release her with an English voicebank but also wanted to meet the VOCALOID3 release. In the end they did not have time to make an English voicebank, and included the phonetic data to allow SeeU to create English, as most sounds for English were already in the Korean language. Despite this, as many pointed out, the way the language is structured and the fact that she is largely not smooth result in English that is either choppy/broken or odd sounding thanks to a Korean accent. They then confirmed an English voicebank that would allow her to fully do English.[24]

SeeU was not the only Vocaloid to be criticized for this. SONiKA was also noted for having a remark that, with editing, she can be made to sing other languages. This is true for any Vocaloid vocal, though she was criticized for having it on her product page when no other Vocaloid had mentioned this at all.

Due to a custom dictionary, MAIKA was released with the 16 sounds missing from Spanish that would allow her to sing in Catalan. These also allowed her to sing more closely to other languages such as English, Portuguese and Japanese, although she will not sound like a native speaker.[25]

There have been some noticeable concerns with the information on Megurine Luka and the Kagamines from the Mikunopolis website, which was written based on their VOCALOID2 releases:

  • On the Mikunopolis website, Luka's profile contains the statement: "In the past, creating songs with English lyrics had always been somewhat awkward, but Luka is able to sing both in Japanese, English or a mix of the two - a worldwide virtual singer". While it is true that it had always been awkward to use a Japanese VOCALOID for English, native English-capable VOCALOIDs had always existed, so the English language had always been easy to access. The noted issue was that the website's failure lay in its wording, which gave the impression that Luka was the first Vocaloid to sing in English.
    • As mentioned on Luka's product page, she has a number of issues with her English voicebank that leave some English speakers questioning her English capabilities overall. So ironically, while she had the best English level of any Japanese-based VOCALOID at the time of her release, just how free of awkwardness her vocal results are is questionable.
    • Luka was aimed at the Japanese speaking market and was originally not intended to be a world wide release (hence the lack of an English interface). So this gave the impression of a Vocaloid easily accessible to the rest of the world when at the time she was limited in her availability outside of Japan.[26]
  • "The Kagamine twins are well known for their very clear and precise vocals". Considering their history of lack of clarity and/or pronunciation problems, as well as Act 2 of the software missing a pronunciation entirely, this is a little bit of an exaggeration on the webhost's part. Also the statement, "Just with a few tweaks here and there they could sing almost any other genres as well," is somewhat questionable due to their reputation of requiring previous experience to use and often at times needing more than "just a few tweaks" to make them work.[26]

Demo songs[]

Both GUMI and MAYU had a song in which they sang in "English" using their voicebanks ("Fly Me to the Moon" and "Dreamin'", respectively). However, both were using a Japanese voicebank to sing in English. The original "Fly Me to the Moon" demonstration with Megpoid was more welcomed than the second version done for her V3 update. Back then, it gathered interest because her English pronunciation was more accurate than that of past Japanese VOCALOIDs, as a common Western practice was to use Japanese VOCALOIDs for English. This was because the Japanese vocals were preferred over what the English vocals offered. However, by the time the demo appeared a second time, it was questioned why Internet Co. was persisting with the same demo song when it is also possible to do a version with Japanese lyrics. For MAYU's demonstration, English fans have also complained that they can barely understand MAYU's "English".

The same can be said of English VOCALOIDs that have been used for demonstrations in other languages, as was seen in demos for Prima and Tonio.

There is a strong argument that a demo in which the VOCALOID is forced to sing in a language it was not designed for is a bad demonstration of the VOCALOID's capabilities, since it can give a wrong impression of the real strengths and weaknesses of the voicebank. It can either make a voicebank appear more flexible than it is in reality, or expose flaws that aren't present in its original language as the voice is pushed away from its natural "safe zone". This wrong impression can be worse if the listener isn't a native speaker, or at least someone with a deep knowledge of that language.

This is particularly notable because the producer of the song must be adept enough at phoneme adaptation to manipulate the vocal into mimicking another language. Demo makers such as Giuseppe have been known to make Vocaloids sing in several languages they were not built for.

Other issues concern the tuning of the vocal. In demos created by producers such as Cillia, the way the vocals are tuned results in an unnatural portrayal of the voice, so it is not always possible to hear the VOCALOID's true traits. Other producers have a habit of using the wrong pitch, as was heard in a few of Lily's V2 demos, because they are accustomed to using other vocals such as Hatsune Miku.

Petitions and their impact

galaco was first referred to as a VOCALOID when the VOCALOID Shop's competition for VOCALOID3 voicebanks was launched. Anyone who met the requirements to "win" her was issued an expiring code when the competition ended. The activation code for galaco originally expired on January 31, 2013. Throughout October 2013, re-issues of her codes were made; however, it was impossible for those behind her to release a non-expiring code. The vocal finally expired on October 31, 2013 and galaco NEO was released in August 2014 to replace the vocal. Throughout the ordeal, the team behind her were working to release an official version.[27]

Panicking overseas fans started a petition during the events to prevent "the deletion of Galaco". By the end of the petition's lifespan, it had gained over 9,000 signatures.[28] When galaco NEO was released, some of those who took part in the petition celebrated their efforts to "save galaco", even though the petition had no impact on its development at all. Furthermore, the petition was mostly signed by overseas fans and not by the Japanese fans, who were the target market.

So far in the entire history of VOCALOID, no petition has ever successfully impacted VOCALOID development or led to a VOCALOID being developed. In cases such as galaco, the petition gathered mostly panic-stricken fans who didn't know much about the situation and signed because they feared galaco was to be permanently deactivated, unaware that a replacement was in the works. However, petitions don't always get much attention at all. When LEON, LOLA, and MIRIAM were being retired, fans created petitions for them to be updated, but these struggled to get even 150 signatures, in contrast to galaco's. The reason for this is the bias in favour of Japanese VOCALOIDs over English ones, as mentioned in the languages section.[29]


  1. http://www.ssw.co.jp/products/vocal/gackpoid/infomation/episode1.html
  2. https://twitter.com/noboru1963/status/175960562545602560
  3. https://twitter.com/vocaloid_cv_cfm/status/17812817511976960
  4. Comments on Megpoid English and the 5x size making production much harder
  5. Wat commenting on the difficulty with making English voicebanks
  6. link
  7. a couple of mentions were related to this here
  8. Red Bull Music Academy - THE MAKING OF VOCALOID
  9. https://www.axiomaudio.com/blog/audio-oddities-frequency-ranges-of-male-female-and-childrens-voices/
  10. https://www.quora.com/Do-identical-twins-have-identical-voices
  11. http://vocaloid-fanclub.deviantart.com/journal/AH-Software-Stream-details-VOCALOID5-Plans-516495989
  12. [1]
  13. https://www.quora.com/Can-a-singer-be-a-great-voice-actor
  14. https://hearinglosshelp.com/blog/apophenia-audio-pareidolia-and-musical-ear-syndrome/
  15. http://vocaloid.blog120.fc2.com/blog-entry-15045.html
  16. Taken from notes Nico Nico Pedia
  17. "Developer's Interview" halfway down the page
  18. link
  19. link
  20. here
  21. http://vocaloid.blog120.fc2.com/blog-entry-16739.html
  22. http://www.globaltimes.cn/content/1079180.shtml
  23. VOCALOID LINK website
  24. http://sbsat.co.kr/event_2012/sub_seeu.asp?m=s&bs_code=event_03&vmode=view&page=&b_idx=238&keyword_option=&keyword=&
  25. https://pbs.twimg.com/media/BZ6E76HCIAAlplK.jpg:large
  26. http://mikunopolis.com/post/en/28/Character+Profiles.html
  27. http://www.vocaloidism.com/yamaha-panel-at-niconico-chokaigi-2/
  28. https://www.change.org/p/don-t-kill-galaco-stop-the-galaco-s-desactivation
  29. https://www.change.org/p/zero-g-limited-bring-back-leon-and-make-a-v3