! The following is a tutorial made for VOCALOID fans by fellow VOCALOID fans. !

Cross-Synthesis (クロス-シーサセス), often shortened to "XSY", is a parameter that allows one voicebank to blend gradually and progressively into another. It is only available with the full version of the software.

VOCALOID5 was released without XSY support and it is not likely to show up in this version at all.[1]

How to use

First, the user must access the Cross-Synthesis Web browser through the Singer Editor, where a Primary Voice and a Secondary Voice can be assigned for use in Cross-Synthesis. This feature works with a selection of VOCALOID3 voicebanks when imported into the VOCALOID4 engine; however, not all products are compatible.[2] Most VOCALOIDs cannot be registered for XSY. This is particularly true of single voicebanks that are not part of any XSY group, so releases such as Yanhe, Yohioloid and Flower cannot use this feature. Users who wish to make use of XSY may therefore need to do some research and planning before purchasing, to work out which voicebanks are best to invest in.

A chart of the voicebank packages that support this feature is available on Yamaha's official website.

Cross-Synthesis morphs the primary voice, using the secondary voice as an analytical data guide for the adaptation. It does not simply switch between two voices as the user draws a parameter curve; instead it increases the ratio of the secondary voice combined with the primary one, in a way that is more complex than a simple fade-out. The primary voice is used as a "template" for the morphing, which means the two vocals in a Cross-Synthesis pairing are not interchangeable: the rendering and behavior will differ slightly depending on which voicebank is used as the primary and which as the secondary voice.

  • Example: A Power-Normal setting will not be equal to Normal-Power.

Users can also choose how much the second voice impacts the first, from subtle to extreme. XSY is very similar in concept to how the original VOCALOID engine used recorded analytic data to adapt the VOCALOID engine noise to sound like the vocalist behind the data.

Note that some packages, such as the Megpoid V4 vocals, may come with their respective pairs already set up for XSY within the feature, so they will already be registered when selecting an XSY partner. Otherwise, a user can register their favourite voicebank pairs for easy future use.

Also note that, glitches and updates to old software aside, XSY cannot occur between a voicebank and itself. In short, "Tohoku Zunko" VOCALOID3 and Tohoku Zunko V4 can XSY because they are different releases, but Hatsune Miku V4x "Dark" with Hatsune Miku V4x "Dark" is not possible under normal circumstances. In both cases it is often not worth using these combinations for XSY, and there is no point at all in XSY between the same voicebank, as the two are identical.

Benefits of XSY?

When used effectively, XSY expands the capabilities of the VOCALOID package in use, increasing its overall abilities beyond what any single vocal within the package offers. This is generally split between two major benefits.

Note that while tools in DAWs and vocal editing software can add special vocal effects such as "whisper" effects to a lyric result, the user has much more control with XSY because it is within VOCALOID itself. Vocal filters offer much less control overall, though they can still be applied on top of the XSY result. VOCALOID's XSY is instant and more effective: the user has a great deal of control and the software will react, often producing a far more natural-sounding result than these filters would.

There are two main uses of XSY: "Added Expression" and "Creating New Voices".

Voicebank XSY variant calculations

The following calculations give an estimate of how many major shifts in tone an XSY-capable voicebank can produce.

Note that "α" represents the number of vocals that have the option to XSY with the intended vocal being used - this includes its usage both as a primary and secondary vocal.

Number of additional vocals that can potentially be created:

α x (α - 1)

The theoretical total number of vocals is simply the previous result plus the original number of vocals, so the formula is:

α x (α - 1) + α

This total refers to the complete number of possible vocals that a user has at their disposal as the result of XSY.

For example, the Megpoid V4 release's 10 voicebanks work out as

10 x 9 = 90 additional possible results
10 x 9 + 10 = 100 total theoretical results

Thus Megpoid V4 offers 100 variations to the user starting with the original 10 voicebanks and adding the 90 variations possible to create with them.

The following examples detail the results for 2 to 15 voicebanks:

  • 2 voicebanks; produces 2 variations with XSY, 4 variations altogether including the original vocals
  • 3 voicebanks; produces 6 variations with XSY, 9 variations including the original vocals
  • 4 voicebanks; produces 12 variations with XSY, 16 variations including the original vocals
  • 5 voicebanks; produces 20 variations with XSY, 25 variations including the original vocals
  • 6 voicebanks; produces 30 variations with XSY, 36 variations including the original vocals
  • 7 voicebanks; produces 42 variations with XSY, 49 variations including the original vocals
  • 8 voicebanks; produces 56 variations with XSY, 64 variations including the original vocals
  • 9 voicebanks; produces 72 variations with XSY, 81 variations including the original vocals
  • 10 voicebanks; produces 90 variations with XSY, 100 variations including the original vocals
  • 11 voicebanks; produces 110 variations with XSY, 121 variations including the original vocals
  • 12 voicebanks; produces 132 variations with XSY, 144 variations including the original vocals
  • 13 voicebanks; produces 156 variations with XSY, 169 variations including the original vocals
  • 14 voicebanks; produces 182 variations with XSY, 196 variations including the original vocals
  • 15 voicebanks; produces 210 variations with XSY, 225 variations including the original vocals
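The figures in this list can be reproduced with a short script. The following is a minimal Python sketch and not part of any VOCALOID tool; the function name `xsy_counts` and the list `banks` are made up for illustration. It counts ordered (primary, secondary) pairings, since the order of the two vocals matters, and assumes every vocal in the set can XSY with every other one.

```python
from itertools import permutations


def xsy_counts(alpha: int) -> tuple[int, int]:
    """Return (additional, total) results for alpha mutually compatible vocals."""
    additional = alpha * (alpha - 1)  # one result per ordered (primary, secondary) pair
    total = additional + alpha        # plus the original, unmixed vocals
    return additional, total


# Reproduce the list for 2 to 15 voicebanks.
for alpha in range(2, 16):
    extra, total = xsy_counts(alpha)
    print(f"{alpha} voicebanks: {extra} with XSY, {total} including the originals")

# The pairings can also be enumerated directly, here for the four VY1V4 vocals.
# "Power" x "Normal" and "Normal" x "Power" are counted as separate results.
banks = ["Power", "Soft", "Normal", "Natural"]
pairs = list(permutations(banks, 2))
print(len(pairs), "ordered pairings")
```

Note that α x (α - 1) + α simplifies to α x α, which is why every total in the list is a square number.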

Formula notes:

  1. It is worth noting that as XSY is a controllable variable, it is possible to create further variations than this chart displays by mixing more of one vocal with the other. For example, the user may get a different result at 25% as opposed to 50% or 75% of the influence of the secondary vocal.
    1. However, how much the user can get out of two vocals depends entirely on the two vocals involved. For example, two vocals which are relatively close in tone will give less varied results.
    2. The calculations only account for 3 settings for simplicity's sake: 100%-0%, 50%-50% and 0%-100%, as it is not certain how many results two voicebanks can produce between them that are different enough to be taken into account. However, it is fully possible that the XSY results of 75%-25% or 25%-75% produce a very different vocal to 50%-50%, 0%-100% or 100%-0%, thus giving 5 very different results. If the differences between the two voicebanks are great enough, they can even create more than these 5 results, such as 10%-90% or 20%-80%.
  2. It is sometimes difficult to pinpoint the exact setting that gives the biggest difference. While logic suggests that 50% would be that point, this is not always the case, and it may actually be 55%-45%, for example, due to a lack of dominant traits in one vocal or the other. The producer then has to give the weaker vocal a little more strength and the other a little less to get a fair combination of traits from both vocals. However, 50% is normally a good starting point for experimentation.
  3. Note that the VOCALOID Wikia's calculation does not take into account that some XSY compatible vocals are simply updates that add GWL, so some XSY combinations between older and newer versions of a software may show little response. However, the fact that GWL can change a vocal's tone when combined with other GWL compatible vocals is why they are taken as "different" XSY combinations despite being the same vocal. This is why certain vocals such as Gackpoid V4 and V3 Gackpoid are still counted as producing different results.
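The first note can be illustrated by counting more than one blend setting per pairing. The following Python sketch rests on an assumption that is not in the chart: that each ordered pairing yields a fixed number of audibly distinct settings, given here by the hypothetical parameter `levels`. As the notes explain, the real number depends on the two vocals involved, so this is a rough estimate only.

```python
def xsy_total(alpha: int, levels: int = 1) -> int:
    """Estimate the total number of distinct vocals for alpha compatible voicebanks.

    levels is the ASSUMED number of audibly distinct blend settings per
    ordered (primary, secondary) pairing; levels=1 matches the chart,
    which counts a single mixed result per pairing.
    """
    if alpha < 1 or levels < 0:
        raise ValueError("alpha must be >= 1 and levels must be >= 0")
    return alpha + levels * alpha * (alpha - 1)


print(xsy_total(10))     # one setting per pairing, as in the chart
print(xsy_total(10, 3))  # e.g. 25%, 50% and 75% counted as separate results
```

For the 10 Megpoid V4 voicebanks this gives 100 with one setting per pairing and 280 if three settings per pairing are counted as separate results.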

XSY Groups

Ver.4.3.0 of the VOCALOID4 engine added the "XSY group" assignments, extending XSY usage beyond its previous capabilities. Certain XSY groups allow the vocals within them to XSY regardless of character, whereas XSY had previously been restricted to one character only. This opened doors for very different vocal results from very different VOCALOIDs, with many even capable of producing a result that sounds like neither VOCALOID used in the process.

The same rules for XSY apply to this new group system, so nothing has changed except the potential number of vocals that are compatible.

Arranged by: Seller Tab / Group Name / Singer Name / Product Name

  • nemurion
    • yumemi_nemu
      • VOCALOID4 Library yumemi nemu
      • VOCALOID4 nemurion set
  • nemurion
    • tone_rion
      • VOCALOID4 Library tone rion
      • VOCALOID4 nemurion set
  • Fukase
    • Fukase_J_Normal
    • Fukase_J_Soft
      • VOCALOID4 Library Fukase
    • ARSLOID_Ext_Soft
    • ARSLOID_Ext_Bright
      • VOCALOID4 Library ARSLOID
  • VY1
    • VY1V4_Power
    • VY1V4_Soft
    • VY1V4_Normal
    • VY1V4_Natural
      • VOCALOID4 Library VY1V4
  • galaco
    • galaco_RED
    • galaco_BLUE
      • VOCALOID™3 Library Galaco
  • VY2
    • VY2V3
    • VY2V3_Falsetto
      • VOCALOID™3 Library VY2V3

  • MIKU_V4X
    • MIKU_V4X_Original_EVEC
    • MIKU_V4X_Soft_EVEC
    • MIKU_V4X_Solid_EVEC
    • MIKU_V4X_Sweet
    • MIKU_V4X_Dark
      • VOCALOID4 Hatsune Miku V4X
      • VOCALOID4 Hatsune Miku V4X Bundle
  • MIKU_V4X
    • MIKU_V3_Original
    • MIKU_V3_Soft
    • MIKU_V3_Solid
    • MIKU_V3_Sweet
    • MIKU_V3_Dark
      • VOCALOID™3 Hatsune Miku V3
      • VOCALOID™3 Hatsune Miku V3 Bundle
  • MIKU_V4X
    • MIKU_V3_Light
      • VOCALOID™3 Hatsune Miku V3 Light
    • MIKU_V3_Vivid
      • VOCALOID™3 Hatsune Miku V3 VIVID
  • MIKU_V4_English
    • MIKU_V4_English
      • VOCALOID4 Hatsune Miku V4 English
      • VOCALOID4 Hatsune Miku V4X Bundle
    • MIKU_V3_English
      • VOCALOID™3 Hatsune Miku V3 English
      • VOCALOID™3 Hatsune Miku V3 Bundle
  • LUKA_V4X
    • LUKA_V4X_Soft_EVEC
    • LUKA_V4X_Hard_EVEC
    • LUKA_V4X_Soft
    • LUKA_V4X_Hard
      • VOCALOID4 Megurine Luka V4X
    • LUKA_V4X_ENG_Soft
    • LUKA_V4X_ENG_Hard
      • VOCALOID4 Megurine Luka V4X
  • RIN_V4X
    • RIN_V4X_Power_EVEC
    • RIN_V4X_Warm
    • RIN_V4X_Sweet
  • LEN_V4X
    • LEN_V4X_Power_EVEC
    • LEN_V4X_Cold
    • LEN_V4X_Serious
    • KAITO_V3_Straight
    • KAITO_V3_Soft
    • KAITO_V3_Whisper
    • MEIKO_V3_Straight
    • MEIKO_V3_Power
    • MEIKO_V3_Whisper
    • MEIKO_V3_Dark

    • Megpoid_V4_Native
    • Megpoid_V4_NativeFat
      • VOCALOID4 Library Megpoid V4 Native
      • VOCALOID4 Library Megpoid V4 Complete
    • Megpoid_V4_Power
    • Megpoid_V4_PowerFat
      • VOCALOID4 Library Megpoid V4 Power
      • VOCALOID4 Library Megpoid V4 Complete
    • Megpoid_V4_Whisper
    • Megpoid_V4_SoftWhisper
      • VOCALOID4 Library Megpoid V4 Whisper
      • VOCALOID4 Library Megpoid V4 Complete
    • Megpoid_V4_Adult
    • Megpoid_V4_MellowAdult
      • VOCALOID4 Library Megpoid V4 Adult
      • VOCALOID4 Library Megpoid V4 Complete
    • Megpoid_V4_Sweet
    • Megpoid_V4_NaturalSweet
      • VOCALOID4 Library Megpoid V4 Sweet
      • VOCALOID4 Library Megpoid V4 Complete
      • Gackpoid_V4_Native
        • VOCALOID4 Library Gackpoid Native
        • VOCALOID4 Library Gackpoid Complete
      • Gackpoid_V4_Power
        • VOCALOID4 Library Gackpoid Power
        • VOCALOID4 Library Gackpoid Complete
      • Gackpoid_V4_Whisper
        • VOCALOID4 Library Gackpoid Whisper
        • VOCALOID4 Library Gackpoid Complete
      • UNAV4_Sugar
      • UNAV4_Spicy
        • VOCALOID4 Library Otomachi Una V4
      • kokone
        • VOCALOID™3 Library kokone
      • Chika
        • VOCALOID™3 Library Chika
      • GACHAPOID_V3
        • VOCALOID™3 Library Gachapoid V3
      • Lily_Native
      • Lily_V3
        • VOCALOID™3 Library Lily
      • CUL
        • VOCALOID™3 Library CUL
      • Megpoid_Native
        • VOCALOID™3 Library Megpoid Native
        • VOCALOID™3 Library Megpoid Complete
      • Megpoid_Power
        • VOCALOID™3 Library Megpoid Power
        • VOCALOID™3 Library Megpoid Complete
      • Megpoid_Whisper
        • VOCALOID™3 Library Megpoid Whisper
        • VOCALOID™3 Library Megpoid Complete
      • Megpoid_Adult
        • VOCALOID™3 Library Megpoid Adult
        • VOCALOID™3 Library Megpoid Complete
      • Megpoid_Sweet
        • VOCALOID™3 Library Megpoid Sweet
        • VOCALOID™3 Library Megpoid Complete
      • Gackpoid_Native
        • VOCALOID™3 Library Gackpoid Native
        • VOCALOID™3 Library Gackpoid Complete
      • Gackpoid_Power
        • VOCALOID™3 Library Gackpoid Power
        • VOCALOID™3 Library Gackpoid Complete
      • Gackpoid_Whisper
        • VOCALOID™3 Library Gackpoid Whisper
        • VOCALOID™3 Library Gackpoid Complete

  • AHS
    • Yukari_Jun
      • VOCALOID4 Yuzuki Yukari Jun
    • Yukari_Onn
      • VOCALOID4 Yuzuki Yukari Onn
    • Yukari_Lin
      • VOCALOID4 Yuzuki Yukari Lin
    • AHS
      • Iroha_Natural
        • VOCALOID4 Nekomura Iroha Natural
      • Iroha_Soft
        • VOCALOID4 Nekomura Iroha Soft
    • AHS
      • Kiyoteru_Natural
        • VOCALOID4 Hiyama Kiyoteru Natural
      • Kiyoteru_Rock
        • VOCALOID4 Hiyama Kiyoteru Rock
    • AHS
      • miki_V4_Natural
        • VOCALOID4 miki Natural
    • AHS
      • Zunko_Natural
        • VOCALOID4 Tohoku Zunko Natural
    • AHS
      • MacneNana_V4_Natural
        • VOCALOID4 Macne Nana Natural
      • MacneNana_Petit
        • VOCALOID4 Macne Nana Petit
    • AHS
      • Yukari
        • VOCALOID™3 Yuzuki Yukari
    • AHS
      • Zunko
        • VOCALOID™3 Tohoku Zunko
        • VOCALOID™4 Tohoku Zunko_Natural
    • AHS English
      • Macne Nana
        • VOCALOID™3 Macne Nana English VOCALOID3
      • Macne Nana VOCALOID4
        • VOCALOID™4 Macne Nana English VOCALOID4

  • IA
    • IA
    • IA_ROCKS

  • YuezhengLongya
    • YuezhengLongya_Chun
    • YuezhengLongya_Ya
      • VOCALOID™4 Library Yuezheng Longya
    • Luotianyi_CHN
      • Luotianyi_CHN_Ning
      • Luotianyi_CHN_Meng
        • VOCALOID™4 Library Luo Tianyi V4
    • Luotianyi_JPN
      • Luotianyi_JPN_Normal
      • Luotianyi_JPN_Sweet
        • VOCALOID™4 Library Luo Tianyi V4 Japanese

  • XinhuaJP
    • XinhuaJP_Natural
    • XinhuaJP_Power
      • VOCALOID™4 Library Xin Hua Japanese


    General Limitations

    XSY is far from perfect and has a number of issues, which often result in a low quality outcome. It can produce unpredictable or unexpected results. This is particularly true when the two vocals used have significant differences, or when the vocals involved were created without XSY in mind, as was the case with many of the VOCALOID3 voicebanks that gained XSY when they were imported into VOCALOID4.

    • Example: In the Megpoid V4 package, the respective pairs of vocals (Native and NativeFat, for example) offer a moderate XSY result, while XSY between the other 8 vocals (Power and Sweet, for example) offers more unpredictable results. By comparison, the V3 Megpoid vocals do not use XSY as effectively, as the voices were not originally intended for the function. The V3 Megpoid package produces considerably lower quality results than V4.

    As with the "GWL" function, this feature may prove very limiting to vocals that are too similar. A common complaint with Megurine Luka V4X's English vocals "Straight" and "Soft" is how XSY barely impacts either vocal.

    Due to how different languages work, a producer using VOCALOID with multiple voicebanks across languages may notice that results are not equal from one language to the next. Some languages require blending of sounds and others precision, and this affects how sounds are expressed in each language. In theory a Japanese "Power + Soft" may not be similar at all to a Chinese "Power + Soft" result, even if they are voiced by the same provider.

    In extreme cases, the user would have been better off not using XSY at all. This is simply because the flaws of XSY can often outweigh its strengths, unless the lower quality result is one the user sought from XSY to begin with. There are also times when the flaws of an XSY result impact the user's song simply because the user has not considered things such as the quality of the voicebanks involved, or their recommended tempo and vocal ranges. As with normal VOCALOID results, the user may simply use XSY for amusement, mixing two voicebanks just to hear what they produce, with no consideration of the consequences of mixing them.

    Bugs and why they appear

    XSY uses mathematical equations to work out the differences between the primary and secondary vocals and alters the wavelength of the primary vocal accordingly. The calculations do not work particularly well when the voicebanks have libraries built entirely differently from each other, as the function was not designed to handle this, and it can be impossible to get good quality results. Furthermore, if a sound is missing from the secondary library, the XSY function cannot reference it, since it does not exist. This is why XSY between VOCALOID3 and VOCALOID4 voicebanks is not always feasible, nor XSY between languages or between multiple VOCALOIDs. This is also why features such as E.V.E.C. may cause issues with XSY.

    The result can be major technical issues. This is among the most significant flaws of XSY, as it is by far the most unpredictable issue and greatly impacts quality. It is also very easy to miss the failures of XSY, particularly in languages the user is not familiar with.

    Although problems are likely to occur in both the "Added Expression" and "Creating New Voices" uses of XSY, some are far more likely to appear in one or the other and be an issue for that particular use.

    Added Expression

    This was the original intention of the XSY function, which was meant to take advantage of releases with multiple voicebanks. XSY allowed these voicebanks to switch more naturally during song production and helped the producer slide smoothly between voicebanks, removing some of the roughness of switching voicebanks mid-song.

    While a single XSY result can be used throughout the entirety of a song, using multiple XSY mixes creates a more realistic vocal performance. The vocal can go from a normal "whisper" to singing a high ballad to express a joyful happiness, made achievable by a "power-whisper" mix. In the same song, a "dark-whisper" mix can create a sad tone. In the case of Megpoid V4, the five pairs were intended to use XSY to let the user ease from one vocal to another, allowing smoother switching between the two vocals in each intended pairing. This gives the user the ability to add emphasis to certain parts of a song, such as adding power to a high note or softness to a low one, making it much easier to give the impression of a singer changing tone mid-song.

    For example, VY1v4's vocal has several options to go from "Soft" to "Power", with or without the use of "Natural" to more realistically ease the vocal between notes and with or without the use of the middle "Normal" voicebank.

    1. "Soft" → XSY ("Soft" x "Power") → "Power"
    2. "Soft" → XSY ("Soft" x "Normal") → XSY ("Normal" x "Power") → "Power"
    3. "Soft" → XSY ("Soft" x "Natural") → XSY ("Natural" x "Power") → "Power"
    4. "Soft" → XSY ("Soft" x "Normal") → "Normal" → XSY ("Normal" x "Power") → "Power"
    5. "Soft" → XSY ("Soft" x "Natural") → "Normal" → XSY ("Natural" x "Power") → "Power"
    6. "Soft" → XSY ("Soft" x "Natural") → "Natural" → XSY ("Natural" x "Power") → "Power"
    7. "Soft" → XSY ("Soft" x "Natural") → XSY ("Natural" x "Normal") → "Normal" → XSY ("Normal" x "Natural") → XSY ("Natural" x "Power") → "Power"

    Even without switching to the "Soft" and "Power" voicebanks themselves, using XSY on just a single main vocal would be enough to improve the VY1 "Normal" voice alone, using a path such as this:

    1. XSY ("Soft" x "Normal") ⇄ "Normal" ⇄ XSY ("Normal" x "Power")


    • In the case of VY1v4, there is never a need to use both "Natural" and "Normal" together as non-XSY results; one or the other will be used, due to the abilities of the voicebanks. "Natural" allows VY1v4 to skip "Normal" as the middle voicebank. Users need to pay attention to the roles of the voicebanks and work out how each plays its part in XSY, as it is not always obvious.
    • When transitioning from one voicebank to another, the user should set the starting vocal as the Primary vocal, while the Secondary vocal is normally the vocal being transitioned into. For example, if you start at "Soft" then the primary vocal is "Soft", and if you are going into "Normal" then your secondary vocal is "Normal". The pairing is therefore "Soft" x "Normal", not "Normal" x "Soft". This is important because, as mentioned previously on this page, the two pairings are not identical, and which vocal is Primary and which is Secondary can affect how effective an expression path is. There may be exceptions at times, but it is important to switch the primary and secondary vocals when going back from "Power" to "Soft" in the above example, so "Soft" x "Natural" would become "Natural" x "Soft" instead.
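The paths and the primary/secondary rule above can be written out mechanically. The following is a small Python sketch for planning purposes only; it does not interact with VOCALOID, and the names `xsy_step`, `expression_path` and `hold` are hypothetical. It assumes the rule described above: in each XSY step the vocal being left is the primary and the vocal being moved towards is the secondary.

```python
def xsy_step(primary: str, secondary: str) -> str:
    """Format one XSY pairing; the starting vocal is always the primary."""
    return f'XSY ("{primary}" x "{secondary}")'


def expression_path(banks: list[str], hold: tuple[str, ...] = ()) -> str:
    """Build a transition path from banks[0] to banks[-1].

    Intermediate voicebanks named in `hold` are also sung plain (non-XSY)
    between their two XSY steps; the others are only passed through.
    """
    if len(banks) < 2:
        raise ValueError("a path needs at least a start and an end voicebank")
    steps = [f'"{banks[0]}"']
    last = len(banks) - 1
    for i in range(1, len(banks)):
        steps.append(xsy_step(banks[i - 1], banks[i]))
        if i == last or banks[i] in hold:
            steps.append(f'"{banks[i]}"')
    return " → ".join(steps)


print(expression_path(["Soft", "Power"]))
print(expression_path(["Soft", "Normal", "Power"], hold=("Normal",)))
# Going back down reverses every pairing, e.g. "Natural" x "Soft".
print(expression_path(["Power", "Natural", "Soft"]))
```

The first two calls reproduce paths 1 and 4 from the numbered list, and the third shows the pairings being swapped for the return journey from "Power" to "Soft".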

    In terms of how this applies to lyrics, if the lyrics "My very best friend is my bestest friend" go from a lower note to a higher one, then the transition may use the voicebanks in ways such as these:

    1. My very best ("Soft") friend is my ("Soft" x "Power") bestest friend ("Power")
    2. My very ("Soft") best friend ("Soft" x "Natural") is my ("Natural" x "Power") bestest friend ("Power")

    The number of steps a producer can take in switching between voicebanks with the aid of XSY depends on the space between lyrics and how much room there is for each step. There are downsides to extreme transitions such as "Soft" → XSY ("Soft" x "Power") → "Power": warping may occur in the XSY pairing because the two vocals are very different, hence the usual need for "Natural" or "Normal" to soften the transition where possible. Both examples skip VY1v4 "Normal" due to a lack of room, and both make the transition more dramatic as a result, but the first is much more dramatic than the second because it has one less stage in between, with 3 steps in the transition from low to high instead of 4. This will of course impact how the vocal sounds.

    However, without XSY the vocal result would go:

    • My very best ("Soft") friend is my ("Normal") bestest friend ("Power")

    The vocal will sound more jagged, and it becomes much more noticeable where "Soft", "Normal" and "Power" begin and end, due to the differences between the vocals.

    Thus the function was meant to support VOCALOIDs that had extra voicebanks and allow them to sound more realistic. The VOCALOID could switch singing roles, tone or style mid-song effortlessly and barely noticeably, even if the switch was sudden, giving a voicebank added expression that it did not previously have. XSY would aid this switch, or help with just a handful of words that needed particular adaptation from the normal vocal traits of the primary vocal. This was the primary function of XSY and its selling point when it first appeared, and releases like Megpoid V4 focused on this.

    While this could be achieved without XSY, the methods were not flawless and often involved tricks such as fading out one vocal and fading in another, which left a chance of overlapping sounds. XSY resolved the issue by allowing the user to switch and produce a smooth transition between vocals with no chance of overlapping audio. And since it was done within VOCALOID, the VOCALOID engine would react naturally to the change and take it into account on top of all other processes. Some of the other possibilities involve non-VOCALOID software, which VOCALOID would not take into account, leading to rougher results.

    Note that while XSY currently does not exist in VOCALOID5, the "Style/Colours" function that has replaced it can at times bring out more stable results from the vocal and even provide better expression. It serves much of the purpose of the original intention of XSY, which was to add variant traits to a vocal, but functions without the need for a second VOCALOID and works relatively well with most VOCALOID3 and VOCALOID4 vocals. Even when it is outclassed by XSY, it still provides a more stable result than XSY.

    The VOCALOID releases that best reflect this use are VY1v4 and Megpoid V4, as both were made with this in mind.

    Realism and XSY

    Realism is a major problem for this particular use of the XSY function. When people speak about realism they normally refer to a VOCALOID's ability to sound like an actual human being is singing, known as the "uncanny valley effect". XSY introduces a brand new problem that is not the result of a glitch: the result is a combination of two vocals with often very different traits and behaviours, merged into a brand new vocal.

    As VOCALOID is programmed to simulate via synthesizing some level of mimicry of the human vocal, the new vocal can still behave in a manner that can be considered "realistic". This is seen even in non-XSY realistic results with vocals that are not considered realistic themselves. This is because VOCALOID itself is trying to mimic the way the human voice produces sound and has various mathematical calculations behind it which impacts all VOCALOIDs, with their vocal traits being a major factor in separating them.

    If correctly used, the uncanny valley effect can be enhanced by using XSY for this reason when combined with normal non-XSY results, but this tends to work only when XSY is used as an expression enhancer. In this case, use of XSY can often be minor for a song, with less chance of XSY's unrealistic results being allowed to show, since the function is not being overused. The XSY voice will not appear much different from the normal VOCALOID result, which already has a degree of engine noise; the XSY voice simply has extra engine sounds.

    With this in mind, it would appear to make no sense that a tool that can enhance realism can itself be unrealistic, or any less realistic than normal VOCALOID results. It can therefore be hard to explain why there is a lack of realism when it comes to the use of XSY. To put it simply, the new XSY vocal never existed at all and is created by VOCALOID itself. With no basis in reality of its own, it is a product of mathematical calculations, and the resulting vocal can often have far more synthesized engine noise than the two vocals going into it.

    There is also the issue of an unpredictable result. When the user creates, for example, "power-whisper" with XSY, there is no telling how the result will turn out. Traits needed for a power-whisper to work may not be passed into the new result from its "power" and "whisper" vocals at all. The result is a rough vocal with oddities that would not naturally have been produced by a vocalist recreating the same type of vocal. With a real human being providing the results, muscles within the throat react in a certain way and air flow is controlled in an entirely organic manner. The singer can learn odd tricks that make sounds sharper or clearer, which will then be captured for use in a voicebank. Likewise, a weakness in the human voice will be captured in the recording, and will also be present in a recorded "power-whisper" voicebank.

    XSY has no way of compensating for such things, as the VOCALOID4 technology itself did not have the ability to "learn" what does and doesn't work as a real singer can. It simply uses mathematics to take traits from two vocals to make a new vocal, and it cannot adapt its results to improve odd areas in the process. If a particular sound combination needs a weakness, or a better method of sound creation could give a better result, XSY simply cannot add it to the vocal; it can only draw from existing results. Anything that appears to be new is always the result of a glitch or other issue related to the combination of the two voicebanks in use, and nothing is added that wasn't there before.

    To simplify things: XSY cannot produce a realistic result because it lacks true authenticity. If the provider themselves were to produce a power-whisper voicebank, it would be closer to how the provider sounds, and of much higher quality overall, than the XSY version. Since XSY is based entirely on mathematical calculation, there is nothing that can be done to change this in the available versions of the VOCALOID4 engine, and users have to work around it.

    This is the biggest issue when using XSY as a smooth transition between results.

    Creating New Voices

    As a result of how XSY functions, an unexpected result came from XSY. As seen in the Megpoid V4 package, XSY has the potential to create many different results. By mixing vocals with more extreme results, the combination of the two creates an effect that mimics having an entirely different voicebank. XSY mixes in traits of the other vocal to create an entirely new sound. By mixing a "whisper" type vocal with a "power" type, one can achieve a "power-whisper" result. A VOCALOID with just two vocals can achieve the equivalent of a third and fourth voicebank via the XSY function.[3]

    VOCALOIDs with access to even just one additional vocal expansion library and XSY will see their overall tone capabilities doubled. This makes them a much more attractive package than VOCALOIDs with just a single voicebank, as the potential for more vocal tones is far greater. For example, while ZOLA PROJECT has 3 voicebanks within its package, the 3 vocalists cannot XSY. Compared to Yuzuki Yukari V4, who can XSY her 3 voicebanks, they are rather limited as a result. With XSY, Yukari can extend the number of potential tones, opening up 6 more for use and bringing her tone count to a total of 9. So in reality, when you purchase Yukari V4 with XSY, you are not buying 3 voicebanks, you are buying 9. In contrast, without XSY, ZOLA PROJECT will only ever be able to offer the basic 3 voicebanks held by its 3 singers, and can be seen as a less valuable package.

    From Ver.4.3.0 of the VOCALOID4 engine "groups" were added, allowing for the first time a certain number of vocals to XSY between them that were never able to do so before. It permitted vocals to XSY between multiple characters.[4][5]

    In June 2017, AH Software also created a group for English voicebanks, allowing Macne Nana's original and VOCALOID4 vocals to XSY with each other for the first time, along with any potential future English vocals. Since the original English vocal was originally in the "Yamaha" group of VOCALOIDs, this also became the first example of cross-company XSY.

    A benefit of this expanded function is that multiple characters can be used to support the main VOCALOID. For example, Megpoid V4's 10 voicebanks can be switched around to act as "tone controlling", adding a slightly different tone to another vocal depending on if the user wants their primary vocal to be altered. This is possible due to all 10 having the same vocal range and tempo. Others such as kokone can be used to bring a vocal to a more "falsetto" tone of voice. Then there are vocals like Macne Nana which can be used to give a vocal support during faster songs, due to her stability during high tempos.

    In many combinations, a unique vocal that would never have existed without XSY can be obtained, one that neither of the VOCALOIDs' voice providers may ever have been capable of producing. This is especially true for vocals in the XSY groups "Internet" and "AHSoftware". The results may seem quite unnatural, and XSY is the only method of obtaining them. However, of the two methods of using XSY, this is the one most likely to expose its weaknesses. Using a pairing as an entirely new vocal for a whole song gives flaws far more chances to show than simply using XSY for changes of expression. This is even more true for a user with little skill in hiding faults or little experience with either the function or the combination of two vocals, on top of any weaknesses they have in music making.

    In addition, some vocal types that are otherwise missing entirely can be created. For example, there is a notable lack of masculine vocals in VOCALOID in general, but both the Internet and AH Software XSY groups have at least 1 true male vocal. It is possible to use Camui Gackpo and Hiyama Kiyoteru to increase the number of male vocals users have access to, by using their vocals to add masculine traits to female vocals. With the GEN parameter also in use, the producer can more than make up for the lack of masculine VOCALOID vocals.

    Though this was not a major selling point at first, it became a much more highlighted benefit after the introduction of XSY groups. The removal of XSY in VOCALOID5 impacted producers who used the function for this particular reason and was seen as a negative point.

    However, using XSY to create new vocals is far more likely to show up XSY's flaws than using it for "Added Expression".

    Exaggeration/noticeability of a flaw

    This can impact a small collection of sounds, tone or pitch.

    One of the more common issues with XSY is how flaws are handled. If both vocals were drawn from the same set of (bad) data, or simply both ended up with a similar or identical glitch (whether the vocals come from the same release or different releases), then this trait is ignored by the XSY function. As a result, while other traits can be lessened as the blend is adjusted towards either vocal's dominance, the shared trait remains largely untouched. Without the other traits to hide it, the flaw becomes more noticeable even though nothing has changed it. This can be demonstrated in the Megurine Luka V4X English vocals "Straight" and "Soft", which share identical flaws as they draw their vocals from similar data.

    On the other hand, if the flaw also exists in the second vocal but is stronger there, a similar issue can occur. In this case, the other traits do not always have to be weakened, as the flaw itself is what is impacted. The second vocal's influence pulls the primary vocal's flaw towards its own strength, making the problem worse. This is exemplified by Arsloid "Bright" and "Soft", which often cause the original Arsloid vocal to have exaggerated flaws due to their own lack of content to soften the flaws during the XSY process.

    Both are a result of matching vocals that are of similar types but produce contrasting results with each other, such as those made for single-character XSY.

    Both cases are fixable if the user is familiar with both vocals, with the one creating the problem, or with the XSY pairing. The fix is often quick or minor and requires tweaking certain areas of the VSQX file. It can also be done with the right filters in external sound editing software.

    Glitches where there were none

    Sometimes a glitch simply exists in the secondary vocal and, when that vocal is used for XSY, appears in the primary vocal; at other times the pairing itself causes a glitch to form that was in neither. Incorrect phonemes, unlinked sounds, and random crunches can all be added to a vocal that didn't have them before. This is also known to occur between V3 and V4 vocals, such as the versions of Gackpoid or Megpoid. While the V4 version may have fixed a flaw, XSY with a V3 version may add it back in.

    Other times, the extreme influence of the secondary vocal results in a lower quality sound overall, with new glitches added that were in neither vocal. This mainly appears when using cross-character XSY, as it is uncertain what impact a second vocal like CUL, kokone or Gachapoid will have; these vocals tend to have a great impact on other vocals. The glitch is a flaw added by the XSY process itself.

    These particular issues can be quite a problem for inexperienced producers and veterans alike. In both cases, they can go unnoticed, as it has already been noted that some producers fail to notice glitches in non-XSY results, especially if they are short or barely noticeable. They add a poor quality sound to a result that otherwise would have been high quality. Often, since the VOCALOID software created the glitch, it must be edited within VOCALOID itself and cannot always be fixed by additional software. The fault almost always lies with the secondary vocal used in the pairing, so switching vocals or lessening the strength of the second vocal are options to fix this problem.

    Change of Range/Tempo

    Each vocal has its own recommended range and tempo, which mark where the VOCALOID results are best for that vocal. This applies even when the studio doesn't make the ranges known, and it impacts XSY at times. Users don't always understand why recommended ranges exist in non-XSY results to begin with, so this may change very little in a producer's approach to handling their VOCALOID. For this reason they may never even notice a change of range in their time using the function, and it can be a minor issue at best.

    The two recommended ranges (vocal notes and tempo) are determined by each VOCALOID's layers of pitch, both stationary and articulation based, as a result of the combined samples of a voicebank library. XSY cross-references both voicebanks' layers of pitch to calculate a new result. A typical VOCALOID vocal in Japanese or English has 2-3 layers, so the combined XSY result is based on a 4-6 layer vocal instead. This can have both a good and a bad impact on the finished sound. As a result, the new voice has a tempo and vocal range that differ from those of the original two vocals.

    There are two things to note in regards to this;

    1. The new vocal result is strongest when the layers line up with each other overall across a similar range, creating a solid foundation in which all layers support each other equally. This produces a higher quality result; however, the downside is that if the vocals are too close, not much difference may be noticed.
    2. A vocal which focuses only on a single vocal range (for example, lower octaves) can benefit when combined with a vocal that focuses only on higher ranges, or vice versa. The secondary vocal's additional layers offer extensive extra support, allowing the primary vocal to go past its previous recommended range. However, this is more likely to cause the previously mentioned glitches to appear due to the use of two very different vocals, causing a VOCALOID to produce a much lower quality result.

    An example of this applies to VY1v4: "Power", "Soft" and "Normal" can all have this effect on each other, while "Natural" will have this effect on them. On the other hand, if the producer uses "Natural" as the primary vocal, then the other 3 vocals become support for it. They line up with all aspects of "Natural" and will strengthen and adjust certain aspects of the vocal; for example, "Power" will strengthen its upper ranges while "Soft" loosens them.

    One last thing to note about the change of vocal range is that when two vocals with no overlap at all are used in XSY, the unsupported range in the middle becomes unpredictable. This is rare, as most VOCALOIDs in each XSY group have at least a few keys overlapping each other, but on the occasions it occurs VOCALOID is nonetheless completely improvising the results. The larger the gap between the two vocal ranges, the more unpredictable this middle range is. With large gaps, the centre of this unstable range can become a serious weak point producing low quality results. Furthermore, it can have extreme results on the two ranges covered by either voicebank, since the calculations between the two are greatly exaggerated. In the rare cases where this does occur, it is recommended that users be more subtle with how much the secondary vocal influences the primary vocal to limit the issue. This has been witnessed especially in unintended XSY pairings, such as those produced as a result of illegal XSY modding.
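
    As a rough planning aid, the overlap or gap between two recommended ranges can be worked out before choosing a pairing. The sketch below assumes each range is written as an inclusive pair of MIDI note numbers; the function name and the example ranges are hypothetical and do not describe any real voicebank or how the engine itself calculates its result.

```python
def range_relation(primary, secondary):
    """Compare two inclusive (low, high) note ranges.

    Returns ("overlap", n) when the ranges share n notes, or ("gap", n)
    when n unsupported notes lie between them.
    """
    shared_low = max(primary[0], secondary[0])
    shared_high = min(primary[1], secondary[1])
    if shared_low <= shared_high:
        return ("overlap", shared_high - shared_low + 1)
    return ("gap", shared_low - shared_high - 1)


# Hypothetical low-focused and high-focused vocals with nothing in common.
print(range_relation((48, 60), (65, 77)))  # ('gap', 4)

# Hypothetical vocals that share a few keys in the middle.
print(range_relation((48, 64), (60, 76)))  # ('overlap', 5)
```

    A large "gap" value marks the kind of pairing where a lighter secondary influence is advisable, while a large "overlap" suggests the layers are more likely to support each other.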

    Unbalanced Traits

    The new XSY vocal is a combination of the traits of two voicebanks, but this leads to issues of its own. While mixing two vocals can have unpredictable results, some predictable things have still come from XSY. These are often considered relatively minor issues, since they are either relatively easy to fix or simply raise the question of how useful the function is.

    Roles of Voices

    One issue also noted about XSY is that there is a cutoff point with either vocal in regards to the amount of vocal traits. It leads to the question "Why use IA ROCKS to give the original IA vocal more ability to handle rock music when IA ROCKS already does that?". Even when it comes to unrelated VOCALOIDs in the XSY groups, the question can arise of whether it was worth it to XSY CUL into GUMI's vocal if CUL could produce the result to begin with, or why the inferior vocal should be used for a role when the superior vocal was already better at it.

    This is a question that can be applied to non-XSY results, as it is common to use VOCALOIDs for musical roles they are not best at, but it becomes a taboo question with the XSY function. This is because the process only uses two vocals, which already have their own sets of strengths and weaknesses. It can only draw from those two, and the combined result may not even be as good as the original two vocals. The original vocals can be better, or just more useful overall, than any XSY combination of the same vocals, rendering the XSY pairing useless for any role it would have filled. Thus, each new XSY combination has to be treated as a separate vocal in its own right, as though it were a brand new VOCALOID voicebank, sometimes with the song having to be tailored to the XSY pairing.

    XSY used as a blending tool is unaffected by this, and it is mostly an issue for those who use XSY for new voice creation. Songs made entirely with 1 pairing and no switching mid-song should take this into consideration, though it is less important for songs that use multiple XSY pairing changes throughout. When using XSY, it is generally best to note what the new vocal is setting out to achieve that the originals could not. If either vocal can fulfil the role better, then XSY is not needed at all, as the new vocal will often be nowhere near as good as they are at the result, due to their traits and quality.

    Loss of Traits

    Some vocals gain their clarity from relying on one or more traits, such as a strong attack, powerful/clear tones or high quality recorded samples, traits which are common to "power" types. However, mixing these vocals with others that have very different traits can remove the very reason for their clarity. Part of the problem is that the XSY tool can "average" traits at times, finding the mean between two vocals and producing a vocal that lacks the strengths of either voicebank that went into it.
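
    The "averaging" effect can be pictured with a toy model. The sketch below is only an illustration under loud assumptions: the trait names, the 0-100 scores and the simple linear mix are all hypothetical, and the real Cross-Synthesis process is more complex than a plain fade between two voices.

```python
def blend_traits(primary, secondary, ratio):
    """Mix two sets of trait scores.

    ratio is the share of the secondary voice, from 0.0 (primary only)
    to 1.0 (secondary only). Both dicts must use the same trait names.
    """
    return {
        name: (1 - ratio) * primary[name] + ratio * secondary[name]
        for name in primary
    }


# Hypothetical scores: a "power" type is clear, a "soft" type is breathy.
power = {"clarity": 90, "breathiness": 20}
soft = {"clarity": 30, "breathiness": 80}

mixed = blend_traits(power, soft, 0.5)
print(mixed)  # {'clarity': 60.0, 'breathiness': 50.0}
```

    In this toy model the halfway mix is neither as clear as the "power" voice nor as breathy as the "soft" one, which is the kind of middle-of-the-road result described above.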

    "Soft" vocals have been known to impact clarity, due to their looser pronunciations, so mixing with this vocal can cause the sounds on the primary vocal to loosen and mimic the traits of the soft vocal. This can make the resulting primary vocal appear to mumble more as it weakens consonants which are often a clarity controller, allowing vowels to have more impact on the vocal.

    This issue appears when mixing contrasting vocal types, such as the aforementioned Power/Soft XSY mix, and is applicable to any VOCALOID that can use XSY in any form. It impacts the entire XSY result. The issue can be fixed either inside VOCALOID or in sound editing software, and is not too different from normal VOCALOID editing. Note that the purpose of mixing in a softer vocal type is generally not to maintain clarity, but to add more expression to a vocal lacking in that capability.

    Just as clarity can be lost, expression can be too. While XSY is primarily intended for expression creation, certain vocals can see their natural expression impacted.

    A "Solid" vocal is a vocal type that often shares traits with types such as "Power", and vocals of this type usually have strong consonants and solid tones. These give clear and precise tones for clarity, but lack expression, since there is often a lack of looseness about them, resulting in a lack of breath elements. When used in a secondary role to impact a "soft" vocal type, this gives clarity to the softer vocal at times, but with that clarity the user may see a loss of the soft vocal's expression traits.

    As with clarity, this is a result of mixing two contrasting vocals and is simply the opposite situation to the loss of clarity. The causes are the same and the fixes are often similar. In fact, the same thing can happen with any number of traits held by voicebanks in this situation. At times, the impact of XSY can be so great that the result has no speciality whatsoever compared to its original 2 voicebanks. The result is then a voice with no real strengths and no real weaknesses, in a bad way, making the voice uninteresting, though this varies depending on just which two voicebanks went into it.

    Software Agreements

    Due to the licensing agreements between VOCALOIDs and studios, VOCALOIDs generally are not open for XSY with other VOCALOIDs or with VOCALOIDs of other studios. However, XSY mods do exist which alter the software to allow forbidden XSY combinations. The VOCALOID licensing agreements do not cover these cases, and users should be aware of this when installing any XSY mods. One example is Arsloid, where modding allows "Bright" and "Soft" to XSY together, a combination not normally allowed at all.

    Note: because this does not comply with the Yamaha licensing agreements, the VOCALOID wiki is unable to cover known cases of XSY modding, and songs using them, as well as any URLs to them, will be removed when found.



    VOCALOID 4 新機能 「クロスシンセシス」 - Cross Synthesis - YouTube
    Hiyama Kiyoteru Rock + Nekomura Iroha Soft SoundCloud
    Yuzuki Yukari Jun + Tohoku Zunko Natural SoundCloud
    Otomachi Una V4 Sugar + Megpoid V4 Sweet SoundCloud
    Otomachi Una V4 Spicy + Megpoid V4 Power SoundCloud
    Megpoid V4 Adult + Lily V3 SoundCloud
    Gackpoid V4 Native + Gachapoid V3 SoundCloud
    Tohoku Zunko Natural + Hiyama Kiyoteru Natural SoundCloud
    Tohoku Zunko Natural + miki Natural SoundCloud
    Tohoku Zunko Natural + Yuzuki Yukari Jun SoundCloud
    Macne Nana Natural + Yuzuki Yukari Lin SoundCloud
    Macne Nana Natural + Tohoku Zunko Natural SoundCloud
    Macne Nana Petit + Nekomura Iroha Soft SoundCloud
    Yumemi Nemu + Tone Rion V4 SoundCloud