Cross-Synthesis

Cross-Synthesis (クロス-シーサセス), often shortened to "XSY", is a parameter that allows one voicebank to blend into another gradually and progressively. It is only available in the full version of the software.

VOCALOID5 was released without XSY support, and the parameter does not appear in that version at all.

How to use
First, the user must access the Cross-Synthesis Web browser through the Singer Editor, where a Primary Voice and a Secondary Voice can be assigned for use in Cross-Synthesis. This feature works with a selection of VOCALOID3 voicebanks when they are imported into the VOCALOID4 engine; however, not all products are compatible. Most VOCALOIDs cannot be registered for XSY, and this is particularly true for single voicebanks that are not part of an XSY group, so releases such as Yanhe, Yohioloid and Flower cannot use XSY. Users who wish to make use of XSY may therefore need to do some research and planning before purchasing, to work out the best voicebanks to invest in.

A chart of the voicebank packages that support this feature is available on Yamaha's official website.

Cross-Synthesis morphs the Primary Voice, using the Secondary Voice as an analytical data guide for the adaptation of the first. Cross-Synthesis does not simply switch between two voices as the user draws a parameter curve; instead, it increases the ratio of the Secondary Voice being combined with the Primary Voice, in a way more complex than a simple fade-out. Because the Primary Voice is used as a "template" for the morphing, the two voices in a Cross-Synthesis pairing are not interchangeable: the rendering and behavior will be slightly different depending on which voicebank is used as the primary voice and which as the secondary.
 * Example: A Power-Normal setting will not be equal to Normal-Power.

Users can also choose how much the second voice impacts the first, from subtle to extreme. XSY is very similar in concept to how the original VOCALOID engine used recorded analytic data to adapt the VOCALOID engine noise to sound like the vocalist behind the data.

Note that some packages, such as the Megpoid V4 vocals, may come with their respective pairs already set up for XSY within the feature, so they will already be registered when an XSY partner is selected. Otherwise, a user can register their favourite voicebank pairs for easy use in the future.

Also note that, barring glitches and updates to old software, XSY cannot occur between a voicebank and itself. In short, while "Tohoku Zunko" VOCALOID3 and Tohoku Zunko V4 can XSY because they are different releases, pairing Hatsune Miku V4x "Dark" with Hatsune Miku V4x "Dark" is not possible under normal circumstances. In both cases the combination is often not worth using for XSY, and there is no point at all in XSY between the same voicebank, as the two voices are identical.

Benefits of XSY?
When used effectively, XSY expands the capabilities of the VOCALOID package in use, increasing its overall abilities beyond what any single vocal within the package offers.

As seen in the Megpoid V4 package, XSY has the potential to create many different results. When more extreme vocals are mixed, the combination of the two creates an effect that mimics having an entirely different voicebank. XSY mixes in traits of the other vocal to create an entirely new sound. By mixing a "whisper" type vocal with a "power" type, one can achieve a "power-whisper" result. A VOCALOID with just two vocals can achieve the equivalent of a third and fourth voicebank via the XSY function. VOCALOIDs with access to XSY and even just one additional vocal expansion library will therefore see their overall tone capabilities doubled. This makes them a much more attractive package than VOCALOIDs with just a single voicebank.

While a single result can be used throughout the entirety of a song, using multiple XSY mixes creates a more realistic vocal performance. The vocal can go from a normal "whisper" to singing a high ballad expressing joyful happiness, made achievable by a "power-whisper" mix. In the same song, a "dark-whisper" mix can create a sad tone. The five pairs in the Megpoid V4 release were intended to use XSY to let the user ease from one vocal to the other, allowing smoother switching between the two vocals in each intended pairing.

From Ver.4.3.0 of the VOCALOID4 engine, "groups" were added, which for the first time allowed a certain number of vocals to XSY with each other that had never been able to do so before. This allowed XSY between the vocals of multiple characters.

A benefit of this expanded function is that multiple characters can be used to support the main VOCALOID. For example, Megpoid V4's 10 voicebanks can be switched around to act as "tone control", adding a slightly different tone to another vocal depending on how the user wants their primary vocal to be altered. This is possible because all 10 have the same vocal range and tempo. Others, such as Kokone, can be used to bring a vocal to a more "falsetto" tone of voice. Then there are vocals like Macne Nana, which can be used to give a vocal support during faster songs, due to her stability at high tempos.

In June 2017 Ah Software also created a group for English voicebanks, allowing Macne Nana's original and VOCALOID4 vocals, and any potential future English vocals, to XSY with each other for the first time. Since the original English vocal was originally in the "Yamaha" group of VOCALOIDs, this also became the first example of cross-company XSY.

Voicebank XSY variants calculations
The following calculations give an estimate of how many major shifts in tone an XSY-capable voicebank can produce.

Note that "α" represents the number of vocals that have the option to XSY with the intended vocal being used - this includes its usage both as a primary and secondary vocal.

Equivalent potentially created additional vocals:
 * α x (α - 1)

The theoretical total number of vocals is simply the previous result plus the original number of vocals, so the formula is:

 * α x (α - 1) + α

For example, the Megpoid V4 release's 10 voicebanks work out as:


 * 10 x 9 = 90 additional possible results


 * 10 x 9 + 10 = 100 total theoretical results

The following examples detail the results for up to 15 voicebanks:


 * 2 voicebanks; produces 2 variations with XSY, 4 variations altogether including the original vocals
 * 3 voicebanks; produces 6 variations with XSY, 9 variations including the original vocals
 * 4 voicebanks; produces 12 variations with XSY, 16 variations including the original vocals
 * 5 voicebanks; produces 20 variations with XSY, 25 variations including the original vocals
 * 6 voicebanks; produces 30 variations with XSY, 36 variations including the original vocals
 * 7 voicebanks; produces 42 variations with XSY, 49 variations including the original vocals
 * 8 voicebanks; produces 56 variations with XSY, 64 variations including the original vocals
 * 9 voicebanks; produces 72 variations with XSY, 81 variations including the original vocals
 * 10 voicebanks; produces 90 variations with XSY, 100 variations including the original vocals
 * 11 voicebanks; produces 110 variations with XSY, 121 variations including the original vocals
 * 12 voicebanks; produces 132 variations with XSY, 144 variations including the original vocals
 * 13 voicebanks; produces 156 variations with XSY, 169 variations including the original vocals
 * 14 voicebanks; produces 182 variations with XSY, 196 variations including the original vocals
 * 15 voicebanks; produces 210 variations with XSY, 225 variations including the original vocals
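
The chart above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic given in this section; the function name `xsy_variations` is made up for illustration and is not part of any VOCALOID tool.

```python
def xsy_variations(alpha):
    """Return (additional, total) variations for `alpha` XSY-compatible vocals.

    additional = alpha x (alpha - 1): every ordered primary/secondary pairing,
                 since Power-Normal is not the same as Normal-Power.
    total      = additional + alpha:  the pairings plus the original vocals.
    """
    if alpha < 1:
        raise ValueError("alpha must be at least 1")
    additional = alpha * (alpha - 1)
    total = additional + alpha
    return additional, total


# Print the same rows as the chart, from 2 to 15 voicebanks.
for alpha in range(2, 16):
    additional, total = xsy_variations(alpha)
    print(f"{alpha} voicebanks: {additional} with XSY, {total} including the originals")
```

Note that the total always equals α x α, because α x (α - 1) + α simplifies to α squared.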

Formula notes;
 * 1) It is worth noting that, as XSY is a controllable variable, it is possible to create further variations than this chart displays by mixing in more or less of one vocal with the other. For example, you may get a different result at 25% as opposed to 50% or 75% of the influence of the secondary vocal.
 * 2) However, how much you can get out of two vocals depends entirely on the two vocals involved. For example, two vocals which are relatively close in tone will give less varied results.
 * 3) The calculations only account for 3 settings for simplicity's sake: 100%-0%, 50%-50% and 0%-100%, as it is not certain how many results two voicebanks can produce between them that are different enough to be taken into account. However, it is fully possible that XSY results of 75%-25% or 25%-75% can produce a very different vocal to 50%-50%, 0%-100% or 100%-0%, thus giving 5 very different results. If the differences between the two voicebanks are great enough, they can even create more than these 5 results, such as at 10%-90% or 20%-80%.
 * 4) It is sometimes difficult to pinpoint the exact setting that gives the biggest difference. While logic suggests that 50% would be that point, this is not always the case, and it may actually be 55%-45%, for example, due to a lack of dominant traits in one vocal or the other. The weaker vocal then has to be given a little more strength, and the other a little less, to get a fair combination of equal traits from both vocals. However, 50% is normally a good starting point for experimentation.
 * 5) Note that the Vocaloid wikia's calculation does not take into account that some XSY-compatible vocals are simply updates that add GWL, so some XSY combinations between older and newer versions of a software may show little response. However, the fact that GWL can change a vocal's tone when combined with other GWL-compatible vocals is why they are treated as "different" XSY combinations despite being the same vocal. This is why certain vocals such as Gackpoid V4 and V3 Gackpoid are still counted as producing different results.
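
As a rough illustration of notes 1 to 3, the count can be extended to several mix settings per pairing. The sketch below assumes, purely for the sake of the arithmetic, that every intermediate setting of every ordered pairing gives an audibly distinct result; as note 2 points out, this will often not hold, so the figure is an upper bound at best. The function and parameter names are hypothetical.

```python
def xsy_upper_bound(alpha, settings_per_pair=1):
    """Upper bound on distinct results for `alpha` XSY-compatible vocals.

    settings_per_pair is the number of intermediate mix settings counted for
    each ordered primary/secondary pairing (1 = only 50%-50%, which gives the
    chart's figures; 3 = 25%-75%, 50%-50% and 75%-25%).
    ASSUMPTION: every counted setting sounds distinct, which is not guaranteed.
    """
    if alpha < 1 or settings_per_pair < 0:
        raise ValueError("alpha must be >= 1 and settings_per_pair >= 0")
    ordered_pairs = alpha * (alpha - 1)
    return alpha + ordered_pairs * settings_per_pair


print(xsy_upper_bound(10))     # one setting per pairing, as in the chart
print(xsy_upper_bound(10, 3))  # three settings per pairing
```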

XSY Groups
As mentioned above, Ver.4.3.0 of the VOCALOID4 engine added the "XSY group" assignments, extending the XSY usage beyond its previous capabilities. Certain XSY groups allow for vocals within them to XSY regardless of the character, while previously XSY had been restricted to one character only. This opened doors for very different vocal results from very different VOCALOIDs, with many even capable of producing a result that sounds like neither VOCALOID used in the process.

The same rules for XSY apply to this new group system, so nothing has changed except the potential number of vocals that are compatible.
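
The pairing rules described in this article can be summarised as a small check. This is a sketch only: the `Voicebank` class, the `xsy_group` field and the group labels are invented for illustration and do not reflect how the VOCALOID4 editor stores this information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Voicebank:
    name: str                  # display name of the release
    xsy_group: Optional[str]   # hypothetical group label; None = not registered for XSY


def can_xsy(primary, secondary):
    """Apply the rules from this article to one primary/secondary pairing."""
    if primary == secondary:
        return False  # a voicebank cannot XSY with itself
    if primary.xsy_group is None or secondary.xsy_group is None:
        return False  # voicebanks outside any XSY group cannot be registered
    return primary.xsy_group == secondary.xsy_group  # both must share a group


# Placeholder voicebanks, not real products.
voice_a = Voicebank("Voice A", "group-1")
voice_b = Voicebank("Voice B", "group-1")
standalone = Voicebank("Standalone voice", None)

print(can_xsy(voice_a, voice_b))     # same group, different voicebanks
print(can_xsy(voice_a, voice_a))     # same voicebank
print(can_xsy(voice_a, standalone))  # partner is not in any group
```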

Limitations
XSY is far from perfect. As with the "GWL" function, this feature may prove very limiting to vocals that are too similar. A common complaint with Megurine Luka V4X's English vocals "Straight" and "Soft" is how XSY barely impacts either vocal.

XSY can produce unpredictable or unexpected results when used. This is particularly true when the two vocals used for the feature have significant differences, or when the vocals involved were created without XSY in mind. This occurred with many of the VOCALOID3 voicebanks that gained XSY when they were imported into VOCALOID4.


 * Example: In the Megpoid V4 package, the intended pairs of vocals (Native and NativeFat, for example) offer a moderate XSY result, while XSY with any of the other 8 vocals (Power and Sweet, for example) offers more unpredictable results. By comparison, XSY with the V3 Megpoid vocals is not as effective, as those voices were not originally intended for the function. The V3 Megpoid package produces considerably lower-quality results than V4.

Due to how different languages work, a VOCALOID with multiple voicebanks across languages may find that its results are not equal in every language. Some languages require blending of sounds and others precision. In theory, a Japanese "Power + Soft" result may not be similar or identical to a Chinese "Power + Soft" result, even if they are voiced by the same provider.

Common Flaws
Some common issues include:
 * Altered optimum range, either increased or decreased depending on the combination
 * Creation of new noises or the enhancement of known undesirable ones (such as mild "popping", "crunching/plucking", "static"), resulting in a loss of quality compared to the original two vocals that were used for it or the production of undesirable results.
 * Tones that change unexpectedly as the vocal goes up and down the octaves (particularly for vocals with large vocal ranges and/or large variation between the two vocals); this makes them unstable and can even cause tonal collapse.
 * Causing vocals to produce a croaky result, a bug which does not impact all VOCALOIDs naturally (see BIG AL as an example of a VOCALOID with a known natural croakiness).

Why do Bugs appear?
XSY works out the differences between the primary and secondary vocals by means of mathematical equations, and alters the wavelength of the primary vocal accordingly. The calculations do not work particularly well when the voicebanks have libraries built entirely differently from each other, as the function was not designed to handle this, and it can be impossible to get good-quality results.

Furthermore, if a sound is missing in the secondary library then the XSY function cannot reference it for XSY use since it doesn't exist. This is why VOCALOID3 and VOCALOID4 XSY is not always feasible, nor XSY between languages and multiple VOCALOIDs. This is also why features such as E.V.E.C. may cause issues with XSY.

Below is an explanation of the major known issues with XSY and why they occur.

Loss of Clarity;
Some vocals gain their clarity from one or more traits that they rely upon, such as a strong attack, powerful/clear tones or high-quality recorded samples, traits which are common to "power" types. However, mixing these vocals with others that have very different traits can remove the very reason for their clarity.

"Soft" vocals have been known to impact clarity due to their looser pronunciations, so mixing with such a vocal can cause the sounds of the primary vocal to loosen and mimic the traits of the soft vocal. This can make the resulting vocal appear to mumble more, as it weakens the consonants, which often control clarity, and allows the vowels to have more impact on the vocal.

This issue appears when mixing contrasting vocal types, such as the aforementioned Power/Soft XSY mix, and is applicable to any VOCALOID that can use XSY in any form. It impacts the entire XSY result. The issue can be fixed either inside VOCALOID or in sound editing software, and doing so is not too different from normal VOCALOID editing.

Exaggeration/noticeability of a flaw;
This can impact a small collection of sounds, tone or pitch.

One of the more common issues with XSY is how flaws are handled. If both vocals were drawn from the same set of (bad) data, or simply both ended up with a similar or identical glitch (whether the vocals are within the same release or from different releases), then this trait is ignored by the XSY function. As a result, while other traits can be lessened as the mix is adjusted towards either vocal's dominance, the shared trait remains largely untouched. Without the other traits to hide it, the flaw becomes more noticeable even though nothing has changed it. This can be demonstrated in the Megurine Luka V4X English vocals "Straight" and "Soft", which share identical flaws as they draw their vocals from similar data.

On the other hand, a similar issue can occur if the flaw also exists in the second vocal but is stronger there. In this case, the other traits do not always have to be weakened as well, as the flaw itself is what is impacted. The second vocal's influence pulls the primary vocal's flaw up to match its own strength, making the problem worse. This is exemplified by Arsloid "Bright" and "Soft", which often cause the original Arsloid vocal to have exaggerated flaws due to their own lack of content to soften the flaws during the XSY process.

Both are a result of matching vocal types that are similar but have contrasting results with each other, such as those made for single-character XSY.

Both cases are fixable if the user is familiar with both vocals, with the one creating the problem, or with the XSY pairing. The fix is often quick or minor and requires tweaking certain areas of the VSQX file. It can also be achieved with the right filters in external sound editing software.

Glitches where there were none;
Sometimes it is simply a case of a glitch existing in the secondary vocal which, when that vocal is used for XSY, causes the glitch to appear in the primary vocal. Incorrect phonemes, unlinked sounds and random crunches can all be added to a vocal that did not have them before. This is also known to occur between a V3 and a V4 vocal, such as the versions of Gackpoid or Megpoid. While the V4 version may have fixed the flaw, XSY with the V3 version may add it back in.

Other times, due to the extreme results of the secondary vocal, the result is a lower-quality sound overall, with entirely new glitches added that were in neither vocal. This mainly appears when using cross-character XSY, as it is uncertain what impact a second vocal like Cul, Kokone or Gachapoid will have, since these vocals tend to have a great impact on other vocals. In this case the glitch is a flaw added by the XSY process itself.

These particular issues can be quite a problem for inexperienced producers and veterans alike. In both cases they can go unnoticed, as it has already been noted that some producers fail to notice glitches in non-XSY results, especially if they are short or barely noticeable. They add a poor-quality sound to a result that would otherwise have been high quality. Often, since the VOCALOID software created the glitch, it must be edited within VOCALOID itself and cannot always be fixed by additional software.

Change of Range/Tempo
Each vocal has its own recommended range and tempo, which mark where the VOCALOID results are best for that vocal. This applies even when the studio does not make the ranges known.

The two recommended ranges (vocal and tempo) are determined by the layers of pitch in each VOCALOID, both stationary and articulation-based, as a result of the combined samples of a voicebank library. XSY cross-references both voicebanks' layers of pitch to calculate a new result. A typical VOCALOID vocal in Japanese or English has 2-3 layers, so the combined XSY result is based on a 4-6 layer vocal instead. This can have both a good and a bad impact on the finished sound. As a result, the new voice has a tempo and vocal range that differ from those of the original two vocals.

There are two things to note regarding this:
 * 1) The new vocal result is strongest when the layers line up with each other overall with a similar range, creating a solid foundation in which all layers support each other equally. This produces a HQ result; however, the downside is that if the vocals are too close you may not see much of a difference.
 * 2) A vocal which focuses solely on a single vocal range (for example, lower octaves) can benefit when combined with a vocal that focuses solely on higher ranges, or vice versa. The secondary vocal's additional layers offer extensive additional support, allowing the primary vocal to go past its previous recommended range. However, this is more likely to cause the previously mentioned glitches to appear, due to the use of two very different vocals, causing a VOCALOID to produce LQ results.

An example of this would apply to VY1v4: "Power", "Soft" and "Normal" can all have this effect on each other, while "Natural" will have this effect on them. On the other hand, if you use "Natural" as the primary vocal, then the other 3 vocals become support for it. They line up with all aspects of "Natural" and will strengthen and adjust certain aspects of the vocal; for example, "Power" will strengthen its upper ranges while "Soft" loosens them.

One last thing to note about the change of vocal range is that when two vocals that have no overlap at all are used for XSY, the unsupported range in the middle becomes unpredictable. This is rare, as most VOCALOIDs in each XSY group have at least a few keys overlapping each other, but on the occasions it does occur VOCALOID is nonetheless completely improvising the results. The larger the gap between the two vocal ranges, the more unpredictable this range is. With large gaps, the centre of this unstable range can become a serious weak point producing LQ results. Furthermore, it can have extreme results on the two ranges covered by either voicebank, since the calculations between the two are greatly exaggerated. In the rare cases where this does occur, it is recommended that users be more subtle with how much the secondary vocal influences the primary vocal to limit the issue.
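
Before pairing two vocals, a user could check for such a gap by comparing the recommended ranges published for each voicebank. The sketch below assumes the ranges are written as inclusive MIDI note numbers; the function name and the example ranges are made up for illustration and do not belong to any real voicebank.

```python
def unsupported_gap(range_a, range_b):
    """Number of notes that lie between two recommended ranges.

    Each range is an inclusive (lowest, highest) pair of MIDI note numbers.
    Returns 0 when the ranges overlap or sit directly next to each other.
    """
    low_a, high_a = range_a
    low_b, high_b = range_b
    if low_a > high_a or low_b > high_b:
        raise ValueError("each range must be given as (lowest, highest)")
    # The gap runs from just above the lower range's top note
    # to just below the upper range's bottom note.
    return max(0, max(low_a, low_b) - min(high_a, high_b) - 1)


# Hypothetical ranges, for illustration only.
low_voice = (45, 57)
high_voice = (64, 76)
mid_voice = (55, 67)

print(unsupported_gap(low_voice, high_voice))  # notes 58-63 are covered by neither
print(unsupported_gap(low_voice, mid_voice))   # the ranges overlap
```

A larger return value corresponds to a wider unsupported middle range, which is where the text above suggests keeping the secondary vocal's influence subtle.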

Software Agreements
Due to the licensing agreements between VOCALOIDs and studios, VOCALOIDs are generally not open for XSY with other VOCALOIDs or with VOCALOIDs of other studios. However, XSY mods do exist which alter the software to allow forbidden XSY combinations. The VOCALOID licensing agreements do not cover these cases, and users should be aware of this when installing any XSY mods.


 * See Controversy Concerns for details on XSY Mods.

Note: because this does not comply with the Yamaha licensing agreements, the Vocaloid wiki is unable to cover known cases of XSY modding, and songs using them, as well as any URLs to them, will be removed when found.