Vocaloid's issue is that it uses a multi-layer system split between articulation and stationary pitch.
Here is an article explaining what articulation is:
https://www.medicinenet.com/script/main/art.asp?articlekey=8746
It's basically the ability to "talk". Vocaloids have multiple layers of articulation to allow them to mimic the voice accurately. The theory is that the more layers you have, the closer the result is to how the singer would sound if they sang the note C#4 as "ma". So if you have a number of layers like this, it should sound the same no matter which octave it is sung in, because the overlap results in an accurate "ma". It doesn't focus on being accurate for a particular note. However, I will note that for the most part the loss of clarity can be felt here, because as the voice mimics the provider more accurately, it puts "breath" before anything else. It's hard to explain.
I believe "stationary" is the bit covered by the music insert... But in short, this is the layer that deals with the stability of the voice for the music itself. This basically allows the Vocaloid to accurately mimic music. Long story short, from what I gather, it's the number of notes or "scales" a Vocaloid is recorded in. In the example of "ma", it's "ma" as "A#3", "B#3", "C#3", etc. Most Vocaloids are recorded in 1 stationary scale, which is why they're more or less monotone; at least this was true for many V2 vocals.
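To make the "1 stationary scale" point a bit more concrete, here is a rough Python sketch. It is only my own illustration, not how Vocaloid actually stores or picks its samples: the function name `nearest_recording` is made up, and pitches are given as MIDI note numbers (61 is C#4, 58 is A#3) to keep it short. It picks the recorded pitch closest to the note you want and reports how many semitones the engine would have to bend it.

```python
def nearest_recording(target, recorded):
    """Return (closest recorded pitch, semitones it must be shifted).

    Hypothetical helper: `target` and every entry in `recorded` are
    MIDI note numbers. A bigger shift means more of the note is made
    up by the engine instead of coming from the singer.
    """
    best = min(recorded, key=lambda pitch: abs(pitch - target))
    return best, target - best


# A voicebank recorded at a single pitch (A#3) versus three pitches.
one_pitch = [58]
three_pitches = [58, 61, 64]

print(nearest_recording(61, one_pitch))      # C#4 has to be bent up from A#3
print(nearest_recording(61, three_pitches))  # C#4 was recorded, no bending
```

With only one recorded pitch every note is a stretched copy of the same sample, which fits the monotone feel described above.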
You then have the calculations of the human voice, done in maths by Vocaloid, which are merged with the samples to mimic the human voice more closely. This is something else again. It takes sample A and sample B and eases the transition from A to B, while making the vocal sound more human-like.
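The "ease the transition from A to B" part can be pictured as a crossfade. The sketch below is a plain linear crossfade in Python, which is an assumption on my part: I don't know what maths Vocaloid really uses, and the real thing is certainly more involved. The name `crossfade` and the `overlap` parameter are mine, for illustration only.

```python
def crossfade(sample_a, sample_b, overlap):
    """Join two lists of audio values with a linear blend.

    The last `overlap` values of A are mixed with the first `overlap`
    values of B, so A fades out while B fades in. This is a generic
    technique, not Vocaloid's actual algorithm.
    """
    if overlap < 0 or overlap > min(len(sample_a), len(sample_b)):
        raise ValueError("overlap must fit inside both samples")

    split = len(sample_a) - overlap
    head = list(sample_a[:split])   # untouched part of A
    tail = list(sample_b[overlap:])  # untouched part of B

    blended = []
    for i in range(overlap):
        # Weight of B rises steadily across the overlap region.
        w = (i + 1) / (overlap + 1)
        blended.append((1 - w) * sample_a[split + i] + w * sample_b[i])

    return head + blended + tail


# A loud flat sample fading into silence over two values.
joined = crossfade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], 2)
print(joined)
```

The bigger the difference between A and B, the more the blended region has to invent, which is where the warping tends to come from.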
I'm really breaking it down into simple terms because there is a lot more to this. Alter/Ego voices, I believe, just had a single layer of articulation, and then the engine completely takes over the rest of the work, which is why it's so rough sounding.
Lucy seems to still have the same problem as Vocaloid... She basically is always "singing" even when she talks.
It's a little hard to judge this engine from her because of the use of vibrato in some of the samples, as vibrato has a habit of making things sound better than they are. They are also using a fairly breathy singer, so they can make use of blending techniques with the engine. If a singer like Cul were used for this engine, the results would be different, as Cul's voice is more solid. A lot of synths seem to go straight for breathy vocals like this, or opera, because it's often easy. It's also because breathy singers offer the most expression too...
One thing I did note is that going low shows up Lucy's voice the most. Compare a Vocaloid, which uses maths to try to fill in the blanks outside of its best range: you get a reasonable result that sounds good. Lucy sounds terrible outside of her good range. "Soho Teaser" shows this off. She sounds really derpy low.
Julian in my opinion isn't much better than Vocaloids, which makes me believe Lucy isn't much better than Vocaloid either. They're both really... flat though... Even compared to Vocaloid, which is also monotone, this software has a lack of expression. I mean it's not bad, but... the more you listen to it, the less human-like it sounds. It's got the clarity, but not Vocaloid's realism when the engine behaves itself (and doesn't give off its "noise").
One way to really test it would be to see how it sounds jumping between a very high note and a very low one, as Vocaloid itself struggles... Say from C#6 to C#1 (insert evil laugh). Using notes that are only a few keys away from each other doesn't show up the engine's ability to smooth between notes at an extreme level. You always get warping. This may even be what the terrible sounds in "Soho Teaser" are: Lucy going through a gap in notes. A lot of synths don't seem to like to show off how good extreme note transitions are because it's almost always a terrible result. Real singers do it without a problem. Synths like to go from A#4 to B#4 to C#4 because it's comfortable. :-?
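For a sense of how extreme that C#6 to C#1 test is, here is a quick Python sketch that turns note names into MIDI note numbers (the usual convention where C4 is 60) and works out the size of the jump. The helper names `note_to_midi` and `leap` are mine, it only handles naturals and sharps, and the A#4 to B4 step is just my example of a "comfortable" neighbouring move.

```python
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Convert a name like 'C#6' or 'B4' to a MIDI note number (C4 = 60)."""
    letter, rest = name[0].upper(), name[1:]
    sharp = rest.startswith("#")
    octave = int(rest[1:] if sharp else rest)
    return 12 * (octave + 1) + SEMITONES[letter] + (1 if sharp else 0)


def leap(start, end):
    """Return (semitones moved, frequency ratio end/start)."""
    semitones = note_to_midi(end) - note_to_midi(start)
    # Equal temperament: each semitone multiplies frequency by 2**(1/12).
    return semitones, 2 ** (semitones / 12)


print(leap("C#6", "C#1"))  # the evil test
print(leap("A#4", "B4"))   # a comfortable neighbouring step
```

C#6 down to C#1 is 60 semitones, so five full octaves, and the pitch drops to 1/32 of where it started. A one-semitone step only changes the frequency by about 6%, which is why demos that stay on neighbouring notes tell you very little.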
Edit:
Tl;dr - I think this engine sounds really good in terms of language, but I'm questioning its capabilities beyond this. Something is wrong here, but you can only hear it in odd glimpses.