VO-BB - 19 YEARS OLD!

Mike Harrison · Posted: Sun Nov 29, 2015 7:54 pm Post subject: Text-to-Speech

It's been a while since this topic has been discussed or even mentioned. Has anyone yet heard any text-to-speech used for instruction that you considered on par with human speech? Just as compelling?

In this forum on the website for Articulate (one of the top eLearning authoring applications), someone asked three years ago for recommendations for TTS (text-to-speech) software. When I found the discussion a few months ago, I gave my thoughts on the use of TTS in learning. It's the last comment on that page.

The discussion continues on the following page, and one of those responding is the author of a TTS application who feels that what he has developed has every bit the impact of a live presentation.

I invite you to read the discussion and especially follow the TTS author's link to his website where a video demonstrates his claim that TTS is, essentially, just as compelling as human speech.

I'm very interested to read what you think. Many thanks.
_________________
Mike
Male Voice Over Talent
I have taken leave of my sensors.

Gregory Best · Posted: Sun Nov 29, 2015 9:52 pm Post subject:

Very well said, Mike. It is the digital age. Too many are willing to accept lower quality. Look what has happened to professional photography.
_________________
Gregory Best

greg@gregorybest.com

Mandy Nelson · Posted: Mon Nov 30, 2015 8:00 pm Post subject:

Travis · Posted: Mon Dec 07, 2015 2:25 pm Post subject:

Hi everyone (I'm BACK, after a very long absence!)

Here's the deal on TTS (text-to-speech).

Some of the text-to-speech systems sound very "real". However, NONE of the systems have the ability to UNDERSTAND what they are talking about. It's going to be a long time before they have that ability.
_________________
Travis
www.VOTalent.com

Mike Harrison · Posted: Mon Dec 07, 2015 4:45 pm Post subject:

The point I am trying to make to the author of the TTS software is that, in both human speech and TTS, the sound of the "voice" is not the same thing as, and is secondary to, whether the text is interpreted properly, so that inflection is placed on the correct syllables and words. In many cases, improperly placed inflection can change the meaning of a sentence. And, in instruction (eLearning), this is crucial.

Here's what I mean:

‘I never said she ate your sandwich.’ (Somebody else said it)
‘I NEVER said she ate your sandwich.’ (I definitely did not say anything)
‘I never SAID she ate your sandwich.’ (I implied it)
‘I never said SHE ate your sandwich.’ (I said someone else did)
‘I never said she ATE your sandwich.’ (I said she did something else with the sandwich)
‘I never said she ate YOUR sandwich.’ (I said she ate someone else’s sandwich)
‘I never said she ate your SANDWICH.’ (I said she ate something else)
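
For anyone who wants to try this on a TTS engine, here is a minimal sketch of how the seven readings above could be spelled out explicitly rather than left for the software to guess. It assumes the engine accepts SSML and actually honors the `<emphasis>` element, which not every engine does. The function name `emphasis_variants` is made up for illustration, and the bare `<speak>` wrapper may need version and namespace attributes depending on the engine.

```python
from xml.sax.saxutils import escape

SENTENCE = "I never said she ate your sandwich."


def emphasis_variants(sentence):
    """Return one SSML string per word, with that word wrapped in <emphasis>.

    Hypothetical helper: it only builds markup. Whether the stress is
    rendered convincingly depends entirely on the TTS engine.
    """
    words = sentence.split()
    variants = []
    for i in range(len(words)):
        marked = [
            # Stress only the i-th word; escape every word for XML safety.
            f'<emphasis level="strong">{escape(w)}</emphasis>' if j == i else escape(w)
            for j, w in enumerate(words)
        ]
        # Bare <speak> root; some engines require extra attributes here.
        variants.append("<speak>" + " ".join(marked) + "</speak>")
    return variants


if __name__ == "__main__":
    for ssml in emphasis_variants(SENTENCE):
        print(ssml)
```

Even with markup like this, someone still has to decide which word carries the stress. That decision is the interpretation step, and the markup only records it.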

The TTS software author feels his "voices" are every bit as compelling as a human motivational speaker. A video on his website which features Bill Gates giving a commencement address is used as an example. It is my opinion that Bill Gates is not the best example of a compelling speaker and, as such, the software author doesn't really know what a truly compelling speaker sounds like and is capable of. Following the brief Gates sample, the TTS software "speaks" the same words... and in several instances places the inflection on the wrong words... the exact reason TTS does not (yet) measure up.

Bruce · Posted: Mon Dec 07, 2015 6:01 pm Post subject:

Funny you should post the sandwich example. I've had this one for some time now and it works really well with Yiddish inflections:

Mike Harrison · Posted: Mon Dec 07, 2015 8:38 pm Post subject:

Gregory Best · Posted: Tue Dec 08, 2015 12:38 am Post subject:

As I told Mike privately, my wife and I couldn't make it all the way through the video. It sucks. I hope people aren't really buying it.

heyguido · Posted: Tue Dec 08, 2015 6:58 am Post subject:

Just a thought, Mike...

Why are you wasting time on this?
(Apply appropriate emphasis where necessary) ;)

_________________
Don Brookshire
"Wait.... They wanna PAY me for this?"

Mike Harrison · Posted: Tue Dec 08, 2015 7:24 am Post subject:

I'm concerned about this for two reasons. And I should've explained this when beginning the thread.

First, should TTS become attractive only from the money-saving perspective (and we know how important a factor that is), it will mean less available work for those in our field. Second, yet equally important to me, we are seeing everywhere, every day, the result of less-than-good education: what is, to me, an amazing number of people, adults included, cannot spell the most common, simple words, or don't know which spelling of a word to use in a given context. Or take the adult I recently encountered on social media who called herself an author, yet not only used the word "untitled" instead of "entitled" but also didn't know that water was essential for life.

Companies are trying to get better performance from their employees by administering eLearning. But the money, time and all other effort that goes into the production of the courses – not to mention the time spent by those taking the courses – will be wasted because the "speech" of TTS technology is so mechanical and lifeless that it is anything but engaging. We can all remember boring school teachers and our level of engagement and performance under them, so we can't expect that forcing people to listen to lessons of 10 minutes or more of completely non-compelling droning is going to generate the results being hoped for.

I don't consider wanting better outcomes to be a waste of time at all.

todd ellis · Posted: Tue Dec 08, 2015 7:30 am Post subject:

if i had to listen to more than 60 seconds of this as employee training, i'd quit.
_________________
"i know philip banks": todd ellis
who's/on/1st?

Mandy Nelson · Posted: Tue Dec 08, 2015 8:08 am Post subject:

In my opinion, and it's just one of many, this is not worth worrying about. The reason you are seeing more examples of poor education, or perhaps laziness, is because it is in front of our eyes daily as we are glued to our screens. It makes it very easy to have little tolerance. Walk into your nearest small town and talk to people. That guy who barely graduated high school and always says "could care less" is the one who is going to help you keep your car on the road for a couple more years because he knows the reality of not being able to buy a new car.

Which then leads into the money-saving perspective. That, too, is in every aspect of our lives. There is a dealership a town over that sells the hottest new cars. Many people who live in my town can afford to get a new one every couple of years. Me, I go to the nice guy up the road with the magic car Band-aids. There will always be people who can afford to hire us. Always. I could bring up examples but I fear I'm rambling. TTS isn't going anywhere but neither is the desire to hear a real voice.
_________________
006 member of the Sisterhood of the Traveling Mic. Bonded by sound.

Manfillappsoc: The Mandy and Philip mutual appreciation Society. Who's in your network?

Have you seen my mic closet? ~ me to my future husband

Bish · Posted: Tue Dec 08, 2015 9:20 am Post subject:

I couldn't resist. I actually commented on the video. I did make it all the way through and was extremely amused by how excited they were about their "rhetorical pause" ... when the program couldn't even enunciate the word "speaking" properly. Irony anyone?

If you think that this is a reasonable replacement for a real voice, then you are sadly deluded. What you may save in not hiring a narrator you will lose ten-fold in lost revenue and credibility. It is an interesting technical exercise, but that's about it. Consider listening to an online training course (for an hour or so) with the voice from the video here. You will a) not remember the course material or content because your focus will have been pulled away by the synthetic and inaccurate narration, or b) probably not listen to more than a few minutes because of the physical discomfort caused. It actually reminds me of an old story... A man spent years training his dog to walk on its hind legs. When he showed the trick to a friend, the observation was, "Yes, very impressive... but tell me, why? It will only ever be a curiosity as a dog, and a pale and pointless imitation of a man."
_________________
Bish a.k.a. Bish
Smoke me a kipper... I'll be back for breakfast.
I will not feed the trolls... I will not feed the trolls... I will not feed the trolls... I will not feed the trolls.

chrisvoco · Club 300 Joined: 14 Mar 2014 Posts: 380 Location: Local

eLearning is just one of a large handful of industries that are, for good or ill, at one stage or another of TTS integration. Certainly it is inappropriate - downright dumb, most of the time - to use synthesized speech except as an absolute last resort or where the quality of the speech absolutely does not matter and the critical thing is that the information gets conveyed. Emergency alerting is a good example of appropriate application.

Bruce mentioned pre-programming of the dialogue. There are so many subtleties here aside from merely correct pronunciation and inflection: localization beyond the particular language, for example (do you say "creek" or "crick"?). One of the impediments is that there's precious little true standardization between TTS systems. Though this may change in the future - who knows! - you really only get very good results when you're dealing with an extremely limited lexicon and can focus on making that lexicon sound real when synthesized. TTS suffers not so much from an engine deficiency as an interface deficiency.

Take a clue from music synthesizers: the key to their becoming mainstream instruments wasn't so much the evolution of the actual synthesis technology as it was the interface, in this case the good ol' organ keyboard that has a few centuries of study and understanding behind it and enables a skilled user to produce pleasing music even if the actual synthesized sounds ain't all that good or even realistic. The keyboard is a well- and widely-understood standard - and it was the right standard. Written text is also well-understood, but there's no one-to-one correspondence between what you type in and what you'll necessarily hear as the result - so, it's not the right standard, and whatever the right standard is, it hasn't yet been discovered.

For fun, look up "Perfect Paul" or DECtalk.

And does anyone remember the talking "Eliza" psychologist program, or S.A.M. on the Commodore 64? :)

_________________
Finally, Ford stops starting to say things and starts.

chrisvoco · Club 300 Joined: 14 Mar 2014 Posts: 380 Location: Local

With essentially state-of-the-art technology from 1982, a solo TTS vocalist:

http://simulationcorner.net/SAM/sing.wav