Mike Harrison M&M
Joined: 03 Nov 2007 Posts: 2029 Location: Equidistant from New York City and Philadelphia, along the NJ Shore
Posted: Sun Nov 29, 2015 7:54 pm    Post subject: Text-to-Speech
It's been a while since this topic has been discussed or even mentioned. Has anyone yet heard any text-to-speech for instruction that you would consider on par with human speech? Just as compelling?
In this forum on the website for Articulate (one of the top eLearning authoring applications), someone asked three years ago for recommendations for TTS (text-to-speech) software. When I found the discussion a few months ago, I gave my thoughts on the use of TTS in learning. It's the last comment on that page.
The discussion continues on the following page, and one of those responding is the author of a TTS application who feels that what he has developed has every bit the impact of a live presentation.
I invite you to read the discussion, and especially to follow the TTS author's link to his website, where a video demonstrates his claim that TTS is, essentially, just as compelling as human speech.
I'm very interested to read what you think. Many thanks. _________________ Mike
Male Voice Over Talent
I have taken leave of my sensors.
Gregory Best The Gates of Troy
Joined: 04 Aug 2005 Posts: 1853 Location: San Diego area (east of Connie and south and east of Bailey)
Posted: Sun Nov 29, 2015 9:52 pm
Very well said, Mike. It is the digital age. Too many are willing to accept lower quality. Look what has happened to professional photography. _________________ Gregory Best
greg@gregorybest.com

Mandy Nelson MMD
Joined: 07 Aug 2008 Posts: 2899 Location: Wicked Mainah
Posted: Mon Nov 30, 2015 8:00 pm
Gregory Best wrote: "Look what has happened to professional photography."
That.
I've been working on a couple of programs that, in testing, are combining real voice with TTS. Every developer I've spoken with wants a fully human voice, but the budget doesn't allow that, especially at the testing stage. If we raise our kids right, they won't stand for it, either. _________________ 006 member of the Sisterhood of the Traveling Mic. Bonded by sound.
Manfillappsoc: The Mandy and Philip mutual appreciation Society. Who's in your network?
Have you seen my mic closet? ~ me to my future husband

Travis Contributor IV
Joined: 09 Feb 2006 Posts: 149 Location: Los Angeles, CA
Posted: Mon Dec 07, 2015 2:25 pm
Hi everyone (I'm BACK, after a very long absence!)
Here's the deal on TTS (text-to-speech).
Some of the text-to-speech systems sound very "real". However, NONE of the systems have the ability to UNDERSTAND what they are talking about. It's going to be a long time before they have that ability. _________________ Travis
www.VOTalent.com

Mike Harrison M&M
Joined: 03 Nov 2007 Posts: 2029 Location: Equidistant from New York City and Philadelphia, along the NJ Shore
Posted: Mon Dec 07, 2015 4:45 pm
The point I am trying to make to the author of the TTS software is that, in both human speech and TTS, the sound of the "voice" is secondary to whether the text is interpreted correctly, so that inflection is placed on the right syllables and words. In many cases, misplaced inflection can change the meaning of a sentence, and in instruction (eLearning), that is crucial.
Here's what I mean:
*I* never said she ate your sandwich. (Somebody else said it)
I *never* said she ate your sandwich. (I definitely did not say anything)
I never *said* she ate your sandwich. (I implied it)
I never said *she* ate your sandwich. (I said someone else did)
I never said she *ate* your sandwich. (I said she did something else with the sandwich)
I never said she ate *your* sandwich. (I said she ate someone else's sandwich)
I never said she ate your *sandwich*. (I said she ate something else)
The TTS software author feels his "voices" are every bit as compelling as a human motivational speaker. A video on his website featuring Bill Gates giving a commencement address is used as an example. In my opinion, Bill Gates is not the best example of a compelling speaker, which suggests the software author doesn't really know what a truly compelling speaker sounds like or is capable of. Following the brief Gates sample, the TTS software "speaks" the same words... and in several instances places the inflection on the wrong words... the exact reason TTS does not (yet) measure up. _________________ Mike
Male Voice Over Talent
I have taken leave of my sensors.
Bruce Boardmeister
Joined: 06 Jun 2005 Posts: 7928 Location: Portland, OR
Posted: Mon Dec 07, 2015 6:01 pm
Funny you should post the sandwich example. I've had this one for some time now and it works really well with Yiddish inflections:
Quote: This is from Leo Rosten's book "The Joys of Yiddish." (The questioner is asking whether he or she should attend a concert being given by a niece. The meaning of the same sentence changes completely, depending on where the speaker places the emphasis.)
*I* should buy two tickets for her concert?--meaning: "After what she did to me?"
I *should* buy two tickets for her concert?--meaning: "What, you're giving me a lesson in ethics?"
I should *buy* two tickets for her concert?--meaning: "I wouldn't go even if she were giving out free passes!"
I should buy *two* tickets for her concert?--meaning: "I'm having enough trouble deciding whether it's worth one."
I should buy two *tickets* for her concert?--meaning: "She should be giving out free passes, or the hall will be empty."
I should buy two tickets for *her* concert?--meaning: "Did she buy tickets to our daughter's recital?"
I should buy two tickets for her *concert*?--meaning: "You mean, they call what she does a 'concert'?"
It's going to be quite a while before a computer can figure out the subtlety of these without lots of pre-programming of the dialogue, therefore defeating the time-saving issue for TTS.
B _________________ VO-BB Member #31 Enlisted June, 2005
I'm not a Zoo, but over the years I've played one on radio/TV.

Mike Harrison M&M
Joined: 03 Nov 2007 Posts: 2029 Location: Equidistant from New York City and Philadelphia, along the NJ Shore
Posted: Mon Dec 07, 2015 8:38 pm
Bruce wrote: "It's going to be quite a while before a computer can figure out the subtlety of these without lots of pre-programming of the dialogue, therefore defeating the time-saving issue for TTS."
EXACTLY what I've been trying to convey to the TTS software author. The time and money required to make TTS sound truly coherent will be at least as great as (and probably far greater than) the cost of hiring a pro to do it correctly... naturally... and in far less time.
PLEASE... take six minutes to listen to this video and tell me: if you were not a voice-over talent but instead a new employee who had to go through some eLearning, would you find this engaging? Do you think, after listening to that kind of speech for, say, 20 minutes or more, you would remember what you heard? _________________ Mike
Male Voice Over Talent
I have taken leave of my sensors.
Gregory Best The Gates of Troy
Joined: 04 Aug 2005 Posts: 1853 Location: San Diego area (east of Connie and south and east of Bailey)
Posted: Tue Dec 08, 2015 12:38 am
As I told Mike privately, my wife and I couldn't make it all the way through the video. It sucks. I hope people aren't really buying it. _________________ Gregory Best
greg@gregorybest.com

heyguido MMD
Joined: 31 Aug 2011 Posts: 2507 Location: RDU, the Geek Capitol of the South
Posted: Tue Dec 08, 2015 6:58 am
Just a thought, Mike...
Why are you wasting time on this?
(Apply appropriate emphasis where necessary) _________________ Don Brookshire
"Wait.... They wanna PAY me for this?"

Mike Harrison M&M
Joined: 03 Nov 2007 Posts: 2029 Location: Equidistant from New York City and Philadelphia, along the NJ Shore
Posted: Tue Dec 08, 2015 7:24 am
I'm concerned about this for two reasons, and I should have explained them when I started the thread.
First, should TTS become attractive purely from the money-saving perspective (and we know how important a factor that is), it will mean less available work for those in our field. Second, and equally important to me, we are seeing everywhere, every day, the results of less-than-good education: what is, to me, an amazing number of people, adults included, cannot spell the most common, simple words, or don't know which spelling of a word to use in a given context. Or take the adult I recently encountered on social media who called herself an author, and who not only used the word "untitled" instead of "entitled," but also didn't know that water was essential for life.
Companies are trying to get better performance from their employees by administering eLearning. But the money, time, and effort that go into producing the courses, not to mention the time spent by those taking them, will be wasted because the "speech" of TTS technology is so mechanical and lifeless that it is anything but engaging. We can all remember boring schoolteachers and our level of engagement and performance under them, so we can't expect that forcing people to listen to lessons of 10 minutes or more of completely uncompelling droning will generate the results being hoped for.
I don't consider wanting better outcomes to be a waste of time at all. _________________ Mike
Male Voice Over Talent
I have taken leave of my sensors.
todd ellis A Zillion
Joined: 02 Jan 2007 Posts: 10494 Location: little egypt
Posted: Tue Dec 08, 2015 7:30 am
if i had to listen to more than 60 seconds of this as employee training, i'd quit. _________________ "i know philip banks": todd ellis
who's/on/1st?
Mandy Nelson MMD
Joined: 07 Aug 2008 Posts: 2899 Location: Wicked Mainah
Posted: Tue Dec 08, 2015 8:08 am
In my opinion, and it's just one of many, this is not worth worrying about. The reason you are seeing more examples of poor education, or perhaps laziness, is that it is in front of our eyes daily as we are glued to our screens. That makes it very easy to have little tolerance. Walk into your nearest small town and talk to people. That guy who barely graduated high school and always says "could care less" is the one who is going to help you keep your car on the road for a couple more years, because he knows the reality of not being able to buy a new car.
That leads to the money-saving perspective, which is also in every aspect of our lives. There is a dealership a town over that sells the hottest new cars. Many people who live in my town can afford to get a new one every couple of years. Me, I go to the nice guy up the road with the magic car Band-Aids. There will always be people who can afford to hire us. Always. I could bring up examples, but I fear I'm rambling. TTS isn't going anywhere, but neither is the desire to hear a real voice. _________________ 006 member of the Sisterhood of the Traveling Mic. Bonded by sound.
Manfillappsoc: The Mandy and Philip mutual appreciation Society. Who's in your network?
Have you seen my mic closet? ~ me to my future husband

Bish 3.5 kHz
Joined: 22 Nov 2009 Posts: 3738 Location: Lost in the cultural wasteland of Long Island
Posted: Tue Dec 08, 2015 9:20 am
I couldn't resist. I actually commented on the video. I did make it all the way through and was extremely amused by how excited they were about their "rhetorical pause"... when the program couldn't even enunciate the word "speaking" properly. Irony, anyone?
If you think this is a reasonable replacement for a real voice, you are sadly deluded. Whatever you save by not hiring a narrator, you will lose tenfold in revenue and credibility. It is an interesting technical exercise, but that's about it. Imagine listening to an online training course (for an hour or so) with the voice from the video here. You will either a) not remember the course material or content, because your focus will have been pulled away by the synthetic and inaccurate narration, or b) probably not listen to more than a few minutes because of the physical discomfort it causes. It actually reminds me of an old story... A man spent years training his dog to walk on its hind legs. When he showed the trick to a friend, the friend observed, "Yes, very impressive... but tell me, why? It will only ever be a curiosity as a dog, and a pale and pointless imitation of a man." _________________ Bish a.k.a. Bish
Smoke me a kipper... I'll be back for breakfast.
I will not feed the trolls... I will not feed the trolls... I will not feed the trolls... I will not feed the trolls.

chrisvoco Club 300
Joined: 14 Mar 2014 Posts: 380 Location: Local
Posted: Tue Dec 08, 2015 9:24 am
eLearning is just one of a large handful of industries at one stage or another, for good or ill, of TTS integration. Certainly it is inappropriate - downright dumb, most of the time - to use synthesized speech except as an absolute last resort or where the quality of the speech absolutely does not matter and the critical thing is that the information gets conveyed. Emergency alerting is a good example of appropriate application.
Bruce mentioned pre-programming of the dialogue. There are so many subtleties here beyond merely correct pronunciation and inflection: localization beyond the particular language, for example (do you say "creek" or "crick"?). One of the impediments is that there's precious little true standardization between TTS systems. Though this may change in the future - who knows! - you really only get very good results when you're dealing with an extremely limited lexicon and can focus on making that lexicon sound real when synthesized. The problem is not so much an engine deficiency as an interface deficiency. Take a clue from music synthesizers: the key to their becoming mainstream instruments wasn't so much the evolution of the synthesis technology itself as the interface, in this case the good ol' organ keyboard, which has a few centuries of study and understanding behind it and enables a skilled user to produce pleasing music even if the synthesized sounds ain't all that good or even realistic. The keyboard is a well- and widely-understood standard, and it was the right standard. Written text is also well understood, but there's no one-to-one correspondence between what you type in and what you'll hear as the result; so it's not the right standard, and whatever the right standard is, it hasn't yet been discovered.
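To make that "pre-programming" cost concrete, here is a minimal sketch that marks up Mike's sandwich sentence once per possible reading. It assumes the W3C SSML `<emphasis>` element as the markup; SSML is one existing attempt at a speech-synthesis interface, though engines differ in whether and how they honor `<emphasis>`. The helper name `emphasis_variants` is hypothetical and not tied to any product discussed here.

```python
from xml.sax.saxutils import escape


def emphasis_variants(sentence: str, level: str = "strong") -> list[str]:
    """Return one SSML <speak> string per word, each stressing a different word.

    Hypothetical helper for illustration only. Assumes the target engine
    honors the W3C SSML <emphasis> element; support varies by engine and voice.
    Punctuation stays attached to its word (e.g. "sandwich.").
    """
    words = sentence.split()
    variants = []
    for i in range(len(words)):
        # Escape XML-special characters so the output stays well-formed SSML.
        parts = [escape(w) for w in words]
        # Wrap only the i-th word in an emphasis tag.
        parts[i] = f'<emphasis level="{level}">{parts[i]}</emphasis>'
        variants.append("<speak>" + " ".join(parts) + "</speak>")
    return variants


if __name__ == "__main__":
    for ssml in emphasis_variants("I never said she ate your sandwich."):
        print(ssml)
```

The seven strings stress the seven words in sentence order, matching the seven readings listed earlier in the thread. The markup only records a choice; a person still has to decide which reading the script intends, which is exactly the authoring effort Bruce describes.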
For fun, look up "Perfect Paul" or DECTalk.
And does anyone remember the talking "Eliza" psychologist program, or S.A.M. on the Commodore 64? _________________ Finally, Ford stops starting to say things and starts.

chrisvoco Club 300
Joined: 14 Mar 2014 Posts: 380 Location: Local
Posted: Tue Dec 08, 2015 9:58 am
With essentially state-of-the-art technology from 1982, a solo TTS vocalist:
http://simulationcorner.net/SAM/sing.wav _________________ Finally, Ford stops starting to say things and starts.