VO-BB - 19 YEARS OLD! Forum Index VO-BB - 19 YEARS OLD!
Where A.I. is a four-letter word.
 

Text-to-Speech
Mike Harrison
M&M


Joined: 03 Nov 2007
Posts: 2029
Location: Equidistant from New York City and Philadelphia, along the NJ Shore

Posted: Sun Nov 29, 2015 7:54 pm    Post subject: Text-to-Speech

It's been a while since this topic has been discussed or even mentioned. Has anyone yet heard any text-to-speech for instruction that you considered on par with human speech? Just as compelling?

In this forum on the website for Articulate (one of the top eLearning authoring applications), someone asked three years ago for recommendations for TTS (text-to-speech) software. When I found the discussion a few months ago, I gave my thoughts on the use of TTS in learning. It's the last comment on that page.

The discussion continues on the following page, and one of those responding is the author of a TTS application who feels that what he has developed has every bit the impact of a live presentation.

I invite you to read the discussion and especially follow the TTS author's link to his website where a video demonstrates his claim that TTS is, essentially, just as compelling as human speech.

I'm very interested to read what you think. Many thanks.
_________________
Mike
Male Voice Over Talent
I have taken leave of my sensors.

Gregory Best
The Gates of Troy


Joined: 04 Aug 2005
Posts: 1853
Location: San Diego area (east of Connie and south and east of Bailey)

Posted: Sun Nov 29, 2015 9:52 pm

Very well said, Mike. It is the digital age. Too many are willing to accept lower quality. Look what has happened to professional photography.
_________________
Gregory Best

greg@gregorybest.com
Mandy Nelson
MMD


Joined: 07 Aug 2008
Posts: 2899
Location: Wicked Mainah

Posted: Mon Nov 30, 2015 8:00 pm

Gregory Best wrote:
Look what has happened to professional photography.


That.

I've been working on a couple of programs that, in testing, are combining real voice with TTS. Any and all developers I've spoken with want full-on human, but money doesn't allow that, especially at testing. If we raise our kids right they won't stand for it, either. ;)
_________________
006 member of the Sisterhood of the Traveling Mic. Bonded by sound.

Manfillappsoc: The Mandy and Philip mutual appreciation Society. Who's in your network?

Have you seen my mic closet? ~ me to my future husband
Travis
Contributor IV


Joined: 09 Feb 2006
Posts: 149
Location: Los Angeles, CA

Posted: Mon Dec 07, 2015 2:25 pm

Hi everyone (I'm BACK, after a very long absence!)

Here's the deal on TTS (text-to-speech).

Some of the text-to-speech systems sound very "real". However, NONE of the systems have the ability to UNDERSTAND what they are talking about. It's going to be a long time before they have that ability.
_________________
Travis
www.VOTalent.com
Mike Harrison
M&M


Joined: 03 Nov 2007
Posts: 2029
Location: Equidistant from New York City and Philadelphia, along the NJ Shore

Posted: Mon Dec 07, 2015 4:45 pm

The point I am trying to make with the author of the TTS software is that, in both human speech and TTS, the sound of the "voice" is not the same thing as, and is secondary to, whether the text is interpreted properly, so that proper inflection is placed on the correct syllables and words. In many cases, improperly placed inflection can change the meaning of a sentence. And, in instruction (eLearning), this is crucial.

Here's what I mean:

‘I never said she ate your sandwich.’ (Somebody else said it)
‘I NEVER said she ate your sandwich.’ (I definitely did not say anything)
‘I never SAID she ate your sandwich.’ (I implied it)
‘I never said SHE ate your sandwich.’ (I said someone else did)
‘I never said she ATE your sandwich.’ (I said she did something else with the sandwich)
‘I never said she ate YOUR sandwich.’ (I said she ate someone else’s sandwich)
‘I never said she ate your SANDWICH.’ (I said she ate something else)

The TTS software author feels his "voices" are every bit as compelling as a human motivational speaker. A video on his website which features Bill Gates giving a commencement address is used as an example. It is my opinion that Bill Gates is not the best example of a compelling speaker and, as such, the software author doesn't really know what a truly compelling speaker sounds like and is capable of. Following the brief Gates sample, the TTS software "speaks" the same words... and in several instances places the inflection on the wrong words... the exact reason TTS does not (yet) measure up.
_________________
Mike
Male Voice Over Talent
I have taken leave of my sensors.

Bruce
Boardmeister


Joined: 06 Jun 2005
Posts: 7928
Location: Portland, OR

Posted: Mon Dec 07, 2015 6:01 pm

Funny you should post the sandwich example. I've had this one for some time now and it works really well with Yiddish inflections:

Quote:
This from Leo Rosten's book "The Joys of Yiddish": (The questioner is asking whether he/she should attend a concert being given by a niece. The meaning of the same sentence changes completely, depending on where the speaker places the emphasis:)

I should buy two tickets for her concert?--meaning: "After what she did to me?"
I SHOULD buy two tickets for her concert?--meaning: "What, you're giving me a lesson in ethics?"
I should BUY two tickets for her concert?--meaning: I wouldn't go even if she were giving out free passes!
I should buy TWO tickets for her concert?--meaning: I'm having enough trouble deciding whether it's worth one.
I should buy two TICKETS for her concert?--She should be giving out free passes, or the hall will be empty.
I should buy two tickets for HER concert?--Did she buy tickets to our daughter's recital?
I should buy two tickets for her CONCERT?--You mean, they call what she does a "concert"?


It's going to be quite a while before a computer can figure out the subtlety of these without lots of pre-programming of the dialogue, therefore defeating the time-saving issue for TTS.
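
To give a rough idea of what that pre-programming might look like: SSML, the W3C markup that many TTS engines accept, has an `<emphasis>` element, but a person still has to decide which word gets it. The Python sketch below is only an illustration under assumptions. The function names `emphasize` and `all_readings` are made up for this example, and how audibly any particular engine honors the tag varies. It simply generates one marked-up variant per word of the concert sentence.

```python
from xml.sax.saxutils import escape


def emphasize(sentence, stressed_index):
    """Return SSML with exactly one word wrapped in <emphasis>.

    Hypothetical helper: the engine cannot choose `stressed_index`
    on its own; a human has to supply it for every sentence.
    """
    words = sentence.split()
    if not 0 <= stressed_index < len(words):
        raise IndexError("stressed_index is outside the sentence")
    marked = []
    for i, word in enumerate(words):
        text = escape(word)  # keep &, <, > safe inside the markup
        if i == stressed_index:
            text = '<emphasis level="strong">' + text + "</emphasis>"
        marked.append(text)
    return "<speak>" + " ".join(marked) + "</speak>"


def all_readings(sentence):
    """One SSML variant per word, each stressing a different word."""
    return [emphasize(sentence, i) for i in range(len(sentence.split()))]


if __name__ == "__main__":
    for ssml in all_readings("I should buy two tickets for her concert?"):
        print(ssml)
```

The markup itself is trivial to produce. Picking the right variant for each line of a script is the part that takes a person, and that is where the time goes.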

B
_________________
VO-BB Member #31 Enlisted June, 2005

I'm not a Zoo, but over the years I've played one on radio/TV. .
Mike Harrison
M&M


Joined: 03 Nov 2007
Posts: 2029
Location: Equidistant from New York City and Philadelphia, along the NJ Shore

Posted: Mon Dec 07, 2015 8:38 pm

Bruce wrote:
It's going to be quite a while before a computer can figure out the subtlety of these without lots of pre-programming of the dialogue, therefore defeating the time-saving issue for TTS.

EXACTLY what I've been trying to convey to the TTS software author. The time and money required to make TTS sound truly coherent will be at least as great as (and probably far greater than) the cost of hiring a pro to do it correctly... naturally... and in far less time.

PLEASE... take six minutes to listen to this video and tell me: if you were not a voice-over talent but instead a new employee who had to go through some eLearning, would you find this engaging? Do you think, after listening to that kind of speech for, say, 20 minutes or more, you would remember what you heard?
_________________
Mike
Male Voice Over Talent
I have taken leave of my sensors.

Gregory Best
The Gates of Troy


Joined: 04 Aug 2005
Posts: 1853
Location: San Diego area (east of Connie and south and east of Bailey)

Posted: Tue Dec 08, 2015 12:38 am

As I told Mike privately, my wife and I couldn't make it all the way through the video. It sucks. I hope people aren't really buying it.
_________________
Gregory Best

greg@gregorybest.com
heyguido
MMD


Joined: 31 Aug 2011
Posts: 2507
Location: RDU, the Geek Capitol of the South

Posted: Tue Dec 08, 2015 6:58 am

Just a thought, Mike...

Why are you wasting time on this?
(Apply appropriate emphasis where necessary) ;)
_________________
Don Brookshire
"Wait.... They wanna PAY me for this?"
Mike Harrison
M&M


Joined: 03 Nov 2007
Posts: 2029
Location: Equidistant from New York City and Philadelphia, along the NJ Shore

Posted: Tue Dec 08, 2015 7:24 am

I'm concerned about this for two reasons. And I should've explained this when beginning the thread.

First, should TTS become attractive only from the money-saving perspective (and we know how important a factor that is), it will mean less available work for those in our field. Second, yet equally important to me, we are seeing everywhere, every day, the result of less-than-good education: what is, to me, an amazing number of people, adults included, cannot spell the most common, simple words, or don't know which spelling of a word to use in a given context. Or take the adult I recently encountered on social media, who called herself an author yet not only used the word "untitled" instead of "entitled" but also didn't know that water was essential for life.

Companies are trying to get better performance from their employees by administering eLearning. But the money, time and all other effort that goes into the production of the courses (not to mention the time spent by those taking the courses) will be wasted because the "speech" of TTS technology is so mechanical and lifeless that it is anything but engaging. We can all remember boring school teachers and our level of engagement and performance under them, so we can't expect that forcing people to listen to lessons of 10 minutes or more of completely non-compelling droning is going to generate the results being hoped for.

I don't consider wanting better outcomes to be a waste of time at all.
_________________
Mike
Male Voice Over Talent
I have taken leave of my sensors.

todd ellis
A Zillion


Joined: 02 Jan 2007
Posts: 10494
Location: little egypt

Posted: Tue Dec 08, 2015 7:30 am

if i had to listen to more than 60 seconds of this as employee training, i'd quit.
_________________
"i know philip banks": todd ellis
who's/on/1st?

Mandy Nelson
MMD


Joined: 07 Aug 2008
Posts: 2899
Location: Wicked Mainah

Posted: Tue Dec 08, 2015 8:08 am

In my opinion, and it's just one of many, this is not worth worrying about. The reason you are seeing more examples of poor education, or perhaps laziness, is because it is in front of our eyes daily as we are glued to our screens. It makes it very easy to have little tolerance. Walk into your nearest small town and talk to people. That guy who barely graduated high school and always says "could care less" is the one who is going to help you keep your car on the road for a couple more years because he knows the reality of not being able to buy a new car.

Which then leads into the money-saving perspective. That, too, is in every aspect of our lives. There is a dealership a town over that sells the hottest new cars. Many people who live in my town can afford to get a new one every couple of years. Me, I go to the nice guy up the road with the magic car Band-Aids. There will always be people who can afford to hire us. Always. I could bring up examples but I fear I'm rambling. TTS isn't going anywhere but neither is the desire to hear a real voice.
_________________
006 member of the Sisterhood of the Traveling Mic. Bonded by sound.

Manfillappsoc: The Mandy and Philip mutual appreciation Society. Who's in your network?

Have you seen my mic closet? ~ me to my future husband
Bish
3.5 kHz


Joined: 22 Nov 2009
Posts: 3738
Location: Lost in the cultural wasteland of Long Island

Posted: Tue Dec 08, 2015 9:20 am

I couldn't resist. I actually commented on the video. I did make it all the way through and was extremely amused by how excited they were about their "rhetorical pause" ... when the program couldn't even enunciate the word "speaking" properly. Irony anyone?

If anyone thinks that this is a reasonable replacement for a real voice, then you are sadly deluded. What you may save in not hiring a narrator you will lose ten-fold in lost revenue and credibility. It is an interesting technical exercise, but that's about it. Consider listening to an on-line training course (for an hour or so) with the voice from the video here. You will, a) not remember the course material or content because your focus will have been pulled away by the synthetic and inaccurate narration, or b) probably not listen to more than a few minutes because of the physical discomfort caused. It actually reminds me of an old story... A man spent years training his dog to walk on its hind legs. When he showed the trick to a friend, the observation was, "Yes, very impressive... but tell me, why? It will only ever be a curiosity as a dog, and a pale and pointless imitation of a man."
_________________
Bish a.k.a. Bish
Smoke me a kipper... I'll be back for breakfast.
I will not feed the trolls... I will not feed the trolls... I will not feed the trolls... I will not feed the trolls.
chrisvoco
Club 300


Joined: 14 Mar 2014
Posts: 380
Location: Local

Posted: Tue Dec 08, 2015 9:24 am

eLearning is just one of a large handful of industries at one stage or another, for good or ill, of TTS integration. Certainly it is inappropriate - downright dumb, most of the time - to use synthesized speech except as an absolute last resort or where the quality of the speech absolutely does not matter and the critical thing is that the information gets conveyed. Emergency alerting is a good example of appropriate application.

Bruce mentioned pre-programming of the dialogue. There are so many subtleties here, aside from merely correct pronunciation and inflection, localization beyond the particular language (do you say "creek" or "crick"?), for example. One of the impediments is that there's precious little true standardization between TTS systems. Though this may change in the future - who knows! - you really only get very good results when you're dealing with an extremely limited lexicon and can focus on making that lexicon sound real when synthesized. It suffers not so much from an engine deficiency as an interface deficiency. Take a clue from music synthesizers: the key to their becoming mainstream instruments wasn't so much the evolution of the actual synthesis technology as it was the interface, in this case the good ol' organ keyboard that has a few centuries of study and understanding behind it and enables a skilled user to produce pleasing music even if the actual synthesized sounds ain't all that good or even realistic. The keyboard is a well- and widely-understood standard - and it was the right standard. Written text is also well-understood, but there's no one-to-one correspondence between what you type in and what you'll necessarily hear as the result - so, it's not the right standard, and whatever the right standard is, it hasn't yet been discovered.

For fun, look up "Perfect Paul" or DECTalk.

And does anyone remember the talking "Eliza" psychologist program, or S.A.M. on the Commodore 64? :)
_________________
Finally, Ford stops starting to say things and starts.
chrisvoco
Club 300


Joined: 14 Mar 2014
Posts: 380
Location: Local

Posted: Tue Dec 08, 2015 9:58 am

With essentially state-of-the-art technology from 1982, a solo TTS vocalist:

http://simulationcorner.net/SAM/sing.wav
_________________
Finally, Ford stops starting to say things and starts.
All times are GMT - 7 Hours