Sound, vocoders and human voice

ManOfSteel

Sound, vocoders and human voice

Post by ManOfSteel »

Hello,
I would like to know how vocoders and text-to-speech readers (those built on Microsoft's text-to-speech engine, for example) work.
Where can I find information about the sound of the human voice (pitch, frequency, intensity, ADSR envelope, ...), especially for every letter of the alphabet?
I posted this here (and not in General off-topic) because my aim is to program this, not just to learn about it.
Thank you in advance for your help.
Eero Ränik

Re:Sound, vocoders and human voice

Post by Eero Ränik »

Most text-to-speech applications use recorded diphones instead of letters. A diphone is a segment containing a combination of two adjacent sounds (not letters), which makes the transitions between sounds seem more natural. The text is interpreted first to work out which sound each letter stands for, since one letter can be read differently in different words. In Estonian, my native language, you'd also need to know whether a sound is palatalized, for example.
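
As a rough illustration of the diphone idea, here is a minimal Python sketch that turns a sequence of sounds into the overlapping pairs a synthesizer would look up in its recordings. The function name `to_diphones`, the `_` silence marker and the phoneme symbols are made up for this example and do not come from any particular engine.

```python
def to_diphones(phonemes, silence="_"):
    """Turn a phoneme sequence into overlapping pairs (diphones).

    The sequence is padded with a silence marker on both sides so that
    the first and last sounds also get a transition.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical phoneme symbols for the word "cat".
print(to_diphones(["k", "ae", "t"]))  # ['_-k', 'k-ae', 'ae-t', 't-_']
```

Each resulting name would then map to one recorded segment; the text-interpretation step is what produces the phoneme list in the first place.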
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Sound, vocoders and human voice

Post by Candy »

If it were my project, I'd record the pronunciation of each syllable, plus each syllable-to-syllable transition. Then chain them together and play them back.

For most languages I think that would work; for some you'd need to rewrite the plain text into a syllable version before playing it, as the mapping might be nontrivial (thinking mainly of Dutch now :S).
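
A minimal sketch of the chaining step in Python, assuming the text has already been split into syllables and that every recording is available as a list of samples. The names `chain_units`, `units` and `transitions` are hypothetical, and the short integer lists stand in for real audio data.

```python
def chain_units(syllables, units, transitions):
    """Concatenate syllable recordings, inserting a recorded transition
    between neighbours when one exists."""
    samples = []
    previous = None
    for syllable in syllables:
        if previous is not None:
            # A missing transition is simply skipped in this sketch.
            samples.extend(transitions.get((previous, syllable), []))
        samples.extend(units[syllable])
        previous = syllable
    return samples


# Tiny stand-in "recordings" so the chaining order is easy to see.
units = {"hel": [1, 2], "lo": [5, 6]}
transitions = {("hel", "lo"): [3, 4]}
print(chain_units(["hel", "lo"], units, transitions))  # [1, 2, 3, 4, 5, 6]
```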
ManOfSteel

Re:Sound, vocoders and human voice

Post by ManOfSteel »

Yes, but with this method you need recorded human voices, and those can take up a lot of disk space.
Maybe I wasn't clear enough when I said letters; I really meant "formants". I'm not sure if that word exists in English, but what I mean by it is the characteristic frequency bands of each diphone, three to six of these "formants" being enough for every diphone.
So what I need is a way to synthesise voice. I know this method doesn't produce very natural voices, but it's better than having to deal with megabytes of recorded sounds using very complicated algorithms.
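
For what it's worth, formant synthesis along these lines can be sketched in a few dozen lines of Python using only the standard library. The sketch below is one common source-filter approach, not a description of any particular product: a pulse train at the voice pitch is passed through a cascade of two-pole resonators, one per formant. The formant frequencies for the vowel are approximate textbook averages for an adult male "ah"; the bandwidths, sample rate, pitch and all function names are assumptions chosen for illustration.

```python
import math
import struct
import wave

RATE = 16000  # samples per second (assumed)


def resonator(signal, freq, bandwidth, rate=RATE):
    """Two-pole digital resonator: emphasises energy around `freq` Hz."""
    t = 1.0 / rate
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c  # gives unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synth_vowel(formants, pitch=120.0, seconds=0.5, rate=RATE):
    """Filter a pulse train at `pitch` Hz through one resonator per formant."""
    n = int(seconds * rate)
    period = int(rate / pitch)
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalised to the range -1..1


def write_wav(path, samples, rate=RATE):
    """Store samples (floats in -1..1) as 16-bit mono PCM."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


# (frequency in Hz, bandwidth in Hz): approximate values for "ah";
# the bandwidths in particular are guesses.
VOWEL_A = [(730, 90), (1090, 110), (2440, 170)]
```

Calling `write_wav("vowel_a.wav", synth_vowel(VOWEL_A))` should produce half a second of a buzzy, robotic vowel. The whole vowel is described by three pairs of numbers, which is where the disk-space saving over recorded sounds comes from; consonants and transitions between sounds need considerably more work than this.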
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Sound, vocoders and human voice

Post by Pype.Clicker »

I used to have a 4 KB text-to-speech demo on my disk (not my own, though). Once I'm home, I'll try to locate it ...