Sound, vocoders and human voice

ManOfSteel · Post by **ManOfSteel** » Fri Apr 15, 2005 12:56 am

Hello,
I would like to know how vocoders and/or text readers (which use Microsoft's text-to-speech engine for example) work.
Where can I find some information about the sound of the human voice (pitches, frequencies, intensities, ADSR, ...), especially for every letter of the alphabet?
I posted this here (and not in the General off-topic) because the aim of it is programming and not just learning information.
Thank you in advance for your help.

Eero Ränik · Post by **Eero Ränik** » Fri Apr 15, 2005 8:51 am

Most text-to-speech applications use recorded diphones rather than letters. Diphones are segments containing the combination of any two adjacent sounds (sounds, not letters), which makes the transitions between sounds seem more natural. The text is interpreted first to work out which sound each letter stands for, since one letter can be read differently in different words. In Estonian, my native language, you'd also need to know whether a sound is palatalized, for example.
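To make the diphone idea concrete, here is a minimal sketch of the step that turns an already-interpreted sound sequence into the list of diphone recordings to look up. The function name `to_diphones`, the `_` silence marker, and the phoneme spelling of "hello" are all assumptions made for illustration; a real engine would take its transcription from a pronunciation dictionary or letter-to-sound rules.

```python
def to_diphones(phonemes, silence="_"):
    """Pair every sound with its successor, padding both ends with silence.

    Each pair names one recorded segment that covers the transition
    from the first sound into the second.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical phonetic transcription of "hello" (not from a real lexicon).
hello = ["h", "@", "l", "oU"]
print(to_diphones(hello))
# ['_-h', 'h-@', '@-l', 'l-oU', 'oU-_']
```

Note that four sounds give five diphones, because the transitions from and into silence are recorded as well.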

Candy · Post by **Candy** » Sun Apr 17, 2005 2:49 am

If it were my project, I'd record each syllable pronunciation, plus each syllable-to-syllable transition. Then chain them together and play.

For most languages I think that'd work; for some you'd need to rewrite the plain-text version into a syllable version before playing, as it might be a nontrivial mapping (thinking mainly of Dutch now :S).
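A rough sketch of that chaining step, assuming the text has already been split into syllables and that every syllable and every transition exists as a recording. The names `playback_order` and `chain`, the `a>b` naming of transition recordings, and the dummy sample lists are hypothetical; splitting text into syllables is the language-specific part and is not shown.

```python
def playback_order(syllables):
    """Interleave each syllable with the transition into the next one."""
    order = []
    for i, syl in enumerate(syllables):
        order.append(syl)
        if i + 1 < len(syllables):
            order.append(f"{syl}>{syllables[i + 1]}")
    return order


def chain(syllables, recordings):
    """Concatenate the recorded samples in playback order."""
    samples = []
    for name in playback_order(syllables):
        samples.extend(recordings[name])  # KeyError if a recording is missing
    return samples


# Dummy "recordings": short sample lists standing in for real audio.
recordings = {"wa": [1, 2], "wa>ter": [3], "ter": [4, 5]}
print(playback_order(["wa", "ter"]))     # ['wa', 'wa>ter', 'ter']
print(chain(["wa", "ter"], recordings))  # [1, 2, 3, 4, 5]
```

The number of transition recordings grows with the square of the number of syllables, which is the storage cost of this approach.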

ManOfSteel · Post by **ManOfSteel** » Sun Apr 17, 2005 6:28 am

Yes, but with this method you need recorded sounds of human voices, and those can take up a lot of space on the disk.
Maybe I wasn't clear enough when I said letters; I really meant "formants". I'm not sure if that word exists in English, but what I mean by it is the frequency range of every diphone, three to six of these "formants" being enough for every diphone.
So what I need to know is how to synthesise a voice. I know this method doesn't make very natural voices, but it's better than having to deal with megabytes of recorded sounds using very complicated algorithms.
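One common way to do this kind of synthesis is a source-filter model: a pulse train at the voice pitch is passed through one resonant filter per formant. The sketch below is only an illustration under assumptions, not a complete synthesizer. The formant frequencies are approximate textbook values for a male "ah" vowel, the bandwidths are guesses, and the names `resonator`, `synth_vowel` and `AH` are made up for this example.

```python
import math


def resonator(signal, freq, bandwidth, rate):
    """Second-order digital resonator modelling a single formant."""
    t = 1.0 / rate
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synth_vowel(formants, f0=120, rate=16000, seconds=0.5):
    """Filter a pulse train at pitch f0 through the formants in cascade."""
    n = int(rate * seconds)
    period = rate // f0  # samples between glottal pulses
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalised to the range -1..1


# (frequency Hz, bandwidth Hz): approximate, illustrative values only.
AH = [(730, 90), (1090, 110), (2440, 170)]
samples = synth_vowel(AH)
```

The whole vowel is described by six numbers instead of a recording, which is where the disk-space saving comes from. To hear the result, the samples can be scaled to 16-bit integers and written out with Python's standard `wave` module.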

Pype.Clicker · Post by **Pype.Clicker** » Mon Apr 18, 2005 10:22 am

I used to have a 4KB text-to-speech demo on my disk (not my own, though). Once I'm home, I'll try to locate it ...

OSDev.org
