6
\$\begingroup\$

When I was a kid, more or less everything had a speech synthesizer in it. A couple of years back I started to wonder where the technology had gone in all that time, and after some research found that it has gone nowhere. Storage has increased, making concatenative synthesis more lifelike, but little else has improved.

Text to speech seems to be the primary research field. Most books I've found on the subject of speech synthesis skim over the actual voice generation and then spend hundreds of pages on text to speech.

I'm not interested in text to speech as such, but rather in the voice generation itself. Yet I haven't found a single book with a good, practical explanation of it. Concatenative synthesis is simple to grasp, but formant synthesis is the one I'd like more information about. (The third method, physical modelling, would be a plus, but is not all that interesting.)

What makes this game-specific is that I'd love to make a tool that lets low-budget/downloadable games have speech without having to hire actual voice actors and store hundreds of megs of oggs with the game. Since the author is in complete control of the voice editing before release, text to speech is less important; what matters is the voice synthesis.

So, anyone know any good books about this?

\$\endgroup\$
2
  • \$\begingroup\$ When hunting for small, free (non-gpl) speech synths, I found an old branch of rsynth that seems to be totally free. It, however, contains some tables and I haven't been able to find information on where those tables came from. (The original author has since passed away, so no luck there). \$\endgroup\$ Commented Apr 24, 2014 at 10:51
  • \$\begingroup\$ I'm not sure this is particularly game-development related, your final paragraph notwithstanding. But more to the point, these sorts of list-of-resources questions have been considered off-topic in the intervening years since you asked this. \$\endgroup\$ Commented Apr 24, 2014 at 18:42

4 Answers

1
\$\begingroup\$

Have a look at HTS, an HMM-based speech synthesis system that uses hidden Markov models to learn and generate speech. This book has a chapter on HMM-based synthesis as well as a complete description of other TTS technologies.

\$\endgroup\$
1
  • \$\begingroup\$ Judging from amazon.com "look inside" of a random page, this looks like it's The Book. \$\endgroup\$ Commented Sep 22, 2011 at 11:31
2
+50
\$\begingroup\$

I can't recommend any specific books on speech, but you might want to look at Festvox, CMU's open source speech synthesis library, as a starting point.

Awesome idea, though. If you can produce voices near the quality of Nuance or Acapela and stay indie-friendly, that would be a huge opportunity for you and a great benefit to indie devs.

\$\endgroup\$
5
  • \$\begingroup\$ I know there are some open source speech synths, but they are mostly research projects and not too.. friendly. Also, I'd rather not touch GPL'd stuff too much, in case I get infected =) \$\endgroup\$ Commented Mar 22, 2011 at 12:48
  • \$\begingroup\$ Understandable, open source gets a little hairy. If it'd be beneficial to you at all, you should be safe to check out their source though, their license is MIT-style. pastebin.com/a45MUU5B (from copy.texi in the latest festvox tar). \$\endgroup\$ Commented Mar 22, 2011 at 16:55
  • \$\begingroup\$ Unfortunately they seem to concentrate on concatenative synthesis :( \$\endgroup\$ Commented Mar 25, 2011 at 8:44
  • \$\begingroup\$ Oh well, your answer is the best I got for this bounty, even if I still don't have the book. =) \$\endgroup\$ Commented Mar 31, 2011 at 20:14
  • \$\begingroup\$ Thanks! Good luck, and if you do find a good book, let me know, would love to delve more into that myself. \$\endgroup\$ Commented Apr 1, 2011 at 20:36
1
\$\begingroup\$

So far this has been the best resource I've found:

http://liceu.uab.es/~joaquim/speech_technology/tecnol_parla/synthesis/refs_sintesi.html#Speech_synthesis_formants

Most specifically: http://www.ling.ohio-state.edu/courses/materials/825/klsyn-dos/klsynman.pdf

Still, haven't found The Book.

\$\endgroup\$
0
\$\begingroup\$

I don't know of a book, but Vocaloid music is becoming very popular right now. You add in lyrics and melody and it can synthesize a singer. This is how it's possible:

It uses synthesizing technology with specially recorded vocals of voice actors or singers.

Pure synthesis of a voice may not be a reality yet, but reassembling recorded sounds with manipulation is a possibility.

\$\endgroup\$
2
  • \$\begingroup\$ Which, again, is concatenative synthesis. \$\endgroup\$ Commented Mar 26, 2011 at 19:11
  • \$\begingroup\$ Some of the systems such as Sinsy are HMM based \$\endgroup\$ Commented Apr 2, 2011 at 21:27
