Book I: Procedural Content

I am archiving older pieces I have written on other sites, making this the definitive home for all my work. This is one of several I am porting over from my GameDev.Net user journal. Enjoy!

You ever have one of those days when things just seem to come together to give you a "sign"? I'd heard about .kkrieger when it was first released (for those not in the know, it's a 96 KB FPS; yes, it's amazing). Then Gamasutra ran an interview with Fabian Giesen, which led me to peruse the developers' site and take a good look at the principles behind werkzeug, their procedural content generation and compression tool (it's available for free download, so try it out).

Briefly, the key to .kkrieger's compactness is that rather than storing image resources in a form ready to go into the game, it stores a series of operations to be applied to a minimal image sample to generate the runtime texture. As I read this, a light went on in my head. I had at one time worked for a text-to-speech technology company, which was later acquired by a major TTS vendor out of Boston. While there, I had always wondered about the applicability of some of their technology to games.
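To make the "store the operations, not the pixels" idea concrete, here is a minimal sketch in Python. It is not how werkzeug or .kkrieger actually work internally; the seed image, the operation names (`sine`, `blend_t`, `invert`) and the recipe format are all made up for illustration. The point is only that the thing you ship is the short `RECIPE` list, and the full texture is rebuilt at load time.

```python
import math

def make_seed(size):
    """Tiny 'seed' image: a size x size grid holding a horizontal gradient in [0, 1]."""
    return [[x / (size - 1) for x in range(size)] for _ in range(size)]

def op_sine(img, freq):
    """Turn each value into a sine wave of that value (produces stripes)."""
    return [[0.5 + 0.5 * math.sin(2 * math.pi * freq * v) for v in row] for row in img]

def op_blend_t(img, amount):
    """Blend the image with its own transpose (turns stripes into a grid)."""
    n = len(img)
    return [[(1 - amount) * img[y][x] + amount * img[x][y] for x in range(n)]
            for y in range(n)]

def op_invert(img):
    """Flip light and dark."""
    return [[1.0 - v for v in row] for row in img]

# Hypothetical operation table; a real tool would have many more operators.
OPS = {"sine": op_sine, "blend_t": op_blend_t, "invert": op_invert}

def generate(size, recipe):
    """Rebuild a texture by replaying the stored operations on the seed image."""
    img = make_seed(size)
    for name, args in recipe:
        img = OPS[name](img, *args)
    return img

# This short list is all that needs to be stored on disk.
RECIPE = [("sine", (4,)), ("blend_t", (0.5,)), ("invert", ())]

texture = generate(64, RECIPE)  # 64 x 64 grayscale values, rebuilt at runtime
```

The same recipe always produces the same texture, so nothing is lost by throwing the pixels away; the trade is load-time computation for storage.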

The downside of their technology, which used a licensed library to simulate air passage through various cavities in the head to generate a mathematical model of voice, was that it didn't sound very natural or human. You could change voices on the fly, change languages on the fly, supply text in a variety of domain-specific formats, and use abbreviations ("3.5in" would be correctly read out as "three-point-five inches" or "three-and-a-half inches"). You also had some measure of inflection to indicate a variety of moods, but the voices just didn't sound... right.
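The abbreviation handling mentioned above is usually a text normalization pass that runs before any audio is generated. The following is a rough sketch of that one step, not the vendor's implementation: the unit table, the `expand` function and the choice to read decimals as "point" plus individual digits are all assumptions for illustration.

```python
import re

SMALL = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Hypothetical unit table: abbreviation -> (singular, plural).
UNITS = {"in": ("inch", "inches"), "ft": ("foot", "feet"),
         "kg": ("kilogram", "kilograms")}

def int_to_words(n):
    """Spell out 0-99; fall back to digit-by-digit for anything larger."""
    if n < 20:
        return SMALL[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + SMALL[ones] if ones else "")
    return " ".join(SMALL[int(d)] for d in str(n))

def number_to_words(text):
    """Spell out a decimal string such as '3.5' as 'three point five'."""
    whole, _, frac = text.partition(".")
    words = int_to_words(int(whole))
    if frac:
        words += " point " + " ".join(SMALL[int(d)] for d in frac)
    return words

# A number immediately followed by a known unit, e.g. "3.5in" or "12ft".
PATTERN = re.compile(r"\b(\d+(?:\.\d+)?)(in|ft|kg)\b")

def expand(sentence):
    """Replace number+unit abbreviations with their spoken form."""
    def spoken(match):
        number, unit = match.group(1), match.group(2)
        singular, plural = UNITS[unit]
        return number_to_words(number) + " " + (singular if number == "1" else plural)
    return PATTERN.sub(spoken, sentence)

print(expand("The pipe is 3.5in wide."))  # The pipe is three point five inches wide.
```

The pattern deliberately requires the unit to touch the number, so a phrase like "3 in the morning" is left alone. A real engine would need far more context than this to choose between readings such as "three-point-five" and "three-and-a-half".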

Their approach is called articulatory synthesis. The other major approaches are formant synthesis and concatenative synthesis. All text-to-speech technologies have shortcomings, so it's a matter of tradeoffs. Concatenative synthesis, depending on the specific technique used, can be quite small or very natural-sounding, but suffers from artefacts where audio samples are joined together. Formant synthesis tends to sound robotic, but is free of artefacts and remains highly comprehensible even at high speed.

Crispy posted a thread on innovation in the Lounge, which prompted this post from me. I now have another, albeit more short-term, software development project to embark on after graduation: I aim to write a text-to-speech synthesis engine favoring compactness without too great a sacrifice in overall audio quality. I also aim to write a variety of tools to facilitate the integration of programmatic speech in any application, with a particular view to supporting independent game development by reducing the cost of voiceover actors (which is unfortunate for voiceover actors, but that's why there's EA).

I think that there's enormous room for innovation in games, particularly by studying the problems that independent developers face. Typically, the big studios solve these problems by "throwing money at them": licensing expensive solutions, employing sub-optimal techniques or building costly workarounds. Finding elegant, complete and cost-effective solutions would be a massive boon to both indie and commercial development.

(For the record, I will make my products available free for non-commercial use, with graduated licensing costs for commercial use. But I'm getting ahead of myself.)

What "hard" problems do you see or face in your game development today?