Writing New Engine Drivers

This page is intended for developers and documents how to write new TTS engine drivers for SpeechHub.

As outlined in an earlier page in the development section, we recommend developing drivers using the stdio-native protocol, which the examples on this page demonstrate. We generally assume the engine has a C API. If you would like to integrate a Java-based engine, take a look at the source code for FreeTTS and Mary TTS.

Engine Integration Philosophy

One of the main strengths of SpeechHub is simple and reliable engine integration. We believe that it is our job to make writing engine interfaces as simple as possible, even if it complicates SpeechHub. We feel that advanced TTS engine features should be made available to the user, but for engines that don't provide advanced features, SpeechHub can provide decent emulation.

Writing a basic interface between the C API of a TTS engine and SpeechHub should take very little code. The example basic interface to espeak is only 122 lines of C code. The full featured interface to espeak is still under 200 lines.

We believe that a TTS engine should be compiled once and then run for as many years as users care to use it. While this is the norm on Windows, on Linux good commercial engines like voxin become unusable with the desktop screen reader after each OS upgrade, and an executable will not run on Linux distributions other than the one for which it was compiled. Our solution to this problem is to have SpeechHub communicate with TTS engines using a simple stdio protocol. Because SpeechHub deals with audio, the TTS engine is not required to link to distribution-specific sound libraries. This makes the engine a standalone executable which simply reads from stdin and writes to stdout. Such executables can typically be compiled on one Linux distro and run on any of the others. It is our hope that SpeechHub will encourage more commercial TTS vendors to support Linux, as they should be able to offer a simple executable that works for all SpeechHub users on all Linux distributions.

SpeechHub comes with a copy of all TTS engine drivers we know of, rather than relying on packages available on Linux or Windows that provide the same engines. This greatly enhances robustness: you can generally count on the TTS engines supported by SpeechHub to work on your system. If you create a new TTS engine driver, please consider allowing it to be included with SpeechHub by default.

Basic TTS engine integration

Adding support to SpeechHub for a new TTS engine is very easy. SpeechHub takes care of all the audio processing, so the TTS engine needs only to generate samples. If there are features missing from the speech engine like pitch or speed adjustment, SpeechHub can provide them.

A good way to start is to copy the example_engine.c file to sometts_engine.c, assuming the name of your TTS engine is sometts. The only other file required to compile your new engine is server.c. Only six functions must be written to integrate a new TTS engine. initializeEngine is called before any other function and provides a path to the data directory. closeEngine is called last, and is where the TTS engine should do any cleanup needed.

_bool initializeEngine(char *synthdataPath)_\
_bool closeEngine(void)_
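
As a rough illustration, the pair of functions above might be filled in as follows. This is a minimal sketch, not the code from example_engine.c: the dataPath buffer, the MAX_PATH_LEN limit, the engineReady flag, and the convention that true means success are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATH_LEN 1024          /* assumed limit, not defined by SpeechHub */

static char dataPath[MAX_PATH_LEN]; /* copy of the data directory path */
static bool engineReady = false;    /* hypothetical driver state flag */

/* Called before any other function. Remembers the data directory so that
   the engine can locate its voice data later. */
bool initializeEngine(char *synthdataPath)
{
    if (synthdataPath == NULL || strlen(synthdataPath) >= MAX_PATH_LEN) {
        return false;
    }
    strcpy(dataPath, synthdataPath);
    /* A real driver would call the engine's own initialization routine here. */
    engineReady = true;
    return true;
}

/* Called last. Releases whatever initializeEngine set up. */
bool closeEngine(void)
{
    if (!engineReady) {
        return false;
    }
    /* A real driver would call the engine's own shutdown routine here. */
    engineReady = false;
    dataPath[0] = '\0';
    return true;
}
```

Copying the path rather than keeping the pointer avoids depending on how long the caller keeps the string alive, which this page does not specify.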

Once initialized, SpeechHub will call getSampleRate to find the sample rate of the engine, which is used to initialize the audio channel.

_int getSampleRate(void)_

If the TTS engine supports multiple voices, the functions getVoices and setVoice should be implemented:

_char **getVoices(int *numVoices)_\
_bool setVoice(char *voice)_
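
One simple way to implement these two functions is with a fixed table of voice names. The sketch below assumes that SpeechHub does not free the array returned by getVoices and that setVoice returns false for an unknown name; neither is stated on this page. The voice names and the currentVoice variable are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Made-up voice names; a real driver would query the engine instead. */
static char *voiceNames[] = { "english", "german" };
#define NUM_VOICES ((int)(sizeof(voiceNames) / sizeof(voiceNames[0])))

static int currentVoice = 0; /* index into voiceNames */

/* Reports the available voices and how many there are. */
char **getVoices(int *numVoices)
{
    *numVoices = NUM_VOICES;
    return voiceNames;
}

/* Selects a voice by name. An unknown name leaves the current voice unchanged. */
bool setVoice(char *voice)
{
    for (int i = 0; i < NUM_VOICES; i++) {
        if (strcmp(voice, voiceNames[i]) == 0) {
            currentVoice = i;
            return true;
        }
    }
    return false;
}
```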

Finally, speech needs to be synthesized. The function speakText must be written, and it must call processAudio periodically to provide speech samples to SpeechHub.

_bool speakText(char *text)_\
_bool processAudio(short *data, int numSamples)_

If processAudio returns false, then the TTS synthesizer should abort speech synthesis, and speakText should return true immediately.
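
The chunked loop and the abort rule described above can be sketched as follows. In a real driver processAudio is supplied by server.c; the version here is only a stand-in so that the sketch compiles on its own and can simulate a cancel request. synthesizeChunk, CHUNK_SAMPLES, samplesDelivered, and cancelAfterSamples are hypothetical names, and the square wave stands in for whatever samples the real engine would produce.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SAMPLES 256 /* assumed chunk size, chosen for the example */

/* Stand-in for the processAudio provided by server.c. It counts samples and
   returns false once cancelAfterSamples is reached (-1 means never cancel). */
static int samplesDelivered = 0;
static int cancelAfterSamples = -1;

bool processAudio(short *data, int numSamples)
{
    (void)data;
    samplesDelivered += numSamples;
    return cancelAfterSamples < 0 || samplesDelivered < cancelAfterSamples;
}

/* Hypothetical engine call: writes one chunk of audio for text[pos] into buf
   and returns the number of samples written. Here it is just a square wave
   whose period depends on the character. */
static int synthesizeChunk(const char *text, size_t pos, short *buf, int maxSamples)
{
    int period = 20 + (unsigned char)text[pos] % 40;
    for (int i = 0; i < maxSamples; i++) {
        buf[i] = (i % period) < period / 2 ? 8000 : -8000;
    }
    return maxSamples;
}

/* Synthesizes text one chunk at a time, handing each chunk to processAudio. */
bool speakText(char *text)
{
    short buf[CHUNK_SAMPLES];
    size_t len = strlen(text);
    for (size_t pos = 0; pos < len; pos++) {
        int n = synthesizeChunk(text, pos, buf, CHUNK_SAMPLES);
        if (!processAudio(buf, n)) {
            return true; /* speech was cancelled: stop synthesizing at once */
        }
    }
    return true;
}
```

Keeping the chunks small means a cancel request takes effect quickly, since the loop checks the return value of processAudio after every chunk.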

Since a basic TTS engine interface does not support speed, pitch, or punctuation levels, you should use the engine.properties file to tell SpeechHub to provide these features itself. Your properties file should look like:

engine.package.version=0.1.0
engine.protocol=stdio.native
engine.protocol.version=1.0
engine.use.sonic=true
engine.use.sonic.pitch=true
engine.supports.punctuation=false
engine.supports.ssml=false

Once you have a basic engine interface working, you might want to enable some more advanced TTS synthesizer features. The file espeak_engine.c is meant to be a good example of a full featured engine. Even so, it's less than 200 lines of code. Features you can add are:

speech rate control
pitch control
punctuation level
SSML support
voice variants

Simply fill out the stub functions to support any or all of these features, and modify your engine.properties file to indicate the support.

Engine Directory Structure

SpeechHub will automatically detect and run your TTS engine provided it uses the same directory structure as the other TTS engines. Typically, you put your engine in the sh/engine directory, which exists in the SpeechHub installation directory. Look at the espeak-0.1.0 directory for an example of what you need. The sh subdirectory must exist and contain the engine.properties file. There should be an stdio-w directory for Windows support, an stdio-l directory for Linux support, and optionally an stdio-l64 directory for 64-bit Linux support.

