
SpeechHub - TTS server for the vision impaired community

Introduction

SpeechHub is the first cross-platform TTS server dedicated to the requirements of the vision impaired community. A speech server sits between a client application, such as a screen reader or a self-voicing program, and a speech synthesizer. It accepts commands from the client application and, after optional processing, passes them to the speech synthesizer. This approach offers a uniform interface to applications that want to use speech, allowing developers to concentrate on what their application does rather than on speech processing and connections to different synthesizers. The concept of a speech server is not new, and there are both commercial and open source examples. In the context of the VI community and open source, the best known is Speech Dispatcher, which is well established on Linux.

SpeechHub's cross-platform architecture offers flexibility, reliability and a host of new features. It is available now for Windows and is under development for Linux; other platforms will be considered in the future.

The connection to an application is done using a client plug-in available for:

Windows - NVDA screen reader
Windows - SpeakOn MediaSuite
Linux - Orca, Speakup, Yasr screen readers (still in development)

SpeechHub currently supports the following synthesizers:

Open source:
- eSpeak with many voice variants (courtesy of NVDA) and broad language support
- Mary TTS, four English voices usable with NVDA and Orca
- Pico TTS voices in six languages
- FreeTTS
Microsoft Speech API version 5 on Windows
Microsoft Speech Platform on Windows 7 and Windows 8
Voxin (IBM TTS) on Linux (requires a license from http://voxin.oralux.net)

Clients communicate with the server using a subset of the standard SSIP interface, with extensions, over TCP/IP, stdio, or Unix sockets. SpeechHub promotes uniformity in TTS by optionally processing text before it is sent to the synthesizers: formatting, text replacement, punctuation handling and capital letter indication are all customizable. Where the synthesizer supports it, audio samples are sent back to the server and processed centrally. The server is highly responsive, and speech speed-up is built in for most synthesizers using Sonic. There is also built-in support for creating audio books.
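
As a rough illustration of what a client plug-in sends, the sketch below builds the messages for one spoken utterance using commands from standard SSIP as documented for Speech Dispatcher (SET self CLIENT_NAME, SPEAK, QUIT). Which commands fall inside SpeechHub's subset, and what its extensions look like, is not stated here, so treat the command choice as an assumption. The function names are hypothetical and are not part of any SpeechHub client.

```python
def frame_ssip_text(text: str) -> bytes:
    """Frame message text for an SSIP SPEAK command.

    SSIP lines end in CR LF, and the text block is closed by a dot on a
    line of its own. A text line that itself starts with a dot gets an
    extra leading dot so it cannot be mistaken for the terminator.
    """
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    escaped = ["." + line if line.startswith(".") else line for line in lines]
    return ("\r\n".join(escaped) + "\r\n.\r\n").encode("utf-8")


def build_speak_request(client_name: str, text: str) -> list[bytes]:
    """Return the messages a client would send, in order, to speak `text`.

    `client_name` follows the SSIP "user:client:connection" convention.
    Assumption: the server accepts these standard SSIP commands.
    """
    return [
        f"SET self CLIENT_NAME {client_name}\r\n".encode("utf-8"),
        b"SPEAK\r\n",
        frame_ssip_text(text),
        b"QUIT\r\n",
    ]
```

A real client would write each message to the TCP/IP, stdio, or Unix socket connection and read the server's reply before sending the next one; that exchange is left out to keep the sketch short.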

Communication with TTS synthesizers goes through engine drivers, which are independent of the server and can be implemented in any programming language. Integrating a new TTS engine takes very little effort; for example, the eSpeak SpeechHub engine driver is fewer than 200 lines of code. Binary TTS engines are compatible across all major Linux distributions. TTS engines can be integrated once and run everywhere, reliably, for years. Text replacement can be customized for each engine, avoiding wrong pronunciations and text that causes crashes. If an engine crashes, it is restarted automatically.
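
The two ideas at the end of this paragraph, per-engine text replacement and automatic restart, can be sketched as follows. This is a minimal illustration and not SpeechHub's actual implementation: the rule table, the class name, and the stand-in driver command are all hypothetical, and the real server may supervise its engine drivers quite differently.

```python
import re
import subprocess
import sys

# Hypothetical per-engine rules: (regex pattern, replacement), applied in order.
ENGINE_REPLACEMENTS = {
    "espeak": [(r"\bDr\.", "Doctor")],
    "pico": [(r"&", " and ")],
}

# Stand-in for a real engine driver: a child process that waits on stdin.
DEMO_COMMAND = [sys.executable, "-c", "import sys; sys.stdin.read()"]


def apply_replacements(engine: str, text: str) -> str:
    """Apply the replacement rules registered for one engine, if any."""
    for pattern, replacement in ENGINE_REPLACEMENTS.get(engine, []):
        text = re.sub(pattern, replacement, text)
    return text


class SupervisedEngine:
    """Keeps one engine driver process alive, restarting it after a crash."""

    def __init__(self, command: list[str]):
        self.command = command
        self.restarts = 0
        self.process = self._start()

    def _start(self) -> subprocess.Popen:
        return subprocess.Popen(
            self.command, stdin=subprocess.PIPE, stdout=subprocess.PIPE
        )

    def _close_pipes(self) -> None:
        for pipe in (self.process.stdin, self.process.stdout):
            if pipe is not None:
                pipe.close()

    def ensure_running(self) -> subprocess.Popen:
        """Return a live process, starting a fresh one if the old one exited."""
        # poll() returns None while the child is still running.
        if self.process.poll() is not None:
            self._close_pipes()
            self.process = self._start()
            self.restarts += 1
        return self.process

    def stop(self) -> None:
        """Terminate the child process and release its pipes."""
        if self.process.poll() is None:
            self.process.kill()
        self.process.wait()
        self._close_pipes()
```

In this sketch the server would call ensure_running() before handing text to a driver and pass the text through apply_replacements() first, so that input known to upset one engine never reaches it.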

SpeechHub is developed entirely by people with vision impairments for people with vision impairments.
