
Why you might want to use SpeechHub with NVDA

NVDA, a free and open source screen reader for the Microsoft Windows operating system, is available from the NV Access website.
NVDA comes 'out of the box' with native drivers for eSpeak, Microsoft Speech API version 5 and Microsoft Speech Platform. There are also additional add-ons for other free and commercial synthesizers. SpeechHub supports a number of synthesizers and the following information will help you decide if it might be useful to you.

Experience shows that SpeechHub runs well on most modern computers.

Some synthesizers, particularly MaryTTS, are more demanding, especially at very high speech rates. If your computer is not powerful enough or does not have enough memory, speech might be cut off or sound intermittent. SpeechHub might tell you that you don't have enough power, in which case it will insert a delay before each message to allow the synthesizer to catch up with the speech. You can try to reset this pause by changing the voice rate up and then down, but if the problems return, try switching to a different voice or synthesizer. If the problems persist, SpeechHub is unfortunately not suitable for your computer.

All synthesizer drivers return the audio they generate from text to SpeechHub, which then processes the audio to apply rate boost if necessary and to maximize efficiency. Short messages such as single characters are stored in memory for fast retrieval: when you type a character for the first time, its audio goes into memory; when you type the same character again, it is not synthesized at all but simply pulled from memory, which is very fast.
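The caching idea described above can be illustrated with a minimal Python sketch. This is not SpeechHub's actual code: the class name ShortMessageCache, the max_chars threshold and the fake_synthesize stand-in are all assumptions made for illustration.

```python
from typing import Callable, Dict


class ShortMessageCache:
    """Hypothetical sketch: keep audio for short messages in memory."""

    def __init__(self, synthesize: Callable[[str], bytes], max_chars: int = 1):
        self._synthesize = synthesize   # the (slow) synthesizer call
        self._max_chars = max_chars     # assumed cut-off for a "short" message
        self._cache: Dict[str, bytes] = {}
        self.synth_calls = 0            # counts real synthesizer invocations

    def get_audio(self, text: str) -> bytes:
        # Longer messages are always synthesized and never stored.
        if len(text) > self._max_chars:
            self.synth_calls += 1
            return self._synthesize(text)
        # Short messages are synthesized once, then served from memory.
        if text not in self._cache:
            self.synth_calls += 1
            self._cache[text] = self._synthesize(text)
        return self._cache[text]


def fake_synthesize(text: str) -> bytes:
    """Stand-in for a real synthesizer; returns dummy 'audio' bytes."""
    return text.encode("utf-8")


cache = ShortMessageCache(fake_synthesize)
cache.get_audio("a")   # first time: synthesized and stored
cache.get_audio("a")   # second time: pulled from memory
```

In a real system the stored audio would presumably also depend on the voice, rate and pitch in use, so the lookup key would need to include those settings; the sketch leaves this out for brevity.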

The advantages and disadvantages for each of the following synthesizers are summarized below:

eSpeak

eSpeak is fast, small and efficient. If you are an eSpeak user, use NVDA's native driver; there is no advantage in using eSpeak with SpeechHub at all. So why was eSpeak incorporated into SpeechHub? Because some applications, unlike NVDA, do not have a native eSpeak driver. In addition, SpeechHub uses eSpeak as a 'backstop': if any synthesizer fails, SpeechHub attempts to restart it three times and, if that is not successful, starts eSpeak instead. eSpeak runs well with SpeechHub but it might sound somewhat different from what you are used to.
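The 'backstop' behaviour can be sketched roughly as follows. This is only an illustration of the restart-then-fall-back logic described above, not SpeechHub's implementation; the function name start_with_backstop and the two callables are hypothetical.

```python
from typing import Callable

MAX_RESTARTS = 3  # the text states that three restart attempts are made


def start_with_backstop(
    restart_synth: Callable[[], bool],
    start_espeak: Callable[[], bool],
    max_restarts: int = MAX_RESTARTS,
) -> str:
    """Try to restart a failed synthesizer; fall back to eSpeak if that fails.

    Both callables are hypothetical: each returns True on success.
    Returns the name of the engine that ends up running.
    """
    for _ in range(max_restarts):
        try:
            if restart_synth():
                return "primary"
        except Exception:
            # Treat an exception the same as a failed restart attempt.
            pass
    # All restart attempts failed, so start the backstop synthesizer.
    start_espeak()
    return "espeak"


# Usage: a synthesizer that never recovers ends up replaced by eSpeak.
engine = start_with_backstop(lambda: False, lambda: True)
```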

MaryTTS

Credit is due to the MaryTTS project, as these voices are probably the highest quality that open source can offer, especially in continuous reading. Typing quality with Echo on is, however, lacking. To our knowledge, SpeechHub is the only system which enables practical use of MaryTTS. The official MaryTTS installation is slow and somewhat buggy. Moreover, because MaryTTS is inefficient, it would not be practical without the SpeechHub efficiency measures outlined above. MaryTTS can take up to 170 MB of memory and can soak up CPU power on low specification computers.

Microsoft Speech API version 5, Microsoft Speech Platform

Because SpeechHub has the unusual ability to get the audio from SAPI5 and MSP and process it, you should get a better typing response than with standard drivers, provided your computer has sufficient power and memory. This approach also enables you to use rate boost if you wish to get clarity at higher speeds. The MSP SpeechHub driver works well with all voices. The standard SpeechHub SAPI5 driver works well with many SAPI5 voices, but some voices sound harsh. For these situations, an alternative SAPI5 'common driver' is provided; all voices sound clear with this driver, but performance is similar to that of NVDA's native SAPI5 driver.

PicoTTS

PicoTTS is a fairly slow synthesizer, so it benefits from the SpeechHub approach. Rate and pitch are handled by Sonic rather than by PicoTTS's native controls. In our view, this approach gives better performance.

Improvements in response with SpeechHub have been measured, for example when typing with a high quality SAPI5 voice. With a common type driver, the time lag until the voice is sent to the speakers is 30 - 60 ms (thousandths of a second). With the SpeechHub standard driver, after a character has been typed once, the time lag is 0 - 1 ms. A lag of perhaps 30 ms or more is noticeable. Some of the above parameters, such as voice quality, are subjective, so make up your own mind!
