Creating Audio Books

The basic ability to create audio files from text is built into SpeechHub, but it is not yet packaged in a form that blind users can use easily. The command-line program described below is a fully functional text-to-audio converter. It could serve as the starting point for a complete audio book creation product, which would also need an intuitive user interface and support for audio formats such as MP3 in addition to the 'wav' format currently supported.

For now, this program is for hackers who are comfortable with the command line. Scripts for generating .wav files from .txt files are available for both Windows and Linux.

Windows

To create an audio book in Windows, first open a command shell: press Windows + R, type cmd, and press Enter. Then change to the directory containing the text file you want to convert to audio. Let's assume it is called myfile.txt. On a 32-bit version of Windows, the following command creates a wave file from myfile.txt using the default eSpeak voice. The quotes are required because the path contains a space:

"C:\Program Files\SpeechHub\SpeechServer\sh-say.bat" -f myfile.txt -w myfile.wav

Options for selecting a voice or speed other than the default are described at the end of this page.

Linux

Simply open a bash shell, change to the directory containing myfile.txt, and type:

sh-say -f myfile.txt -w myfile.wav
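
A book is often split into one text file per chapter, so the command above may need to be repeated for each file. The following is a minimal sketch of how that could be scripted, not part of SpeechHub itself. The function name convert_all and the SH_SAY override variable are hypothetical names introduced for illustration; only the -f and -w options come from this page.

```shell
# Convert every .txt file in a directory to a .wav file of the same name.
# SH_SAY is a hypothetical override so a different command can be swapped in;
# by default the sh-say script described on this page is used.
convert_all() {
    local dir="${1:-.}"
    local say="${SH_SAY:-sh-say}"
    local txt
    for txt in "$dir"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        [ -e "$txt" ] || continue
        # Stop at the first failure so a broken chapter is noticed early.
        "$say" -f "$txt" -w "${txt%.txt}.wav" || return 1
    done
}
```

After defining the function in a bash shell, running convert_all with a directory name (or with no argument for the current directory) would produce one .wav file next to each .txt file.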

Options

Whether in Windows or Linux, the options are the same. To get a summary of options, just run the sh-say or sh-say.bat script with the -h option. Options are:

-s speed: Set the speed of speech. -100 is slowest, 100 is fastest.
-p pitch: Set the pitch. -100 is lowest, 100 is highest.
-f file: Read the input text from file.
-e engine: Use a specific TTS engine, currently one of espeak, ibmtts, marytts, picotts, or freetts.
-v voice: Switch to the named voice.
-l: List supported voices.
-w file: Write speech output to file rather than speaking it.

For example, in Linux, to generate a .wav file using the MaryTTS cmu-rms-hsmm voice at 2X speed, type:

sh-say -s 20 -e marytts -v cmu-rms-hsmm -f myfile.txt -w myfile.wav

For engines that use sonic to speed up speech, 0 means normal speed, 20 is 2X, 40 is 3X, 60 is 4X, 80 is 5X, and 100 is 6X.
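
The listed points follow a simple pattern: each additional 1X of speed adds 20 to the -s value. As a hedged illustration, the helper below turns a whole-number speed-up factor into the matching -s value. The name speed_for_factor is hypothetical, and the sketch covers only the six documented points; this page does not say how values in between, or negative values, map to actual speeds.

```shell
# Map a whole-number speed-up factor (1 to 6) to the -s value listed above:
# 1X -> 0, 2X -> 20, 3X -> 40, 4X -> 60, 5X -> 80, 6X -> 100.
# Anything else is rejected because its mapping is not documented here.
speed_for_factor() {
    case "$1" in
        [1-6]) echo $(( ($1 - 1) * 20 )) ;;
        *) echo "factor must be a whole number from 1 to 6" >&2; return 1 ;;
    esac
}
```

For example, sh-say -s "$(speed_for_factor 3)" -f myfile.txt -w myfile.wav would request 3X speed on an engine that uses sonic.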
