Friday, February 7, 2014

How to Build a Speech Recognition Application



Prepare Speech Recognition Software

1. Bundle your software with a speech recognition program, such as Dragon NaturallySpeaking or IBM ViaVoice. If you're the developer, give the user the option to buy the speech recognition program along with your application. Have the user install it as part of your application's installation process.

2. Configure the speech recognition software. Your application can take full advantage of speech recognition only if the speech recognition program is configured correctly. This means choosing the right microphone and language settings.

3. Train the speech recognition program. Depending on the nature of your application, this may have to be done outside it. In that case, use the training programs and screens that most speech recognition programs include, or train the program by dictating into a word processor.
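The configuration check in step 2 can be sketched in a few lines of Python. This is only an illustration: the settings fields, the list of supported languages and the input-level thresholds are all assumptions made for the example, not values taken from any particular speech recognition program.

```python
from dataclasses import dataclass
from typing import List

# Assumed language codes, for illustration only.
SUPPORTED_LANGUAGES = {"en-US", "en-GB", "de-DE"}


@dataclass
class RecognizerSettings:
    """Hypothetical settings your installer might collect from the user."""
    language: str
    microphone: str
    input_level: float  # 0.0 (silent) to 1.0 (clipping), as measured by your app


def check_settings(settings: RecognizerSettings,
                   available_mics: List[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if settings.language not in SUPPORTED_LANGUAGES:
        problems.append("unsupported language: " + settings.language)
    if settings.microphone not in available_mics:
        problems.append("microphone not found: " + settings.microphone)
    # Assumed thresholds: too quiet or too loud both hurt recognition.
    if not 0.2 <= settings.input_level <= 0.9:
        problems.append("microphone level out of range")
    return problems


good = RecognizerSettings("en-US", "USB Headset", 0.5)
bad = RecognizerSettings("xx-XX", "Missing Mic", 0.0)
good_problems = check_settings(good, ["USB Headset"])
bad_problems = check_settings(bad, ["USB Headset"])
print(good_problems)
print(bad_problems)
```

Running a check like this before the first dictation session lets your application point the user back to the speech recognition program's own setup screens instead of failing silently.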

Integrate Text Entry

4. Build a text or rich-text control into your application. Many speech recognition programs work with any program that accepts text entry. If all you require is text entry, your application probably won't need any modifications to work with a speech recognition program.

5. Include extra space in the text-entry control. Since speech recognition programs can recognize speech at a rate faster than many people can type, it may be necessary to increase the size of your text-entry controls. Allow enough space for text to be entered and reviewed in real time.
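To get a rough feel for how much extra space step 5 calls for, you can estimate how many rows of text arrive during the time a user needs to review them. The sketch below is a back-of-the-envelope calculation in Python. The rates (40 words per minute for typing, 150 for dictation), the 30-second review window and the 60-character line width are illustrative assumptions, not measurements.

```python
import math


def rows_needed(words_per_minute: int, review_seconds: int,
                chars_per_line: int, avg_word_chars: int = 6) -> int:
    """Rows of text produced during the review window.

    avg_word_chars is an assumed average word length, including the
    trailing space.
    """
    chars = words_per_minute * review_seconds * avg_word_chars / 60
    return math.ceil(chars / chars_per_line)


# Assumed figures for illustration only.
typing_rows = rows_needed(40, 30, 60)
dictation_rows = rows_needed(150, 30, 60)
print(typing_rows, dictation_rows)
```

Under these assumptions, the control needs several times as many visible rows for dictation as for typing. Substitute figures from your own users before settling on a size.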

Interact via an API

6. Use an application programming interface (API) to interact with the speech recognition software. Many speech recognition programs include an API for other applications to use. Using one gives your application access to the program's speech recognition features and lets the user control the application by voice.

7. Integrate the API with your application. This can include providing more than one 'mode' of speech control. Create command words, such as 'save file' or 'create new file.' When entering text, users should also be able to make corrections and apply rich-text features, such as boldface, italics, underlining and other font changes, without having to touch the keyboard.
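The modes and command words from step 7 can be sketched independently of any vendor's API. The Python class below is a minimal illustration that assumes the speech recognition program hands your application one recognized phrase at a time as a string. The class name, the mode-switching phrases, the 'scratch that' correction phrase and the placeholder handlers are all hypothetical; a real integration would use whatever callbacks the chosen API provides.

```python
from typing import Callable, Dict, List


class SpeechController:
    """Routes recognized phrases to commands or to a dictation buffer."""

    def __init__(self) -> None:
        self.mode = "dictation"       # either "dictation" or "command"
        self.text: List[str] = []     # dictated phrases, in order
        self.log: List[str] = []      # names of commands that ran
        # Hypothetical command words mapped to placeholder handlers.
        self.commands: Dict[str, Callable[[], None]] = {
            "save file": lambda: self.log.append("save"),
            "create new file": self._new_file,
        }

    def _new_file(self) -> None:
        self.text.clear()
        self.log.append("new")

    def handle(self, phrase: str) -> None:
        """Process one phrase as delivered by the recognizer."""
        spoken = phrase.strip()
        key = spoken.lower()
        if key in ("command mode", "dictation mode"):
            # Mode switches are recognized in either mode.
            self.mode = key.split()[0]
        elif key == "scratch that":
            # Voice-only correction: drop the last dictated phrase.
            if self.text:
                self.text.pop()
        elif self.mode == "command":
            action = self.commands.get(key)
            if action is not None:
                action()
            else:
                self.log.append("unrecognized: " + key)
        else:
            self.text.append(spoken)


controller = SpeechController()
for phrase in ["Dear team", "see you on Fryday", "scratch that",
               "see you on Friday", "command mode", "save file"]:
    controller.handle(phrase)
print(controller.text)
print(controller.log)
```

Keeping command words out of dictation mode means a user can dictate the words 'save file' into a document without triggering a save. Rich-text commands such as bold or italics could be added to the same table of command words.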
