Speech Background 
(Thanks to Microsoft Speech Awareness)

 

Dictation

Hardware and Software Requirements

A dictation application requires certain hardware and software on the user's computer. Not all computers have the memory, speed, microphone, or speakers required to support speech, so it is a good idea to design the application so that speech is optional.
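One way to keep speech optional is to decide at startup which input modes to offer, based on what the machine actually has. The following is only a minimal sketch: the `SystemCapabilities` type and `choose_input_modes` function are hypothetical names invented for this illustration, and how each capability is detected is left out.

```python
from dataclasses import dataclass


@dataclass
class SystemCapabilities:
    # Hypothetical capability flags; detecting them is platform-specific.
    has_sound_card: bool
    has_microphone: bool
    has_speech_engine: bool


def choose_input_modes(caps):
    """Always offer keyboard and mouse; add dictation only when fully supported."""
    modes = ["keyboard", "mouse"]
    if caps.has_sound_card and caps.has_microphone and caps.has_speech_engine:
        modes.append("dictation")
    return modes


# A machine without a microphone still gets a fully usable application.
print(choose_input_modes(SystemCapabilities(True, False, True)))
```

The point of the design is that dictation is an addition to the standard input modes, never a precondition for running.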

These hardware and software requirements should be considered when designing a speech application:

Limitations

Even the most sophisticated speech recognition engine has limitations that affect what it can recognize and how accurate the recognition will be. The following list illustrates many of the limitations found today. The limitations do pose some problems, but they do not prevent the design and development of savvy applications that use dictation.

Microphones and sound cards

The microphone is the largest problem that speech recognition encounters. Microphones inherently have the following problems:

  1. Not every user has a sound card. Over time more and more PCs will bundle a sound card.
  2. Not every user has a microphone. Over time more and more PCs will bundle a microphone.
  3. Sound cards sit at the back of the PC, which makes it hard for users to plug in the microphone.
  4. Most microphones that come with computers are cheap, and they don't do as well as more expensive microphones that retail for $50 to $100. Furthermore, many of the cheap microphones that are designed to be worn are uncomfortable. A user will not use a microphone if it is uncomfortable.
  5. Users don't know how to use a microphone. If the microphone is worn on the head, they often wear it incorrectly; if it sits on the desktop, they lean towards it to speak, even though it is designed to pick up speech from a normal sitting position.

Most applications can do little about the microphone. One way that vendors can deal with this is to test and verify the user's microphone setup as part of the installation of any speech component software. Software to test a user's microphone can be delivered along with other components to ensure that the user can periodically test and adjust the microphone and configuration.
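A microphone test of this kind could be as simple as a level check on a short recording. The sketch below is an illustration under stated assumptions, not part of any speech API: it assumes 16-bit signed PCM samples have already been captured into a list, and the function name and both thresholds are hypothetical values that a real test would need to calibrate.

```python
import math

# Hypothetical thresholds for 16-bit signed PCM; real values need calibration.
FULL_SCALE = 32767
TOO_QUIET_RMS = 0.02   # RMS level as a fraction of full scale
CLIP_FRACTION = 0.01   # share of samples allowed at or near full scale


def check_mic_level(samples):
    """Classify a recording as 'silent', 'too_quiet', 'clipping', or 'ok'."""
    if not samples:
        return "silent"
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / FULL_SCALE
    clipped = sum(1 for s in samples if abs(s) >= FULL_SCALE - 1) / len(samples)
    if rms == 0:
        return "silent"
    if clipped > CLIP_FRACTION:
        return "clipping"
    if rms < TOO_QUIET_RMS:
        return "too_quiet"
    return "ok"
```

A setup program might ask the user to read a sentence aloud, run a check like this, and then suggest moving the microphone or adjusting the input gain according to the result.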

Most users of dictation will wear close-talk microphones for maximum accuracy. Close-talk microphones have the best characteristics for speech recognition, and they alleviate a number of the problems that weaker user microphones cause in Command and Control recognition.

Speech Recognizers make mistakes

Speech recognizers make mistakes, and will always make mistakes. What is changing is the rate: every two years, recognizers make half as many mistakes as they did before. But no matter how good a recognizer is, it will always make mistakes.
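The halving claim can be written as a simple decay formula: after t years the error rate is the starting rate multiplied by 0.5 raised to the power t/2. The sketch below only works through that arithmetic; the function name and the 10% starting rate are assumptions made for the example, not measured figures.

```python
def projected_error_rate(initial_rate, years, halving_period=2.0):
    """Error rate after `years`, if it halves once every `halving_period` years."""
    return initial_rate * 0.5 ** (years / halving_period)


# Starting from an assumed 10% error rate:
for years in (0, 2, 4, 10):
    print(years, projected_error_rate(0.10, years))
```

Under this model the rate keeps shrinking but never reaches zero, which is why an application still needs a way to correct misrecognitions.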

To make matters worse, dictation engines make misrecognitions that are correctly spelled and often grammatically correct, but mean nothing. Unfortunately, the misrecognitions sometimes mean something completely different than the user intended. These sorts of errors serve to illustrate some of the complexity of speech communication, particularly in that people are not accustomed to attributing strange wording to speech errors.

To minimize some of the misrecognitions, an application can:

Is it a Command?

When speech recognition is listening for dictation, users will often want to interject commands such as "cross-out" to delete the previous word, or "capitalize-that". Applications should make sure that:

Finite Number of Words

Speech recognizers listen for a finite vocabulary of 20,000 to 100,000 words. As a result, one out of every fifty words a user speaks isn't recognized, because it isn't among the words supported by the engine.

Applications can reduce an engine's error rate by telling the engine which words it should expect.
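One source of expected words is the text the user is already working on. The sketch below shows how an application might collect the words from a document that fall outside the engine's vocabulary so they can be offered to the engine. It is an illustration only: `words_to_hint` is a hypothetical helper, the vocabulary is assumed to be available as a set of lowercase words, and the call that would actually pass the words to an engine is omitted because it depends on the engine.

```python
import re


def words_to_hint(document_text, engine_vocabulary):
    """Return words the engine's vocabulary lacks, in first-seen order, no repeats."""
    seen = set()
    hints = []
    for word in re.findall(r"[A-Za-z']+", document_text.lower()):
        if word not in engine_vocabulary and word not in seen:
            seen.add(word)
            hints.append(word)
    return hints


# Tiny stand-in vocabulary, assumed for the example only.
vocabulary = {"send", "the", "memo", "to", "dr", "agreed"}
print(words_to_hint("Send the memo to Dr. Kowalczyk. Kowalczyk agreed.", vocabulary))
```

Names, product terms, and jargon from the user's own documents are the words most likely to be missing, so they are natural candidates for this kind of hint.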

Other Problems

Some other problems crop up:

Application Design Considerations

Here are some design considerations for applications using command and control speech recognition.

Design Speech Recognition in From the Start

Don't make the mistake of implementing speech recognition in your application as an afterthought. Applications designed for just the keyboard and mouse get little benefit from speech recognition, so adding speech to such a design produces a poor result. The speech interface is at a point similar to where the mouse interface was when applications were designed for keyboard input only: not until applications were deliberately designed for the mouse did it prove generally effective for user input.

Do Not Replace the Keyboard and Mouse

Most dictation systems provide discrete dictation, allowing users to speak up to 50 words per minute. While this is faster than hunt-and-peck typing, touch typists can type at least 70 words per minute, so touch typists will not use discrete dictation. Continuous dictation allows up to 120 words per minute.
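To make the comparison concrete, the rates quoted above can be turned into entry times for a document of a given length. This is only a back-of-the-envelope sketch: the 600-word document and the names used are assumptions for the example, the dictation figures are upper bounds, the typing figure is a lower bound, and time spent correcting errors is ignored.

```python
# Rates taken from the text above, in words per minute.
RATES_WPM = {
    "discrete dictation": 50,
    "touch typing": 70,
    "continuous dictation": 120,
}


def minutes_to_enter(word_count, words_per_minute):
    """Minutes needed to enter `word_count` words at a steady rate."""
    return word_count / words_per_minute


# Assumed 600-word document.
for method, rate in RATES_WPM.items():
    print(f"{method}: {minutes_to_enter(600, rate):.1f} minutes")
```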

Communicate Speech Awareness

Since most applications today do not include speech recognition, users will find speech recognition a new technology. They probably won't assume that your application has it, and won't know how to use it.

When you design a speech recognition application, it is important to communicate to the user that your application is speech-aware and to provide him or her with the commands it understands. It is also important to provide command sets that are consistent and complete.

Manage User Expectations

Users will often have the expectation that speech-enabled applications will provide a level of comprehension and interaction comparable to the futuristic speech-enabled computers of Star Trek and 2001: A Space Odyssey. Some users will expect the computer to correctly transcribe every word that they speak, understand it, and then act upon it in an intelligent manner.

You should convey as clearly as possible exactly what an application can and cannot do and emphasize that the user should speak clearly, using words the application understands.

Where the Engine Comes From

If an application implements speech recognition, it can work on an end user's PC only if the system has a speech recognition engine installed on it. The application has two choices:

© 1995-1998 Microsoft Corporation. All rights reserved.