"Enhancing the quality of life for all individuals with learning disabilities and their families through advocacy, education, training, service and support of research."

Speech Recognition Technology for the Learning Disabled Student
by Mark Willette-Green, President, Vector Computer Technologies, Inc., Phone: 800.660.0120 or mgreen@vector-computers.com

"For Americans without disabilities, technology makes things easier. For Americans with disabilities, technology makes things possible." Mary Pat Radabaugh, Study on the Financing of Assistive Technology Devices and Services for Individuals with Disabilities

The purpose of this article is to discuss current speech recognition technologies and their use for learning-disabled students.

I’d like to begin by stating that I am not a learning disabilities professional or consultant. I am a technology nerd who has been working in the computer industry for over 25 years. Because of my general expertise with computer technologies, I have investigated, tested and mastered various hardware and software products. Ten years ago, a client asked me to investigate speech recognition software. Since then, I have focused almost exclusively on speech technologies and their use: first as a tool for those with physical impairments, later for busy professionals, and more recently for students with learning disabilities.

Speech Recognition

Early versions of speech recognition software (c. 1990) were expensive and required specialized hardware. They were useful, though clumsy, for those with physical limitations. But, since they were based on a halting, word-at-a-time, discrete dictation, they were rarely successful for use with learning disabled students, whose minds seemed to operate too fast to say a word, wait for a response, then dictate the next word.

By the mid- to late 1990s, speech recognition had progressed from discrete speech to continuous speech. The continuous speech programs began to be of practical use for specialized vocabularies, such as medicine, public safety, and law. But for general dictation, which draws on a much broader vocabulary (the writing of novelists and journalists, the reports of psychologists, the homework of students), accuracy still suffered.

By 2002, the technology had progressed considerably. Many readers will be familiar with Moore’s Law, which, simply stated, postulates that computers will double in computational power every 18 months. Moore’s Law, which is formally a statement about the number of transistors on an integrated circuit, has basically held true since Gordon Moore first stated it in 1965. The most powerful PC today will be eclipsed by one twice as powerful by 2005 at the latest.
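The arithmetic behind that projection can be sketched in a few lines of Python. This is only an illustration, assuming the 18-month doubling period quoted above; the function name relative_power is hypothetical.

```python
def relative_power(months_elapsed, doubling_period_months=18):
    """Return the growth factor after months_elapsed, assuming power
    doubles once every doubling_period_months (an assumed rule of thumb)."""
    return 2 ** (months_elapsed / doubling_period_months)

# One doubling period gives a factor of 2; three years (36 months) gives 4.
for months in (18, 36):
    print(months, "months ->", relative_power(months), "x")
```

Under this assumption, a computer twice as powerful as today's best would arrive about a year and a half from now, so an estimate of 2005 is a cautious one.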

Speech recognition software is computationally demanding, so faster computers mean faster response to speech. The basic rules that interpret sounds and convert them into reasonably accurate text rely on statistical databases that predict which words make sense in a given context, and on the words that have been used in the recent past. As computers get faster, accuracy increases, since more statistical information can be analyzed as the words stream in through the microphone.
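To give a feel for what "predicting which words make sense in a context" means, here is a toy sketch in Python. It is not how any commercial product works internally; it simply counts which word followed which in a tiny, made-up sample text and uses those counts to choose between two words that sound alike. The sample text and the function name pick_word are invented for illustration.

```python
from collections import Counter

# Tiny made-up training text; real products use far larger statistical databases.
training_text = "i want to write a letter . turn right at the light . write to me soon ."

words = training_text.split()
pair_counts = Counter(zip(words, words[1:]))  # counts of (previous word, word)

def pick_word(previous_word, candidates):
    """Choose the candidate seen most often after previous_word."""
    return max(candidates, key=lambda w: pair_counts[(previous_word, w)])

# "write" and "right" sound alike; the preceding word breaks the tie.
print(pick_word("to", ["write", "right"]))
print(pick_word("turn", ["write", "right"]))
```

A real recognizer weighs far more context than a single preceding word, which is one reason faster hardware translates into better accuracy.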

Another advance that has made a great deal of difference is in the area of microphone technology. Speech recognition relies on a clear sound being picked up from the microphone, then having the sound digitized before it is processed by the computer.

In the mid-1990s, PC manufacturers developed the Universal Serial Bus, or USB, digital signal interface. The USB interface quickly spawned USB cameras, printers, scanners, and microphones. Before the advent of USB microphones (and even today, for users who do not have one), the computer needed a high-quality sound adapter. Manufacturers’ practice of installing generic sound chips on computer main boards nearly always resulted in poor speech recognition, because the generic chip digitized the sound inside the computer, where stray voltages from fans, hard disks, etc., could affect the signal. USB microphones, on the other hand, digitize the sound outside of the computer, creating a clean, accurate reproduction of the user’s speech.

These advances, along with improvements in the underlying speech recognition algorithms, have produced today’s technology: in most cases, within an hour or so of first use, a user can expect to dictate at faster than normal conversational speed (up to 160 words per minute) with at least 97% accuracy.
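It helps to translate those figures into the number of corrections a user should expect to make. The short Python sketch below does the arithmetic, assuming errors are spread evenly through the dictation; the function name expected_errors is hypothetical.

```python
def expected_errors(words_per_minute, accuracy, minutes=1):
    """Estimate misrecognized words, assuming errors are spread evenly."""
    return words_per_minute * minutes * (1 - accuracy)

# At 160 words per minute and 97% accuracy: about 4.8 errors per minute.
print(round(expected_errors(160, 0.97), 1))
```

In other words, even at 97% accuracy, a fast dictator should still plan on correcting roughly five words for every minute of speech.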

The products that can currently achieve these levels of recognition are Dragon NaturallySpeaking® from ScanSoft, Inc., and IBM’s ViaVoice, which is also sold and marketed by ScanSoft. Although NaturallySpeaking has a much larger user base, most experts feel that both products have similar accuracy. Dragon has the advantage of a cleaner, more user-friendly interface, whereas ViaVoice offers both PC and Macintosh versions. Microsoft is also dabbling with speech recognition, including it in the MS-Office 2002 and 2003 packages. Most speech recognition professionals believe that Microsoft’s speech recognition software has a long way to go to match the recognition accuracy and friendly user interface of NaturallySpeaking and ViaVoice.

Even with the high accuracy that speech recognition can now achieve, working with learning disabled students is still not a cakewalk. If the student has difficulty reading, screen reading technology may be of use. Additionally, there are various settings and tools within speech recognition software that can assist the user, such as automatic punctuation and vocabulary enhancement from existing documents.

Speech recognition technology has become an essential tool for individuals with learning disabilities, but it is far from perfect. For many people, it takes time, practice, and often more than a little ingenuity before it becomes a productive tool. Just as most people learned touch-typing over the course of a school semester and countless essays and reports, learning to type by voice will likely not be mastered in a single session. Once it is mastered, though, the learning disabled need never again fear being left behind the touch-typists.

Our company sells, installs, and customizes Dragon NaturallySpeaking and, to a lesser degree, IBM’s ViaVoice, and trains clients in their use. We generally recommend Dragon NaturallySpeaking Preferred for our clients with learning disabilities. Dragon’s Preferred Edition offers speech playback and screen reading, which are sometimes helpful when working with learning disabled students. Dragon’s and IBM’s Professional Solutions lines of products, by contrast, are more appropriate for those working with specialized vocabularies, dictating into structured templates, or dictating into third-party applications. Special volume licensing programs are also available, with significant discounts for educational institutions.

© 2002-2008 Learning Disabilities Association of Michigan