Speech Recognition Technology for the
Learning Disabled Student
by Mark Willette-Green, President, Vector Computer Technologies,
Inc., Phone: 800.660.0120 or
mgreen@vector-computers.com
"For Americans without disabilities, technology makes things easier.
For Americans with disabilities, technology makes things possible."
Mary Pat Radabaugh, Study on the Financing of Assistive Technology
Devices and Services for Individuals with Disabilities
The purpose of this article is to discuss current speech recognition
technologies and their use for learning-disabled students.
I’d like to begin by stating that I am not a learning disabilities
professional or consultant. I am a technology nerd who has been
working in the computer industry for over 25 years. Because of my
general expertise with computer technologies, I have investigated,
tested and mastered various hardware and software products. Ten years
ago, I was asked by a client to investigate speech recognition
software. Since then, I have focused almost exclusively on speech
technologies and their use: first as a tool for those with physical
impairments, later for busy professionals, and more recently as a
tool for students with learning disabilities.
Speech Recognition
Early versions of speech recognition software (c. 1990) were
expensive and required specialized hardware. They were useful, though
clumsy, for those with physical limitations. But because they relied
on halting, word-at-a-time (discrete) dictation, they rarely worked
for learning disabled students, whose minds seemed to operate too
fast to say a word, wait for a response, and then dictate the next
word.
By the mid- to late 1990s, speech recognition had progressed from
discrete speech to continuous speech. The continuous speech programs
began to be of practical use for specialized vocabularies, such as
medicine, public safety, and law. But for general dictation, where a
much broader vocabulary is used (the writing of novelists and
journalists, the reports of psychologists, and the homework of
students), accuracy still suffered.
By 2002, the technology had progressed further. I suspect that many
readers are familiar with Moore’s Law, which, simply stated,
postulates that computers will double in computational power every 18
months. Moore's Law, which is tied to a technical formulation of the
number of transistors on a wafer of silicon, has basically held true
since the 1960s. The most powerful PC today will be eclipsed by a
doubly powerful computer by 2005 at the latest.
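As a rough illustration of the doubling rule described above, the
short Python sketch below computes relative computing power after a
given number of months. The function name relative_power and the
perfectly regular 18-month doubling period are assumptions made for
illustration only; real hardware progress is not this tidy.

```python
def relative_power(months_elapsed, doubling_period_months=18):
    """Power relative to the starting point, assuming one doubling
    every doubling_period_months (a simplification of Moore's Law)."""
    return 2 ** (months_elapsed / doubling_period_months)


# Print the growth factor after a few multiples of the doubling period.
for years in (1.5, 3, 6, 9):
    factor = relative_power(years * 12)
    print(f"After {years} years: about {factor:.0f} times the power")
```

Under this assumption, a computer bought nine years after another
would be roughly 64 times as powerful, which is why software that was
impractical a decade ago can become usable.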
Speech recognition software is very computationally demanding. Faster
computers mean faster response to speech. The rules that interpret
sounds and convert them into reasonably accurate text rely on
statistical databases that predict which words make sense in a given
context, and on the words that have been used in the recent past. As
computers get faster, accuracy increases, since more statistical
information can be analyzed as the words stream in through the
microphone.
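To make the idea of statistical word prediction concrete, here is a
minimal Python sketch of one very simple approach: counting which
word pairs occur in sample text, then using those counts to choose
between two words that sound alike. This is not how any particular
product works; the sample text and the names count_bigrams and
pick_word are invented for illustration, and commercial software uses
far larger databases and more sophisticated models.

```python
from collections import Counter

# Tiny made-up sample text; a real system would use a huge corpus.
TRAINING_TEXT = (
    "i want to write a report . "
    "please write a letter . "
    "turn right at the corner . "
    "that is the right answer ."
)


def count_bigrams(text):
    """Count how often each pair of adjacent words appears."""
    words = text.split()
    return Counter(zip(words, words[1:]))


def pick_word(previous_word, candidates, bigram_counts):
    """Choose the sound-alike candidate that most often followed
    previous_word in the sample text."""
    return max(candidates, key=lambda w: bigram_counts[(previous_word, w)])


BIGRAMS = count_bigrams(TRAINING_TEXT)
print(pick_word("to", ["write", "right"], BIGRAMS))   # context: "to ..."
print(pick_word("the", ["write", "right"], BIGRAMS))  # context: "the ..."
```

With this sample text, the sketch prefers "write" after "to" and
"right" after "the". A faster computer can weigh many more such
statistics in the time it takes to speak a word.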
Another advance that has made a great deal of difference is in the
area of microphone technology. Speech recognition relies on a clear
sound being picked up from the microphone, then having the sound
digitized before it is processed by the computer.
In the mid-1990s, PC manufacturers developed the Universal Serial
Bus (USB), a digital signal interface. The USB interface immediately
spawned the creation of USB cameras, printers, scanners, and
microphones. Before the advent of USB microphones (and even today,
for users who do not have a USB-based microphone), the computer
needed a high-quality sound adapter. Manufacturers’ practice of
installing generic sound chips on computer main boards nearly always
resulted in poor speech recognition, because the generic chip
digitized the sound inside the case, where stray voltages from fans,
hard disks, and other components could affect the signal. USB
microphones, on the other hand, digitize the sound outside of the
computer, creating a clean, accurate reproduction of the user’s
speech.
These advancements, along with improvements in the underlying
algorithms of speech recognition, have produced today’s technology:
in most cases, a user can expect to dictate at faster than normal
conversational speed (up to 160 words per minute) with at least 97%
accuracy, within an hour or so of first use.
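To put those figures in perspective, the small Python sketch below
works out how many corrections a user might expect. It assumes that
"accuracy" means the fraction of words recognized correctly and that
errors are spread evenly; the function names are hypothetical and
chosen only for this example.

```python
def errors_per_minute(words_per_minute, accuracy):
    """Expected misrecognized words per minute of dictation."""
    return words_per_minute * (1 - accuracy)


def errors_in_document(word_count, accuracy):
    """Expected misrecognized words in a document of word_count words."""
    return word_count * (1 - accuracy)


# Figures from the text: 160 words per minute at 97% accuracy.
print(f"{errors_per_minute(160, 0.97):.1f} errors per minute")
# A 500-word essay is an assumed example length.
print(f"{errors_in_document(500, 0.97):.0f} errors in a 500-word essay")
```

In other words, even at 97% accuracy a student dictating at full
speed should still expect to correct a handful of words every minute.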
The products that can currently achieve these levels of recognition
are Dragon NaturallySpeaking® from ScanSoft, Inc.; and IBM’s ViaVoice,
also sold and marketed by ScanSoft. Although NaturallySpeaking has a
much larger user base, most experts feel that both products have
similar accuracy. Dragon has the advantage of a cleaner, more
user-friendly interface, whereas ViaVoice offers both PC and
Macintosh versions. Microsoft is also dabbling with speech
recognition, including it in the MS-Office 2002 and 2003 packages.
Most speech recognition professionals believe that Microsoft's speech
recognition software has a long way to go to match the recognition
accuracy and friendly user interface of NaturallySpeaking and
ViaVoice.
Even with the high accuracy that speech recognition can now achieve,
working with learning disabled students is still not a cakewalk. If
the student has difficulty reading, screen reading technology may be
of use. Additionally, there are various settings and tools within
speech recognition software that can assist the user, such as
automatic punctuation and vocabulary enhancement based on the user's
existing documents.
Speech recognition technology has become an essential tool for
individuals with learning disabilities, but it is far from perfect.
For many people, it takes time, practice, and often more than a
little ingenuity before it becomes a productive tool. Just as most
people learned touch-typing over the course of a school semester and
countless essays and reports, typing by voice will likely not be
mastered in a single session. Once it is mastered, however, the
learning disabled need never again fear being left behind the
touch-typists.
Our company sells, installs, and customizes Dragon NaturallySpeaking
and, to a lesser degree, IBM's ViaVoice, and trains users on both. We
generally recommend Dragon NaturallySpeaking Preferred for our
clients with learning disabilities. Dragon's Preferred Edition offers
speech playback and screen reading, which are sometimes helpful when
working with learning disabled students. Conversely, Dragon's and
IBM's Professional Solutions lines of products are more appropriate
for those working with specialized vocabularies, dictating into
structured templates, or dictating into third-party applications.
Special volume licensing programs are also available, with
significant discounts for educational institutions.