For Immediate Release
May 16, 1996
DIGITAL ANALYSIS OF SPEECH COULD LEAD TO NEW TEST FOR INTOXICATED DRIVERS
Slurred speech is often a sure sign that someone's been drinking.
Now, a Georgia Institute of Technology researcher is working with colleagues
from Indiana University to digitally quantify this telltale sign, which
could lead to a simple, non-invasive way to test a person's sobriety.
"This is basically an effect of fine motor control," said Kathleen E.
Cummings, a lecturer in Georgia Tech's School of Electrical and Computer Engineering. "We're
looking at specifically what happens during speech production at your
vocal cords, how steadily you can produce the excitation (air from your
lungs) going through your vocal cords."
Preliminary results show that intoxicated speech is marked by jumpy
changes in pitch and energy production and unsteady opening and closing
of the vocal cords.
Cummings discussed her work May 16 at the 131st annual meeting of the
Acoustical Society of America in Indianapolis.
She is working with Dr. David B. Pisoni
and Dr. Steven B. Chin of Indiana University, as part of their ongoing
study of ways to measure how alcohol consumption affects speech. The current
project is sponsored by the Alcoholic Beverage Medical Research Foundation.
Pisoni, director of Indiana's Speech Research Laboratory and a professor
of psychology, is considered a leader in the study of acoustical analysis,
synthesis and perception of speech. Chin is a psychology postdoctoral
student specializing in linguistics.
The two researchers approached Cummings after hearing about her thesis
work at Georgia Tech, published in 1992, on how speech changes when produced under
emotional stress or with linguistic effects such as talking quickly
or slowly, loudly or softly.
"Given her robust results in the differentiation of styles of stressed
speech, we thought that this type of analysis might show characteristic
changes in speech produced under alcohol," Chin said.
For her thesis work, Cummings used digitized speech collected from several
people speaking in 11 of the most common non-normal styles of speech.
She then spent several years analyzing the signals produced by the sounds,
looking specifically at the glottal excitation waveform.
During speech production, air passes from the lungs through the glottis,
an opening in the vocal cords, then is shaped into sounds by parts of
the vocal tract, such as the teeth, tongue and lips. If the glottis stays
open, the result is unvoiced sounds like "p" and "t." If it opens and
closes periodically, voiced sounds such as "b," "z" and vowels are produced.
The glottal excitation waveform describes the puffs of air produced by the
opening and closing of the glottis during voiced speech. Cummings concentrated
on voiced sounds in order to study the glottal excitation waveform, which
is known to be important in the subtle parts of natural speech, such as
emotion and style.
She found distinct differences between normal speech and speech produced
under emotional stress, and could tell the two apart with an accuracy rate
of over 90 percent.
For her current research, Cummings said, "the idea is, can we do the
same thing with sober versus intoxicated speech? If we have a sample of
somebody's speech from an accident or at a particular time, can we analyze
it and say, 'Yes, this person is intoxicated,' if we compare this to his
normal, sober speech sample?"
To find out, Pisoni and Chin sent Cummings samples of sober and intoxicated
speech from four different people, gathered at Indiana University. They
include different types of speech, such as monosyllabic words, tongue
twisters, isolated sentences and passages of connected sentences.
Samples were taken when participants were sober, moderately intoxicated
(.05 percent blood alcohol level) and highly intoxicated (.10 percent
blood alcohol level or higher, considered legally drunk in most states).
Past perceptual research on this database has shown that a person listening
to the samples can reliably discriminate between sober and intoxicated
speech. Acoustic analysis also has shown that intoxicated speech is slower,
features longer sentences and is marked by mispronunciations, such as
slurred sounds and transposed letters and words.
In the current study, Cummings is finding that alcohol has a major effect
on the excitation parameters that reflect the steadiness with which a
person produces speech.
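As a rough illustration of what a "steadiness" measurement can look like, the
Python sketch below computes a cycle-to-cycle perturbation ratio of the kind
speech scientists commonly call jitter (for pitch periods) or shimmer (for
amplitudes). This is an assumption for illustration only: the release does not
name the excitation parameters Cummings uses, and the numbers below are made
up, not taken from the Indiana database.

```python
def relative_perturbation(values):
    """Mean absolute change between consecutive glottal cycles,
    divided by the mean value. Larger results mean less steady speech."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical per-cycle pitch periods in milliseconds (illustrative only).
steady_periods = [8.0, 8.1, 8.0, 7.9, 8.0]
unsteady_periods = [8.0, 9.0, 7.5, 8.8, 7.2]

jitter_steady = relative_perturbation(steady_periods)
jitter_unsteady = relative_perturbation(unsteady_periods)

print(f"steady:   {jitter_steady:.4f}")
print(f"unsteady: {jitter_unsteady:.4f}")
```

The same function applied to per-cycle peak amplitudes would give a
shimmer-style figure. Whether measures like these separate sober from
intoxicated speech is exactly the question the study is investigating.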
Samples from four speakers may not sound like enough for a comprehensive study,
but Cummings said they form a sufficient database to make generalizations.
"If you see consistently the same trend between sober and intoxicated
speech for four different speakers, that's actually a lot," she said.
Also, Cummings plans to continue her research on the other five speakers
in the Indiana database.
Although much work is left to be done, Cummings said translating her
research into a practical public safety device could be relatively easy.
Law enforcement officials could record someone's speech at an accident
or traffic stop, then analyze it later against a sample taken at a different
time.
"If I can come up with a small set of parameters that differentiate
sober and intoxicated speech, which I think I can do, it's actually not
a hard task," she said. "There are some really simple distance measures
that involve very few calculations."
The analysis would be done by computer, based on a mathematical formula
that would yield a percentage probability as to whether the speaker was
intoxicated.
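A minimal Python sketch of that idea follows, under loudly stated assumptions.
The release does not give the distance measure or the formula, so this example
swaps in a plain Euclidean distance between a test sample and the speaker's
sober baseline, then maps that distance onto a 0-to-100 score with a logistic
curve. The feature values, the function name intoxication_score, and the
midpoint and steepness settings are all hypothetical; in practice they would
have to be fitted to real data.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intoxication_score(sample, sober_baseline, midpoint, steepness):
    """Map distance from the sober baseline onto a 0-100 score.

    Assumption: a logistic curve stands in for the unspecified formula.
    A distance equal to `midpoint` scores exactly 50.
    """
    d = euclidean_distance(sample, sober_baseline)
    return 100.0 / (1.0 + math.exp(-steepness * (d - midpoint)))


# Hypothetical two-parameter feature vectors (illustrative numbers only).
sober_baseline = [0.0, 0.0]
near_sample = [0.3, 0.4]   # distance 0.5 from the baseline
far_sample = [6.0, 8.0]    # distance 10.0 from the baseline

near_score = intoxication_score(near_sample, sober_baseline, midpoint=5.0, steepness=1.0)
far_score = intoxication_score(far_sample, sober_baseline, midpoint=5.0, steepness=1.0)

print(f"near sample: {near_score:.1f}")
print(f"far sample:  {far_score:.1f}")
```

The calculation involves only a handful of subtractions, squares and one
exponential, which is consistent with Cummings' point that such comparisons
need very few calculations. The score is a probability only if the curve has
been calibrated against labeled sober and intoxicated samples.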
The only stumbling blocks could be recording quality and legal issues,
such as a person's refusal to give two samples for comparison.
"Now, if you ever got really, really lucky, and you found something
that you only ever saw in intoxicated speech ... then you would be able
to just do it on the fly," Cummings said. "But I haven't seen anything
like that yet."
More importantly, researchers have to compare their results against
other factors that alter the way a person speaks, such as speech impediments,
injuries, diseases or even common colds. Ataxic dysarthria, for example,
is a neurological condition that causes a person to sound intoxicated.
With more than a year's worth of work behind her and at least that much
to go, Cummings hopes to soon isolate a distinct set of parameters that
define intoxicated speech with at least 90 percent accuracy. Regardless,
she and her colleagues hope their research adds to the basic knowledge
and understanding of how speech is produced.
RESEARCH NEWS AND PUBLICATIONS OFFICE
Georgia Institute of Technology
75 Fifth Street, N.W. Suite 100
Atlanta, Georgia 30308 USA
MEDIA RELATIONS CONTACTS:
John Toon (404-894-6986);
Internet: john.toon@edi.gatech.edu;
FAX: (404-894-4545)
WRITER: Amanda Crowell