The Voice of Osama bin Laden
Osama's voice on tape proves that the
leader of al Qaeda is still alive. Or does it?
by Richard A. Muller
Technology for Presidents
On January 4, Al
Jazeera broadcast yet another audio tape purported to be from Osama bin Laden,
in which he exhorted his followers to 'continue the jihad.' The voice referred
to the capture of Saddam Hussein, proving the tape was recent. An anonymous CIA
official confided to the New York Times: 'It is likely the voice of Osama bin
Laden.' In an interview with CBS, Homeland Security Secretary Tom Ridge agreed.
I pronounced bin
Laden dead in my May 2002 and September 2002
columns. Am I ready to retract this claim, and to pay off several bets I made
back then? No, not yet. I think Osama bin Laden is still dead. And I don't
think I'm just being stubborn. To understand my logic, consider the following
three issues: the state of the antiterrorism effort, the technology of voice
identification, and the most likely alternative hypothesis that could explain
the audio tape.
Despite the
distractions of Iraq War II, the U.S. antiterrorism effort has remained
remarkably strong. The USA PATRIOT Act treads on our civil rights -- but
it also makes it difficult for terrorists to operate in our homeland. Secretive
organizations cannot easily regroup when a few of their key links are
disrupted, whether through wiretapping, surveillance, or arrest. What do you do
when your one and only contact is gone? Al Qaeda may not be defeated, and it
can still send suicide bombers against soft targets, but its organization must
certainly be in a state of disarray.
Moreover,
experts I've consulted with tell me that cooperation between the United States
and foreign antiterrorist organizations remains strong, even in 'old Europe.'
Political disagreements about the wisdom of invading Iraq have not interfered
with the shared recognition of the dangers of terrorism. That is not
surprising; even France and Germany know that Osama didn't like them much more
than he liked the United States.
All that adds up
to a tough time for al Qaeda, and it and its sympathizers desperately need
encouragement from their charismatic leader. That is, of course, why the tapes
were made and broadcast. But why were they audio and not video? The voice
sounds right, but video would have been more convincing. Video recorders are
cheap and small. Osama could put all doubt to rest by releasing a video of himself
holding up a recent newspaper reporting Saddam's capture. Prior to Tora Bora, videos
of him were the norm. What happened?
I can find only
two plausible explanations. One is that Osama bin Laden is severely ill or
wounded, and does not want the world to know it. The other is that he is dead,
and the audio tape is faked. But how could the counterfeiter do a good enough
job to fool experts?
Voice
recognition is a rapidly developing technology, thanks to the availability of
cheap computing power. You've probably seen a 'voice print,' a plot showing
how strongly each frequency is present in the voice from moment to moment;
music editing software makes them on personal computers. Older voice
recognition systems looked for matches between sets of such
plots. Modern voice identification systems, which seek to have low false-alarm
rates even in the presence of noise, tend to depend more heavily on a technique
known as 'feature analysis.' A feature is a peculiar twist in the voice, often
a tell-tale transition between phonemes with different pitches. These are not
readily heard by listeners, but they can be picked out in a digital analysis.
Patterns of such glitches are unique identifiers, much as the ridge
bifurcations and other minutiae of a fingerprint are the keys to
fingerprint identification.
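
For readers who want to see what a voice print actually is, here is a minimal sketch. It is an illustration only, not the method used by any commercial or intelligence system: the function names (voice_print, dominant_frequency), the frame size, and the sample rate are my assumptions, and the 'voice' is a synthetic tone that jumps from 250 Hz to 500 Hz. The code chops the signal into short frames and measures how strongly each frequency is present in each frame, which is the table of numbers that a voice print plots.

```python
import cmath
import math


def voice_print(samples, frame_size, hop):
    """Split the signal into frames and return, for each frame,
    the magnitude of every frequency bin from 0 up to frame_size // 2."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Periodic Hann window reduces leakage between frequency bins.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size))
            for n, s in enumerate(frame)
        ]
        # Plain discrete Fourier transform (slow but dependency-free).
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def dominant_frequency(mags, sample_rate, frame_size):
    """Frequency in Hz of the strongest bin in one frame."""
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * sample_rate / frame_size


def tone(freq, count, sample_rate):
    """Pure sine tone, used here as a stand-in for a voice."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(count)]


SAMPLE_RATE = 8000  # roughly telephone quality (assumed)
FRAME = 64          # samples per frame (assumed)

signal = tone(250, 256, SAMPLE_RATE) + tone(500, 256, SAMPLE_RATE)
plot = voice_print(signal, FRAME, FRAME)
pitches = [dominant_frequency(m, SAMPLE_RATE, FRAME) for m in plot]
print(pitches)
```

Each row of plot is one vertical slice of the voice print. A feature-based system would go on to look for characteristic patterns in how those slices change from one to the next; that step is not attempted here.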
Voice
identification systems are already in widespread use around the world. They
are employed at the Canadian border to identify and track frequent
travelers, and in Britain to verify the compliance of young parolees. U.S.
companies, including Chase Manhattan Bank, Charles Schwab, and Prudential
Securities, use voice identification to control access to secure areas and
records. Visa is hoping to replace the personal identification numbers used
for credit card verification with voice recognition; a computer will compare
features of your voice with those stored in the credit card chip.
With such a
success record, shouldn't voice recognition software work reliably to identify
Osama, or to reject an imitator? Unfortunately, the Al Jazeera tapes are not
high quality -- probably no better than telephone sound. That's good enough to
detect some kinds of deception, but not all. Here are three possibilities:
1. The tape
was made by an impressionist trying to imitate bin Laden's voice. Good impressionists
can mimic the tone and pacing of their subject, but they often overemphasize
obvious quirks, much as a caricaturist exaggerates dominant physical
features. That makes it amusing to hear, but it won't fool an analyst.
Impressionists are not good at catching the more subtle features that even
simple voice recognition software uses. This kind of counterfeit can almost
certainly be ruled out.
2. The tape
was made by cutting and pasting true excerpts from bin Laden's past speeches.
Much of the tape could be unchanged from a prior recording. The tough part for
the counterfeiter was adding mention of Saddam's capture, where words and
phrases had to be rearranged. To detect such a forgery, a good analyst would
listen for discontinuities in the background noise, or small blips indicating
the tape was spliced. Digital processing by the tape maker can remove such
artifacts, but it leaves behind artifacts of its own; low-pass filters, for example,
create easily detected changes in the spectrum of the background hiss. (That's
why true audiophiles dislike noise suppression filters: a trained ear readily
notices their effect.) Such cutting and pasting, even with digital filtering, would
have been detected by the CIA. Digital processing can be detected in other
ways; for example, it sometimes generates false frequencies (called aliases).
Such tampering would have raised suspicions. Therefore this scenario can
probably be ruled out as well.
3. The tape
was a recording of one of Osama bin Laden's sons, who was deliberately trying
to sound like his father. This is, in my mind, the most likely hypothesis.
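
The background-hiss check mentioned under the second possibility can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure any intelligence agency is known to use: the function names (hiss_levels, suspicious_boundaries, moving_average), the frame size, and the threshold ratio of 2 are all hypothetical, and the 'tape' is synthetic random noise, half of which has been run through a simple moving-average low-pass filter.

```python
import math
import random


def hiss_levels(samples, frame_size):
    """RMS of the sample-to-sample difference in each frame.
    Differencing acts as a crude high-pass filter, so this tracks
    the high-frequency part of the background hiss."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        diffs = [b - a for a, b in zip(frame, frame[1:])]
        levels.append(math.sqrt(sum(d * d for d in diffs) / len(diffs)))
    return levels


def suspicious_boundaries(levels, ratio=2.0):
    """Indices of frames whose hiss level differs from the previous
    frame's level by more than the given ratio (hypothetical threshold)."""
    flagged = []
    for i in range(1, len(levels)):
        lo, hi = sorted((levels[i - 1], levels[i]))
        if lo == 0 or hi / lo > ratio:
            flagged.append(i)
    return flagged


def moving_average(samples, width):
    """Very simple low-pass filter: average of the last `width` samples."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - width + 1):i + 1]
        out.append(sum(window) / len(window))
    return out


rng = random.Random(0)  # fixed seed so the run is repeatable
FRAME = 500

# First half: untouched hiss. Second half: hiss that was low-pass filtered.
original = [rng.gauss(0.0, 0.01) for _ in range(4 * FRAME)]
processed = moving_average([rng.gauss(0.0, 0.01) for _ in range(4 * FRAME)], 8)
tape = original + processed

flags = suspicious_boundaries(hiss_levels(tape, FRAME))
print(flags)
```

On this synthetic tape the only flagged frame is the one where the filtered material begins. A real recording is much harder: the hiss would first have to be separated from the speech, and a careful forger could try to match noise levels across the splice.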
Saad bin Osama
bin Laden is the third of Osama's 23 to 50 children; he is known to be in his
early twenties. He has been active in al Qaeda since his pre-teen years, and
was probably being groomed for eventual leadership. He is reported to be fluent
in English and skilled with computers. The Washington Post reported that Saad was a key organizer
of the May 12, 2003, al Qaeda bombing in Riyadh, Saudi Arabia. There have been
reports that he is hiding along the Afghanistan-Pakistan border; others say
that he is in Iran close to the Afghanistan border, in a region not controlled
by the Iranian government. The Arab newspaper Asharq Al Awsat says that Saad is now one of the
principal leaders of al Qaeda, but I'm skeptical of that. Al Qaeda is too
sophisticated to let such a young and inexperienced person take over. But he
likely has an extremely useful talent: sounding like his dad.
I like to
consider myself an expert in the voices of my wife and my two daughters. I
notice them even in a crowded and noisy room. When one of them telephones me, I
instantly recognize her -- but often incorrectly. The one I name is the one I
expect, not the one who called. (They find this very amusing.) I don't know if
the similarity of their voices is genetic or learned, but I know that others
have similar problems. Parents and children tend to sound alike, and that
effect is exaggerated when bandwidth is poor, such as in a telephone call or on
a cassette recording. In fact, commercial speech recognition software that is
'trained' to respond to a particular person's voice will often have
a hard time distinguishing that voice from a family member's. The more
sophisticated systems that intelligence agencies presumably use may of course
be less prone to such confusion -- but I suspect that this vulnerability to
child and sibling spoofing remains. And I doubt that the U.S. government
has a recording of Saad to use for comparison.
Here is my
scenario:
Osama bin Laden
was killed at Tora Bora -- or his dialysis machine was destroyed and he died
shortly afterwards. The strongest evidence for this is the absence of new
videos. Al Qaeda fears that news of his death will shock and discourage many of
its supporters. There is no other leader who can hold together this diverse and
contentious organization, so they believe that they need to keep the news
secret. The initial tapes they released were old recordings of former speeches.
But many supporters were concerned. They, like me, noticed the absence of
videos, and of speeches with clear date indicators. Al Qaeda knew a video
counterfeit would be detected, but they noticed that Saad sounded a lot like
his father. They had him listen to his father's speeches, and practice delivering
them in a similar style. It took many attempts, but Saad's voice on the final
tape was good enough to deceive not only al Qaeda's foreign legions, but even
some analysts at the CIA.
And if my
personal experience is indicative, the tapes may even have fooled one or more
of Osama bin Laden's wives.
Richard A. Muller, a 1982
MacArthur Fellow, is a physics professor at the University of California,
Berkeley, where he teaches a course called 'Physics for Future Presidents.'
Since 1972, he has been a Jason consultant on U.S. national security.