In the future, computers will keep shrinking. We will have small
but powerful devices.
In the kitchen, the refrigerator will tell you what you have to buy. In cars, you will ask the travel assistant how to get to a particular destination, and it will tell you how to drive there. A small alarm clock will do all your time planning for you. But you can't use a keyboard to enter your appointments into the clock, and the clock won't have a big display to show the next one. So devices like these will have to understand spoken language.
Victor Zue was born in China and went to Florida in the late 1960s
to study near his older sisters. He had to learn the English
language and American pronunciation, which was difficult for him.
In 1968, he saw the science-fiction film "2001: A Space Odyssey", in which the computer "HAL" talked. That gave him the idea of developing real speech systems for computers. He went to MIT and began to analyse speech.
First, the computer has to record the spoken sentence. Then it splits
the sound into its component frequencies. From these, it can
derive the phonemes, the basic sound units of words.
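Splitting a sound into its frequencies can be tried out on a small scale. The following Python sketch is only an illustration, not the method of any particular speech system: it runs a plain discrete Fourier transform on a made-up test tone, and the names dft_magnitudes and dominant_frequency are invented here. Real recognizers usually apply a fast Fourier transform to many short, overlapping slices of the recording.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the strength of each frequency bin below half the sample rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        # Correlate the signal with a complex sinusoid of k cycles per window.
        total = sum(
            s * cmath.exp(-2j * math.pi * k * t / n)
            for t, s in enumerate(samples)
        )
        magnitudes.append(abs(total))
    return magnitudes


def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest bin."""
    magnitudes = dft_magnitudes(samples)
    strongest_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return strongest_bin * sample_rate / len(samples)


# Assumed test signal: a pure 440 Hz tone, 0.05 s at 8000 samples per second.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(400)]
print(dominant_frequency(tone, SAMPLE_RATE))
```

For this test tone the strongest bin lies at 440 Hz. Spoken sounds contain many frequencies at once, and it is the pattern of the strong ones that hints at the phoneme.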
By joining these phonemes, it can build candidate words. Using
grammar rules and stored word meanings, combined with probability
statistics, it can work out what you have said.
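How phonemes can be joined into possible words is easier to see in a toy example. The Python sketch below assumes a tiny, invented lexicon and informal phoneme symbols; the function name candidate_sentences is hypothetical too. It lists every way to read one phoneme sequence as words, which is the point where grammar, meaning and probabilities would then have to choose.

```python
# Hypothetical mini-lexicon: phoneme sequence -> words that sound like it.
# The phoneme symbols are informal and chosen only for this sketch.
LEXICON = {
    ("ay",): ["I", "eye"],
    ("ay", "s"): ["ice"],
    ("s", "k", "r", "iy", "m"): ["scream"],
    ("k", "r", "iy", "m"): ["cream"],
}


def candidate_sentences(phonemes, lexicon):
    """Return every way to split the phoneme sequence into lexicon words."""
    if not phonemes:
        return [[]]  # exactly one way to read nothing: the empty sentence
    results = []
    for end in range(1, len(phonemes) + 1):
        # Try each word whose phonemes match the first `end` phonemes.
        for word in lexicon.get(tuple(phonemes[:end]), []):
            for rest in candidate_sentences(phonemes[end:], lexicon):
                results.append([word] + rest)
    return results


heard = ["ay", "s", "k", "r", "iy", "m"]
for sentence in candidate_sentences(heard, LEXICON):
    print(" ".join(sentence))
```

With this lexicon, the six phonemes can be read as "I scream", "eye scream" or "ice cream". The sound alone does not tell which one was meant.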
The computer can answer with information from its own databases or from the Internet. It builds sentences, converts the words into their phonemes and sends them to the connected loudspeaker. This is nearly the same process in reverse.
But today's systems make about one error per sentence on average. For example, "recognize speech" can be misheard as "wreck a nice beach". Homophones such as "there" and "their" are especially difficult. The same letter can also be pronounced differently in different words, like the "t" in "try" and "button". And depending on the context, the same word can mean many different things.
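Probability statistics are one way to settle a case like "recognize speech". The Python sketch below assumes a simple bigram model, in which each word is scored by how likely it is to follow the previous word. All the numbers are invented for illustration, and log_score is a hypothetical name. A real recognizer would combine such a score with how well each candidate matches the recorded sound.

```python
import math

# Hypothetical bigram probabilities P(word | previous word). The values are
# invented for this sketch, not measured from any text collection.
BIGRAMS = {
    ("<s>", "recognize"): 0.002,
    ("recognize", "speech"): 0.05,
    ("<s>", "wreck"): 0.0005,
    ("wreck", "a"): 0.1,
    ("a", "nice"): 0.01,
    ("nice", "beach"): 0.005,
}
UNSEEN = 1e-6  # assumed floor for word pairs missing from the table


def log_score(sentence):
    """Sum the log-probabilities of each word given the word before it."""
    words = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
    return sum(
        math.log(BIGRAMS.get(pair, UNSEEN))
        for pair in zip(words, words[1:])
    )


candidates = ["recognize speech", "wreck a nice beach"]
best = max(candidates, key=log_score)
print(best)
```

With these made-up numbers the program prefers "recognize speech". With too little or misleading statistics, the wrong candidate could win, which is one source of the errors described above.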
The first computer systems were huge, and each was used by many people. Today we have one computer per person. But this is changing. Soon we won't have a laptop or a home computer but small devices that we can use for nearly everything. You will download the required software from the Internet. The chip in the computer will contain identical tiles; the software connects and tunes them to get the right functions. Only these chips are scalable enough to understand all speech commands and make such versatile devices possible.
Today we can control cell phones by voice, and dictation software turns our spoken words into text. But you have to train this software to your voice. Victor Zue's MIT lab built a system that you can query by telephone: the Mercury Travel Service for flights gives the right answer to the question "When does the next flight leave from Boston for San Francisco?" In the future, several systems will be connected (to reduce context errors), so you can ask about nearly anything.
Stefan Ziegler, 02.04.2001