A couple of days ago I hooked up a USB headset. For fun, I also activated Windows Vista’s voice recognition system.
It works, after a fashion. Some of what I’m typing here I’m typing more or less directly using voice recognition. But I have to do a tremendous amount of correction. Let me give you an example. The following text I will type directly using voice recognition, without any corrections.
I’ve often had doubts about the usefulness of voice recognition. Basically I have to wonder how low will be applicable in an environment like an office. Really, even here in my own home, I’ve found many instances where its an inconvenience to be talking to the computer. But I would assume something select from botched to something damn the correction isn’t working correctly. Actually, since I said I wasn’t going to correct anything in this paragraph. , I suppose is a good thing that is not correct incorrectly. Enough of this crapola ready.
I produced the above paragraph after spending an hour or two training the system, using a high-quality mike in a quiet room. I suppose that, one day, we will have nearly perfect voice recognition technology. We are far from that state today: given how many mistakes I have to correct, getting a paragraph or two typed takes me a lot longer than simply hammering away at the keyboard. That said, I’m still reasonably impressed with how capable it already is.
But where would I use voice recognition? Not in an office: it would be incredibly distracting there. At home, possibly, although it could irritate other people in the household who want to do other things, like watch TV. In a car? Certainly. Voice recognition makes tremendous sense there: being able to say “Radio 1130 AM” instead of reaching over and fiddling with buttons is both a convenience and a safety feature.
You might ask yourself, “So, Kelly, what about this is fun?” The answer is that you have to have a bit of a sense of humour about some of the errors the system makes. Every once in a while it comes up with a true gem. When I was first starting the training, for example, I ended up laughing so hard that the voice recognition system mostly typed ha ha ha ha, ha, ha ha ha.
OK, so it was funny at the time…
Voice recognition and speech synthesis can be fun, if for no other reason than that they remind us how truly stupid computers are… I think the humour comes in because we all anthropomorphise to some extent, and nothing reminds us of how unhuman computers are more than when they try to act human.
Of course, some coincidences are just too funny. I remember us asking your car’s voice recognition GPS “where are the hookers?” and it coming back with the address of the nearest Howard Johnson’s hotel. Then we asked it to “find George Bush”, and it seems he too was at the nearby Howard Johnson’s, with the hookers 😀
I think the big problem with voice recognition is that, for it to be useful, it has to work like a human assistant. It has to listen all the time, then work from context, so the computer can provide the right bit of info at the right time or summarize the last 15 minutes of conversation. In other words, speech is great for communicating thought, but it isn’t so great for giving an extended series of commands. Until computers are smart enough to understand what we are talking about as we are talking about it, speech recognition will be of limited use. Computers hear fine; they “have good ears”. But they can’t understand what they hear; they have poor brains… and until that changes, speech recognition will be like talking to an idiot. And we all know how unproductive that is 😉
I think you have a similar perspective to mine on this, Chris. I can see the value of voice recognition today in some rather specific circumstances: situations where specific, structured commands can be given and where other forms of interaction (e.g., a keyboard) would distract the human. But the technology really isn’t much use for “general purpose” functions like operating a home computer or transcribing text documents and emails.
Despite this, I do believe there is value in stretching the boundaries a bit. I again spent an hour or so playing with Vista’s voice recognition, and was able to dictate several fairly lengthy blocks of text without error. A big part of the challenge the Vista voice recognition system faces is that it is both trying to transcribe what I say and trying to pick up command patterns that tell it what to do. So partway through a paragraph I can tell it “select from ‘fairly’ to ‘error’- delete”, and it will try to distinguish that command sequence from my normal speech. It often doesn’t succeed, but… I doubt even the basic recognition part would have been achievable five years ago with off-the-shelf gear. Without someone thinking it was worth the effort, I guess that progress wouldn’t have occurred. And I could see a time, say five or ten years from now, when the number of errors approaches zero.
If the “natural language recognition” part were 100%, the “talking to an idiot” part would be less of an issue. Even then, though… do I really want a computer to capture my spoken thoughts? When I’m typing, I edit and rethink things several times; when I speak, I stumble around, backtrack, and generally weave a crooked path. I suppose a system smart enough to figure out what I really *meant* would be ideal… but I imagine the computer would shortly thereafter begin to wonder what value the human was adding to the process 😉