If you read my tech blog, you already know that I’ve been fooling around with Vista’s new speech recognition feature for the last few weeks. This is the first Microsoft operating system to have speech rec built into the OS, although the last few versions of Microsoft Office installed speech recognition.

I approached it a bit warily. My past experiences with speech rec technology, both in Office and third party programs such as Dragon, all had mixed results. You could sum up my opinion of speech recognition, at least for my own use, as “more trouble than it’s worth.” I recognized how useful it could be for persons with certain physical disabilities: the blind, folks who can’t use their hands, etc. But I’ve been “thinking with my fingers” for a lot of years and can type 90 words per minute pretty consistently. Working with speech recognition only slowed me down.

Besides, I’m really not a “sound” person at all. I’d much rather type text messages or email (or snail mail, for that matter) than talk on the phone. And I like either silence or soft instrumental music in the background when I work. I don’t even like the idea of working in an environment where there are a bunch of people talking to each other – much less talking to their computers.

But checking out Vista’s new features is part of my job, so I obligingly fired up the speech recognition applet in Control Panel and went through the preliminary steps of testing and adjusting my microphone. Then I started talking. On the first round, the voice commands worked fairly well (say “Click Start” to open the Start menu, “Open Internet Explorer” to start IE, etc.). Purely by telling my computer to do so, I managed to open a new document in Word. Then it was time to throw Vista a much bigger challenge: dictation.

Voice command works well because the system is listening for a relatively small number of pre-defined words. Dictation is a lot tougher, because the system must recognize a much larger number of words and differentiate between words with similar pronunciations and different spellings. My first try had me convinced that all those glowing reports about Vista’s speech capabilities must have been written by Microsoft PR people – or at least by folks with perfect Midwestern non-accents enunciating slowly and deliberately.
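To make the small-vocabulary point concrete, here is a rough text-only analogy in Python. It is not how Vista's recognizer works internally (it operates on audio, not strings); the command list, the word list, and the `candidates` helper are all made up for illustration. It uses the standard library's `difflib.get_close_matches` to show that a slightly misheard phrase has only one plausible match in a short command list, while a misheard word has several plausible matches once the vocabulary includes similar-sounding words.

```python
from difflib import get_close_matches

# Hypothetical vocabularies, invented for this illustration only.
COMMANDS = ["click start", "open internet explorer", "open word"]
DICTATION_WORDS = ["quick", "quit", "quite", "quiet", "quilt"]


def candidates(heard, vocabulary, cutoff=0.6):
    """Return every vocabulary entry that is reasonably close to what was 'heard'."""
    return get_close_matches(heard, vocabulary, n=len(vocabulary), cutoff=cutoff)


# A slightly garbled command still has exactly one plausible match.
command_matches = candidates("click stat", COMMANDS)

# A dictated word competes with several similar-looking neighbors.
dictation_matches = candidates("quit", DICTATION_WORDS)

print(command_matches)         # one candidate
print(len(dictation_matches))  # several candidates to choose between
```

With only a handful of commands, a fuzzy match is almost always unambiguous. With a dictation-sized vocabulary, the system has to pick among many near neighbors, which is where context and voice training come in.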

In the beginning, Vista definitely didn’t like my Texas accent. “The quick brown fox jumped over the lazy dogs” was translated into a garbled mess: “To quit brown fox junk over tea lazy dogs.” Of course, I was also using a cheap little desktop microphone. It had worked fine for recording voiceovers on PowerPoint presentations, but Microsoft warns in the Help file and the Speech setup wizard that you should use a good quality headset for best results.
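One common way to put a number on a result like that is word error rate: the number of word substitutions, insertions, and deletions needed to turn the transcript back into what was actually said, divided by the number of words spoken. The sketch below is a minimal Python version of that calculation, applied to the sentence above. The `word_error_rate` function is my own illustration, not something Vista reports.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word was dropped
                curr[j - 1] + 1,     # extra word was inserted
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


spoken = "The quick brown fox jumped over the lazy dogs"
heard = "To quit brown fox junk over tea lazy dogs"

wer = word_error_rate(spoken, heard)
print(f"{wer:.0%}")  # 4 wrong words out of 9 -> prints "44%"
```

By that measure, the first attempt got nearly half the words wrong.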

To give Vista a fair chance, I went out and bought a Cyber Acoustics headset for forty dollars. I also decided to spend some extra time training the program to my voice. Maybe I could turn it into a Texan (after all, I turned my husband into one, and he was originally a Californian). So I went through about an hour of training, reading numerous text passages into my nice new mic.

And lo and behold, it made a tremendous difference. Now I was getting an error every two or three sentences, instead of three or four per sentence. Still not good enough for me to embrace it wholeheartedly, but a vast improvement. And the more I’ve played with it, the better it’s gotten. In fact, although I still prefer to type if I’m writing anything more than a sentence or two, the voice command function in particular is starting to grow on me. In conjunction with keyboard shortcuts, it saves me from having to take my hands off the board to click the mouse, and actually helps me to work faster instead of slowing me down.

I like the interface, too. That floating speech and language toolbar from Office 2003 is gone. There’s a very streamlined console that sits at the top of the screen when you have speech recognition turned on. It tells you the status (whether the system is listening, sleeping, or turned off). For security reasons, you should turn it off completely when you aren’t using it (see George Ou’s blog for more info on that). For a very short demo of how Vista’s speech recognition works, click here.

And for a step-by-step guide on how to use Vista’s speech recognition, click here.

Working with this made me believe, for the first time, that maybe we really will be able to routinely control our computers with our voices, a la Star Trek, during my lifetime. But that’s going to pose some interesting problems. With Vista, I can configure the computer not to turn speech recognition on automatically when Windows starts, and I can easily turn it off completely. But if speech becomes the most common interface, will operating systems of the future give us that option? Or will your computer be listening to you constantly?

I’ve already seen what happens when you leave speech on inadvertently while taking a break from your work to exchange a few words with someone in the room. When you turn back to your document, you find your end of that conversation dutifully recorded in print (and, depending on the sensitivity of the microphone, maybe both ends of the conversation).

While ubiquitous high quality speech recognition offers great possibilities (imagine being able to dictate an entire report while driving to work), it also – like most technologies – conjures up images of some troubling applications. Speech recognition is likely to drive a trend toward more and better built-in microphones in computer systems. These mics could in turn be used to convert everything they “hear” into text files (without the user’s knowledge), even automatically scan those files for key words, and send files with suspicious word patterns to some central authority. Just another way for Big Brother to get his hooks into us a little deeper.

There could be more subtle sociological ramifications, too. If we’re always talking and listening to our computers, we’ll need a way to isolate our voices from ambient sound around us. Will that mean we’ll end up more closed off than ever from our fellow human beings? We already see “Pod People” everywhere, walking around seemingly oblivious to the outside world as they listen to their music or audio books on their MP3 players. Will speech-based interaction with our computers just exacerbate the situation?

How do you feel about speech recognition in general and Vista’s implementation in particular? Should speech be part of the OS at all, or something that you buy only if you want it? Are you excited about the idea of being able to do away with other input devices, or will they take away your keyboard only when they pry it from your cold, dead hands? Have you used speech recognition programs? Did you love them or hate them? Do you feel silly talking to your computer? When the technology is finally perfected, will speech recognition become popular with everyone, or remain a “niche” application that only appeals to a small number of computer users? 

Deb Shinder, Microsoft MVP