Voice Recognition Is Awesome, But How Did It Get So Good?

October 03, 2021

Voice recognition technology has a rich history of development that's led it to what it is today. It's at the core of modern life, giving us the ability to do tasks just by talking to a device. So, how has this astonishing technology evolved over the years? Let's take a look.

1952: The Audrey System

The first step in voice recognition came about in the early 1950s. Bell Laboratories developed the first machine that could understand the human voice in 1952, and it was named the Audrey System. The name Audrey was sort of a contraction of the phrase Automatic Digit Recognition. While this was a major innovation, it had some major limitations.

Most prominently, Audrey could only recognize the numerical digits 0-9, no words. Audrey would give feedback when the speaker said a number by lighting up 1 of 10 lightbulbs, each one corresponding to a digit.

While it could understand the numbers with 90% accuracy, Audrey was confined to a specific voice type. This is why the only person who would really use it was K. H. Davis, one of the developers. The speaker also had to pause for at least 300 milliseconds after each number before saying the next one.

Not only was it limited in functionality, but it was also limited in utility. There wasn't much use for a machine that could only understand numbers. One possible use was dialing telephone numbers, but it was much faster and easier to dial the numbers by hand. Though Audrey didn't have a graceful existence, it still stands as a great milestone in human achievement.

1962: IBM's Shoebox

A decade after Audrey, IBM tried its hand at developing a voice recognition system. At the 1962 World's Fair, IBM showed off a voice recognition system named Shoebox. Like Audrey, its main job was understanding the digits 0-9, but it could also understand six words: plus, minus, false, total, subtotal, and off.

Shoebox was a math machine that could do simple arithmetic problems. As for feedback, instead of lights, Shoebox was able to print out the results on paper. This made it useful as a calculator, though the speaker would still need to pause after each number or word.

1971: IBM's Automatic Call Identification

After Audrey and Shoebox, other labs around the world developed voice recognition technology. However, it didn't take off until the 1970s. In 1971, IBM brought a first-of-its-kind invention to market: the Automatic Call Identification system, the first voice recognition system to be used over the telephone network.

Engineers would call and be connected to a computer in Raleigh, North Carolina. The caller would then utter one of the 5,000 words in its vocabulary and get a "spoken" response as an answer.

1976: Harpy

In the early 1970s, the U.S. Department of Defense took an interest in voice recognition. DARPA (Defense Advanced Research Projects Agency) developed the Speech Understanding Research (SUR) program in 1971. This program provided funding to several companies and universities to aid research and development for voice recognition.

In 1976, because of SUR, Carnegie Mellon University developed the Harpy System. This was a major leap in voice recognition technology. The systems until that point were able to understand words and numbers, but Harpy was unique in that it could understand full sentences.

It had a vocabulary of 1,011 words, which, according to a publication by B. Lowerre and R. Reddy, equated to more than a trillion different possible sentences. The publication also states that Harpy could understand words with 93.77% accuracy.

The 1980s: The Hidden Markov Model

The 1980s were a pivotal time for voice recognition technology, as this was the decade that introduced us to the Hidden Markov Model (HMM). The main driving force behind an HMM is probability.

Whenever a system registers a phoneme (the smallest element of speech), there's a certain probability of what the next one will be. An HMM uses these probabilities to determine which phoneme will most likely come next and to form the most likely words. Many voice recognition systems today still use HMMs to understand speech.
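To make the idea concrete, here is a minimal sketch in Python of how a system might pick the most likely phoneme sequence. It uses the Viterbi algorithm, a standard way of decoding an HMM, which this article doesn't otherwise cover. Everything in it is an assumption made for illustration: the three phonemes ("k", "ae", "t"), the three acoustic labels ("burst", "vowel", "stop"), and every probability are made up, and real systems work with far larger models trained on recorded speech.

```python
# Toy HMM: hidden states are phonemes, observations are coarse acoustic labels.
# All names and numbers below are illustrative assumptions, not real data.
STATES = ["k", "ae", "t"]

# Probability that an utterance starts with each phoneme.
START_P = {"k": 0.6, "ae": 0.2, "t": 0.2}

# Probability of moving from one phoneme (outer key) to the next (inner key).
TRANS_P = {
    "k": {"k": 0.1, "ae": 0.8, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.2, "t": 0.7},
    "t": {"k": 0.3, "ae": 0.3, "t": 0.4},
}

# Probability that a phoneme produces a given acoustic label.
EMIT_P = {
    "k": {"burst": 0.7, "vowel": 0.1, "stop": 0.2},
    "ae": {"burst": 0.1, "vowel": 0.8, "stop": 0.1},
    "t": {"burst": 0.3, "vowel": 0.1, "stop": 0.6},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely phoneme sequence, probability of that sequence)."""
    # best[t][s]: probability of the best path that ends in state s at step t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: the previous state on that best path.
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that makes reaching s most probable.
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = (
                best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            )
            back[t][s] = prev

    # Start from the most probable final state and walk the pointers backwards.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


path, prob = viterbi(["burst", "vowel", "stop"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

With these made-up numbers, the sounds "burst, vowel, stop" decode to the phonemes k, ae, t (roughly the word "cat"), because that path has the highest combined probability of transitions and acoustic matches.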

The 1990s: Voice Recognition Reaches The Consumer Market

Since its conception, voice recognition technology has been on a journey to find a space in the consumer market. In the 1980s, IBM showcased a prototype computer that could do speech-to-text dictation. However, it wasn't until the early 1990s that people started to see applications like this in their homes.

In 1990, Dragon Systems introduced the first speech-to-text dictation software. It was called Dragon Dictate, and it was originally released for DOS. This $9,000 program was revolutionary for bringing voice recognition technology to the masses, but there was one flaw. The software used discrete dictation, meaning the user had to pause after each word for the program to pick it up.

In 1996, IBM again contributed to the industry with MedSpeak. This was a speech-to-text dictation program as well, but it didn't suffer from discrete dictation as Dragon Dictate did. Instead, this program could transcribe continuous speech, which made it a more compelling product.

2010: A Girl Named Siri

Throughout the 2000s, voice recognition technology exploded in popularity. It was implemented into more software and hardware than ever before, and one crucial step in the evolution of voice recognition was Siri, the digital assistant. In 2010, a company by the name of Siri introduced the virtual assistant as an iOS app.

At the time, Siri was an impressive piece of software that could transcribe what the speaker was saying and give an educated and witty response. This program was so impressive that Apple acquired the company that same year and gave Siri a bit of an overhaul, pushing it towards the digital assistant we know today.

It was through Apple that Siri got its iconic voice (provided by Susan Bennett) and a host of new features. It uses natural language processing to control most of the system's functions.

The 2010s: The Big 4 Digital Assistants

As it stands, four big digital assistants dominate consumer voice recognition.

Siri is present across nearly all of Apple's products: iPhones, iPods, iPads, and the Mac family of computers.
Google Assistant is present across most of the more than 3 billion Android devices on the market. In addition, users can use voice commands across many Google products, like Google Home.
Amazon Alexa doesn't have much of a dedicated platform where it lives, but it's still a prominent assistant. It's available to be downloaded and used on Android devices, Apple devices, and even select Lenovo laptops.
Bixby is the newest entry to the digital assistant list. It's Samsung's homegrown digital assistant, and it's present among the company's phones and tablets.

A Spoken History

Voice recognition has come a long way since the Audrey days. It's been making great gains in multiple fields; for example, according to Clear Bridge Mobile, the medical field benefited from voice-operated chatbots during the pandemic in 2020. From only being able to understand numbers to understanding different variations of full sentences, voice recognition is proving to be one of the most useful technologies of our modern age.
