Better Speech Recognition Accuracy: Why Does It Matter?
Have you ever tried using Apple’s Siri or Amazon’s Alexa and struggled to get the assistant to pick up exactly what you were saying? These failures to capture your spoken words come down to speech recognition accuracy. As the demands placed on automatic speech recognition keep growing and changing, accurate recognition matters more than ever before. One major problem with many speech recognition systems is that they struggle with words, phrases, and utterances that fall outside the scope of “standard” speech. For instance, someone who speaks English with a heavy accent will often struggle to have their words understood by a virtual assistant like Siri.
Moreover, many speech recognition programs carry significant racial and gender biases as well. Automakers, for instance, have admitted for years that the speech recognition features built into their vehicles work better for men than for women. To give another example, research by Dr. Tatman published through the North American Chapter of the Association for Computational Linguistics (NAACL) found that Google’s speech recognition was 13% more accurate for men than for women. To make matters worse, Google has regularly rated as the highest-performing speech recognition provider when compared with offerings from companies such as IBM Watson, AT&T, Bing, and Wit.ai.
Why does speech recognition accuracy matter?
While many people associate speech recognition technology with asking an AI assistant to complete a simple task, the uses for this software have become far more far-reaching and impactful than ever before. Voice recognition now influences important aspects of people’s lives, including hiring, transportation, and immigration decisions, among many others. Case in point: in 2017, an Irish woman failed a spoken English proficiency test while trying to immigrate to Australia, even though she was a highly educated native speaker of English.
These biases exist in large part because of disparities in how the underlying databases are assembled and how the data is analyzed and fed into machine learning. The root of the problem is that the average training database contains an overwhelming amount of speech from white men and considerably less data from women and ethnic minorities.
As such, AI is set up to fail some members of the populace: people who speak English as a second language, or who speak it with an accent or dialect outside the realm of standard textbook English. To put it in the simplest terms possible, speech recognition software works by matching the speech input it receives from a speaker against the patterns learned from the data in its database, as the short sketch below illustrates.
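To make that concrete, here is a minimal sketch using the open-source SpeechRecognition package for Python. The file name sample.wav is a placeholder, and the recognize_google() call simply forwards the audio to Google’s cloud service, so the quality of the transcript depends entirely on the data Google’s models were trained on rather than on anything happening locally.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load a recorded utterance; "sample.wav" is a placeholder file name.
with sr.AudioFile("sample.wav") as source:
    audio = recognizer.record(source)

try:
    # The audio is sent to Google's cloud models, which match it against
    # patterns learned from whatever speech data Google trained on.
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    # The models could not match the audio to any learned pattern --
    # exactly what happens to accents underrepresented in the data.
    print("Speech was unintelligible to the recognizer")
except sr.RequestError as error:
    print(f"Could not reach the recognition service: {error}")
```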
If that database consists mostly of white male voices and inputs, the software will invariably apply the patterns learned from those voices to speakers who are not white men. The only real solution to this problem is to diversify the speech data included in the database. These databases need speech patterns and inputs from as many people across as many demographics as possible, so the software can provide accuracy for a wide range of individuals, and one straightforward way to verify that is to measure error rates separately for each demographic group, as sketched below.
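As a rough illustration of that kind of audit, the sketch below assumes the open-source jiwer library and computes word error rate (WER), the standard accuracy metric for speech recognition, separately for two groups. The transcripts here are invented purely for illustration; in a real audit they would come from a labeled evaluation set.

```python
# pip install jiwer -- an open-source word-error-rate library.
from jiwer import wer

# Invented evaluation data for illustration only: each entry pairs the
# reference (what was actually said) with the recognizer's hypothesis.
samples = [
    {"group": "women", "reference": "turn the heating up",  "hypothesis": "turn the heading up"},
    {"group": "women", "reference": "play some jazz music", "hypothesis": "play some chess music"},
    {"group": "men",   "reference": "turn the heating up",  "hypothesis": "turn the heating up"},
    {"group": "men",   "reference": "play some jazz music", "hypothesis": "play some jazz music"},
]

# A persistent WER gap between groups is a measurable symptom of the
# data imbalance described above.
for group in ("women", "men"):
    refs = [s["reference"] for s in samples if s["group"] == group]
    hyps = [s["hypothesis"] for s in samples if s["group"] == group]
    print(f"{group}: WER = {wer(refs, hyps):.2f}")
```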
While bias plays a huge role in speech recognition accuracy, there are other contributing factors as well. Another issue that works against accuracy is the innate complexity of the English language itself. Regardless of any bias or data limitations, many speech recognition programs struggle to tell apart very common English homophones such as “hear” and “here.” What’s more, the relationship between English spelling and pronunciation can often become very convoluted. Because English has borrowed from so many different languages across the centuries, certain words and spellings will trip up any speech recognition system available today.
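The snippet below is a toy illustration, not a real component of any recognizer, of why homophones are hard: “hear” and “here” are acoustically identical, so the only way to choose a spelling is to look at the surrounding words. The hint lists are hand-picked and hypothetical; production systems use full statistical language models for the same job.

```python
# Toy homophone disambiguation: the sound /hir/ could be "hear" or "here",
# so we guess from the preceding word. The hint lists are hypothetical.
HEAR_HINTS = {"can", "can't", "could", "couldn't", "to", "didn't"}  # "I can hear you"
HERE_HINTS = {"over", "right", "come", "stay", "from", "around"}    # "come here"

def pick_spelling(previous_word: str) -> str:
    """Guess the spelling of /hir/ from the word immediately before it."""
    word = previous_word.lower()
    if word in HEAR_HINTS:
        return "hear"
    if word in HERE_HINTS:
        return "here"
    return "here"  # arbitrary fallback; real systems score whole sentences

print(pick_spelling("can"))   # -> "hear"  ("I can hear you")
print(pick_spelling("come"))  # -> "here"  ("come here")
```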
Are there any ways to avoid speech recognition inaccuracies?
While voice recognition software has obvious limitations, there are some settings and hardware choices that can help you get the most accurate results possible. One of the most important factors in accurate speech recognition output is headset microphone quality: a high-end headset will capture your voice far more cleanly than a lower-end one.
What’s more, many computer settings now let users perform voice training for their speech recognition programs. This training allows your computer to learn the personalized ways in which you speak, rather than only matching your input against a vast generic database. Many smartphones also come with features that make speech recognition much easier to use.
For instance, Google Docs users can install SoundWriter, an add-on that enables speech recognition while writing in Google Docs. Alternatively, Apple offers speech recognition that is integrated with Siri across a multitude of languages; this software also tackles bias by adopting new words and pronunciations as they are spoken over time. Amazon Echo and Google Home, meanwhile, offer options that let users train the assistant by reading a series of sentences aloud. Finally, the most important tip of all: proofread any voice-to-text message before saving or sending it, as the technology is still no match for the human eye.