Hey, Siri: Accent bias in speech recognition technology

“Hey Siri, how many times do I have to repeat myself before I simply decide to do it myself?”


Nicole E. Félix • 2021 Science Communication Series Cohort



“Hey Siri, what movies are showing in Caribbean Cinemas Aguadilla?”

“I’m sorry, I can’t quite understand you. Would you like to try again?”


I tried again but using my Google Assistant app.


“Ok Google, what movies are showing in Aguadilla?”

“Here are the movies showing in the Caribbean Cinemas in Aguadilla, Puerto Rico.”

“Thank you, Google.”


This is not the first time Siri has failed to understand something I say with my slight Puerto Rican accent. I have my mother saved as “Mami,” not “Mommy,” but no matter how I pronounce it, Siri fails to recognize the name. Both examples speak to a larger problem in the development of artificial intelligence (AI): it is biased.




How could technology not be biased? Humans are liable to error, and every piece of tech you own has been designed by a human. Yes, even scientists who are supposed to embody the virtues of objectivity, rationality, and integrity are still biased, and the very tech we develop is a reflection of our own biases [9].


Speech recognition technology, such as the system that powers Siri, has been shown to be racially biased against Black speakers. A study from Stanford University found that when five different automated speech recognition systems were used to transcribe structured interviews, they were twice as likely to incorrectly transcribe audio from Black speakers as from white speakers [6]. The study attributed this gap to the systems’ acoustic models rather than to grammatical or lexical characteristics. An acoustic model maps the audio waveform of speech to linguistic units such as phonemes and words, so a gap there means the systems were confused by the pitch, intonation, and phonetics of African American Vernacular English. In short, the systems struggled to recognize accents that did not come from white speakers.

Furthermore, The Washington Post teamed up with two tech testing companies, Summa Linguae Technologies and Pulse Labs, to test how well Google Assistant and Amazon’s Alexa recognized commands from people with accents. They found that people who spoke Spanish as a first language were understood 6% less often than people who grew up around Washington or California, the home states of these two tech companies [5, 7].
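Studies like these typically quantify transcription accuracy with word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the machine’s transcript into the reference transcript, divided by the length of the reference. As a minimal illustrative sketch (not the study’s actual code), WER can be computed with a word-level edit distance:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as Levenshtein edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of five -> WER = 0.2
print(word_error_rate("what movies are showing tonight",
                      "what moves are showing tonight"))
```

A system that is “twice as likely to incorrectly transcribe” one group’s audio roughly means its average WER for that group is double what it is for the other.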


The systems analyzed in these studies were developed by Amazon, Apple, Google, IBM, and Microsoft – five of the leaders in the speech and voice recognition market [1]. How can this bias against such a large group of speakers be perpetuated by these five corporations? Trevor Cox, a professor of acoustical engineering at the University of Salford, explained that voice AI struggles to understand the dialects of non-native English speakers largely because it is not trained on data that is representative of its diverse users [7]. As the authors of the Stanford study pointed out, the training data for these systems is sampled predominantly from white speakers, so the vocabulary and speech patterns of other populations are excluded. As a primarily Spanish speaker, I find it increasingly frustrating to have to repeat myself to an inanimate speaker just to get a task done. Most of the time, I give up after Siri, Google, or Alexa misunderstands me a second time.


There is a much bigger problem here than me simply being misunderstood by my phone. The purpose of speech recognition technology is to allow computers to interpret spoken audio and generate text from it. One of its aims is to serve visually and hearing-impaired users. Its use has also become more prominent in hands-free settings where your eyes and hands are occupied, such as driving. Speech recognition technology is supposed to help people, but the racial biases exposed in these studies show that underrepresented groups cannot fully benefit from its increasingly widespread use.

Additionally, many companies use similarly built AI-driven services to recruit and screen potential job applicants. For some time, Amazon was building an AI recruiting tool in an attempt to mechanize its talent search. The company trained the algorithm on ten years of résumés it considered good, and those résumés came mostly from men – a consequence of men’s current dominance across the tech industry. The algorithm ended up penalizing any résumé that contained the word “women.” Amazon recruiters looked at the recommendations the algorithm generated but never relied solely on its rankings [3]. The company ultimately discarded the project in 2018 after finding that the tool also recommended unqualified candidates. Once again, we see that a lack of diversity directly hinders the ability of these man-made systems to help underrepresented communities. If tech companies, especially those that build services meant to help marginalized communities, such as speech recognition, really want to improve those services, they need to train their AI on examples drawn from marginalized communities. Only then will AI be of any help in diversifying their workforces, a goal many companies claim to have [3].


Numerous studies spanning decades of research have shown that increased diversity in the workplace leads to increased productivity, innovation, and financial performance. A 2008 study of companies’ sales performance showed that a more diversity-friendly climate increased the sales of some employees [8]. The authors found that white personnel were relatively impervious to the diversity climate overall, but Black and Hispanic personnel showed a greater increase in sales per hour under a more pro-diversity climate. They hypothesized that this was due to increased self-esteem among minority personnel working in an environment that affirmed their identity. In the healthcare field, increased diversity leads to better patient-doctor communication, improved risk assessment, and better patient outcomes overall, partly because patients comply more with treatment when they feel they can relate to their healthcare team [4]. A more diverse team also enhances medical deliberations, which improves each patient’s risk assessment. On group thinking, an article published in 2011 in Psychological Bulletin examined the positive outcomes observed in diverse groups; the authors concluded that groups that experience diversity in a way that challenges existing stereotypical expectations enhance their cognitive flexibility and creativity [2]. Tech corporations should follow the research supporting the benefits of workplace diversity across this variety of fields.


Unsurprisingly, the leadership and employee makeup at these companies is predominantly white and male. It is this lack of diversity within their teams that has led to the paucity of inclusion in their product design. If AI development companies want to effectively reach all of their users, they need to start by recognizing their biases and ensuring that their product development teams reflect the diversity of those users.



References:

[1] Beatrice, A. (2020, September 14). Top 10 Speech Recognition Companies to Watch in 2020. https://www.analyticsinsight.net/top-10-speech-recognition-companies-watch-2020/

[2] Crisp, R. J., & Turner, R. N. (2011). Cognitive adaptation to the experience of social and cultural diversity. Psychological Bulletin, 137(2), 242–266. https://doi.org/10.1037/a0021840

[3] Dastin, J. (2018, October 10). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G

[4] Gomez, L. E., & Bernet, P. (2019). Diversity improves performance and outcomes. Journal of the National Medical Association, 111(4), 383–392. https://doi.org/10.1016/j.jnma.2019.01.006

[5] Harwell, D. (2018, July 19). The accent gap: How Amazon’s and Google’s smart speakers leave certain voices behind. Washington Post. https://www.washingtonpost.com/graphics/2018/business/alexa-does-not-understand-your-accent/

[6] Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., Toups, C., Rickford, J. R., Jurafsky, D., & Goel, S. (2020). Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14), 7684–7689. https://doi.org/10.1073/pnas.1915768117

[7] Link, J. (2020). Why Racial Bias Still Haunts Speech-Recognition AI. Built In. https://builtin.com/artificial-intelligence/racial-bias-speech-recognition-systems

[8] McKay, P. F., Avery, D. R., & Morris, M. A. (2008). Mean racial-ethnic differences in employee sales performance: The moderating role of diversity climate. Personnel Psychology, 61(2), 349–374. https://doi.org/10.1111/j.1744-6570.2008.00116.x

[9] Veldkamp, C. L. S. (2017). The human fallibility of scientists.
