Making Our Voices Heard: Meghan Sumner investigates the gender dynamics of voice processing

“Raise your voice!” We’re often encouraged to speak up and speak out, to make our voices heard. But what happens when the voice itself becomes a source of unconscious bias? Meghan Sumner, associate professor of linguistics at Stanford and former Clayman Institute faculty research fellow, provides compelling research that illustrates how the way we “hear” a person can shape our perceptions.

When we process speech, says Sumner, we hear more than sounds and words—we hear a person. As we listen to a voice, an image of the talker emerges from vocal cues about gender, age, class and geographic background. Sumner’s research explores how listeners use social information gleaned from the voice to process spoken language. Vocal cues can activate stereotypes and social biases in the earliest moments of language processing, altering how speech is interpreted. When it comes to gender, Sumner finds, “we’re making categorizations about gender roles, or gendered inferences, very early.”

Voice provides context for understanding language

Linguists have long known that context is key to understanding spoken language. As Sumner explains, speech is highly variable. “There are at least seven acoustic realizations of the letter ‘T,’” she says as an example. Furthermore, “no word is uttered the same way twice.” With all this variation just from a single speaker, the fact that we understand one another so well with so few communicative breakdowns is amazing. To interpret the meaning of speech correctly, listeners therefore rely not only on sounds and words but also on this variation, which serves as both linguistic and social context for listeners.

“Voice is just another type of context,” Sumner explains. By providing social cues about a speaker, voice can help listeners interpret speech. As an example, Sumner offers the statement, “I need to buy some clothes.” Absent social context, the listener might think of generic clothing stores to recommend to the speaker. Sumner cites Siri, the digital personal assistant on an iPhone, as an application that uses this type of literal word processing to generate recommendations based on proximity, independent of talker voice. In conversations, by contrast, people contextualize the statement “I need to buy some clothes” using the speaker’s social characteristics. If the speaker is a man, the listener will likely recommend different stores than if the speaker is a woman.

Voice may activate unconscious social biases

While voice can serve as a useful context that facilitates communication, it can simultaneously activate social biases that might limit the attention given to or retention of certain speech. In a lab experiment, Sumner asked participants to complete a standard word association task but with a tweak: instead of reading words from a list, participants heard them spoken by either a man or a woman and then responded with the first word that came to mind. Sumner found that the word participants most strongly associated with each prompt varied depending on the gender of the speaker. For example, when a man said “academy,” the most common response was “school,” whereas when a woman said “academy,” the most common response was “award.”

Sumner and her team have found that a voice’s social cues influence language processing at the “pre-lexical level,” or before the word is consciously interpreted. To test the role of voice in real-time language processing, she devised an experiment in which subjects were told to identify a picture of a glove out of a set of four images. In one scenario, there were two images of gloves: a man’s and a woman’s. When the instructions were written on-screen, without social context, participants picked the man’s or woman’s glove with equal frequency. When a man spoke the instructions, however, subjects selected the man’s glove more often than the woman’s. Using eye-tracking technology, Sumner’s experiment showed that this gendered response happens even before the participant is aware of it. More recently, working in collaboration with psychology professor Michael Frank and recent graduate Nicholas Moores, Sumner has shown that children as young as four years old also use gender-specific voice information to infer a talker’s meaning.

The influence of voice on gender dynamics

Sumner’s work calls attention to the complexity of having one’s voice heard. While speaking up may increase one’s influence, particular vocal characteristics may simultaneously undermine the effectiveness of speech. Given the complicated relationship between voice and the reception of speech, how might unconscious bias shape the reception of women speakers in the workplace and elsewhere? Furthermore, how might biases resulting from vocal cues accumulate across interactions to limit the influence of women’s or other groups’ speech?

For Sumner, the research on “talker information” is just beginning. She has gathered convincing evidence that social information embedded in speech, rather than phonetics alone, influences how attentive listeners are and how well they are able to recall information. For example, experiment participants tend to be more attentive to speakers of British English than to those with a New York accent, although the exact reason for this difference is far from being understood. However, “it is clear that how we hear a voice can change depending on whether it is alone, or in the context of another voice,” Sumner explains. “Everything is relative.”

What implication does that have for the gender dynamics of speech processing? Sumner intends to answer that question by building on past studies of regional accents to investigate how women’s voices are processed relative to men’s. If voice influences substantive interpretations of speech as well as assessments of a talker’s reliability and intelligence, then increasing women’s influence may well depend on understanding the dynamics of voice.