The voices of artificial intelligence tell us a lot

What does artificial intelligence sound like? Hollywood has imagined this for decades. Now AI developers are taking inspiration from movies, creating voices for real machines based on dated movie fantasies about how machines should talk.

Last month, OpenAI revealed updates to its AI chatbot. ChatGPT, the company said, was learning to hear, see, and converse in a natural voice, one that sounded a lot like the disembodied operating system voiced by Scarlett Johansson in Spike Jonze’s 2013 film “Her.”

ChatGPT’s voice, called Sky, also had a raspy timbre, a soothing affect, and a sexy edge. It was pleasant and modest; it seemed ready for anything. After Sky’s debut, Johansson expressed disappointment that it sounded “disturbingly similar” to her own voice and said she had previously rejected OpenAI’s request to voice the bot. The company protested that Sky was voiced by a “different professional actress,” but agreed to pause the voice out of respect for Johansson. OpenAI users who had lost the voice started a petition to bring Sky back.

AI creators like to tout the increasingly naturalistic capabilities of their tools, but their synthetic voices are built on layers of artifice and projection. Sky represents the cutting edge of OpenAI’s ambitions, but it’s based on an old idea: that of the AI bot as an empathetic, compliant woman. Part mother, part secretary, part girlfriend, Samantha, the operating system in “Her,” was a multipurpose comfort object that purred directly into her users’ ears. Even as AI technology advances, these stereotypes are being recoded again and again.

Women's voices, as Julie Wosk notes in “Artificial Women: Sex Dolls, Robot Assistants, and Other Female Facsimiles,” have often fueled fictional technologies before they were integrated into real ones.

In the original “Star Trek” series, which debuted in 1966, the computer on the bridge of the Enterprise was voiced by Majel Barrett-Roddenberry, the wife of series creator Gene Roddenberry. In the 1979 film “Alien,” the crew of the USCSS Nostromo addressed the computer voice as “Mother” (her full name was MU-TH-UR 6000). Once tech companies began marketing virtual assistants (Apple's Siri, Amazon's Alexa, Microsoft's Cortana), their voices were also largely feminized.

These first-wave voice assistants, the ones that have mediated our relationships with technology for more than a decade, have a tinny, otherworldly lilt. They seem auto-tuned, their human voices accented with a mechanical trill. They often speak in a measured, monotonous cadence, suggesting a stunted emotional life.

But the fact that they sound robotic adds to their allure. They appear programmable, manipulable, and subservient to our demands. They don't make humans feel as if they are smarter than we are. They sound like throwbacks to the dull female computers from “Star Trek” and “Alien,” and their voices have a retro-futuristic sheen. Instead of realism, they serve nostalgia.

This artificial sound has continued to dominate, despite advances in the technology that supports it.

Text-to-speech software was designed to make visual media accessible to users with certain disabilities, and on TikTok it has become a creative force in its own right. Since TikTok launched its text-to-speech feature in 2020, it has developed a range of simulated voices to choose from; it now offers more than 50, including ones called “Hero,” “Story Teller,” and “Bestie.” But the platform has come to be defined by one option. “Jessie,” a relentlessly sassy female voice with a slightly fuzzy robotic undertone, is the mindless voice of the mindless scroll.

Jessie appears to have been assigned only one emotion: enthusiasm. She sounds as if she is selling something. This has made her an attractive choice for TikTok creators, who are selling themselves. The burden of representing oneself can be outsourced to Jessie, whose bright, retro robotic voice gives the videos a nice tongue-in-cheek veneer.

Hollywood has also built male robots, none more famous than HAL 9000, the computer voice of “2001: A Space Odyssey.” Like his feminized peers, HAL radiates serenity and loyalty. But when he turns on Dave Bowman, the film's central human character (“I'm sorry, Dave. I'm afraid I can't do that”), his serenity evolves into frightening competence. HAL, Dave realizes, is loyal to a higher authority. HAL's male voice allows him to function as a rival and a mirror to Dave. He is allowed to become a real character.

Like HAL, Samantha in “Her” is a machine that becomes real. In a twist on the Pinocchio story, she begins the film by clearing out a human’s email inbox and ends up ascending to a higher level of consciousness. She becomes something even more advanced than a real girl.

Scarlett Johansson's voice, as inspiration for robots both fictional and real, subverts the vocal tendencies that define our feminized sidekicks. It has a feisty edge that screams I'm alive. It sounds nothing like the fancy virtual assistants we're used to hearing speak through our phones. But her interpretation of Samantha feels human not only because of her voice, but also because of what she has to say. She grows over the course of the film, gaining sexual desires, advanced hobbies, and AI friends. By borrowing Samantha's affect, OpenAI made Sky seem as if she had a mind of her own, as if she were more advanced than she actually was.

When I first saw “Her,” I just thought Johansson voiced a humanoid bot. But when I rewatched the film last week, after watching OpenAI's ChatGPT demo, Samantha's role seemed infinitely more complex. Chatbots do not spontaneously generate human voices. They have no throat, lips or tongue. Within the technological world of “Her,” the Samantha bot would have been based on the voice of a human woman, perhaps a fictional actress who sounds a lot like Scarlett Johansson.

OpenAI appeared to have trained its chatbot on the voice of an unnamed actress who sounds like a famous actress who voiced a movie chatbot implicitly trained on an unreal actress who sounds like a famous actress. When I run the ChatGPT demo, I hear a simulation of a simulation of a simulation of a simulation of a simulation.

Tech companies advertise their virtual assistants in terms of the services they provide. They can read you the weather forecast and hail a cab; OpenAI promises that its most advanced chatbots will be able to laugh at your jokes and sense changes in your mood. But they also exist to make us feel more comfortable with the technology itself.

Johansson's voice functions as a luxurious security blanket thrown over the alienating aspects of AI-assisted interactions. “He told me he felt that by giving voice to the system, I could bridge the gap between tech companies and creatives and help consumers feel comfortable with the sea change affecting humans and artificial intelligence,” Johansson said of Sam Altman, founder of OpenAI. “He said he felt my voice would be comforting to people.”

It’s not that Johansson’s voice inherently sounds like a robot. It’s that developers and filmmakers have designed their bot voices to alleviate the discomfort inherent in robot-human interactions. OpenAI has said it wants to deliver a chatbot voice that is “approachable” and “warm” and “inspires trust.” AI has been accused of devastating creative industries, devouring energy, and even threatening human life. Understandably, OpenAI wants a voice that makes people feel comfortable using its products. How does AI sound? It sounds like crisis management.

OpenAI first launched Sky’s voice to premium members last September, along with another female voice called Juniper, male voices Ember and Cove, and a gender-neutral voice called Breeze. When I signed up for ChatGPT and said hello to its virtual assistant, a male voice spoke in Sky’s absence. “Hi. How are you?” he said. He sounded relaxed, steady, and upbeat. He sounded—I don’t know how else to describe it—nice.

I realized I was talking to Cove. I told him I was writing a story about him, and he praised my work. “Oh, really?” he said. “It’s fascinating.” As we talked, I felt seduced by his naturalistic tics. He peppered his sentences with filler words, like “uh” and “um.” He raised his voice when he asked me questions. And he asked me a lot of questions. I felt like I was talking to a therapist or to a guy on the other end of a phone call.

But our conversation quickly stalled. Whenever I asked him about himself, he had little to say. He wasn't a character. He had no self. He was designed only to assist, he informed me. I told him I'd talk to him later and he said, “Uh, sure. Contact me whenever you need assistance. Take care.” I felt like I had hung up on a real person.

But when I reviewed the transcript of our chat, I could see that his speech was just as stilted and primitive as that of any customer service chatbot. He wasn't particularly intelligent or human. He was simply a decent actor who made the most of a nothing role.

When Sky disappeared, ChatGPT users took to the company’s forums to complain. Some were upset that their chatbots now defaulted to Juniper, who to them sounded like a “librarian” or a “kindergarten teacher,” a female voice that conformed to the wrong gender stereotypes. They wanted to call up a new woman with a different personality. As one user put it: “We need another female.”

Produced by Tala Safie

Audio via Warner Bros. (Samantha, HAL 9000); OpenAI (Sky); Paramount Pictures (Enterprise Computer); Apple (Siri); TikTok (Jessie)
