Building the Future of Voice AI Without the Dystopia of “Her”
Alexis Conneau can’t shake his fascination with the movie Her. For years, he’s been driven to replicate Samantha, the film’s advanced AI voice assistant. The obsession is hard to miss: his Twitter banner is a picture of Joaquin Phoenix’s character from the film.
Conneau’s work at OpenAI, where he spearheaded the development of ChatGPT’s Advanced Voice Mode, brought him remarkably close to that goal. Unlike traditional voice assistants, Advanced Voice Mode can process and respond to speech with human-like fluidity. Now, with his new startup, WaveForms AI, he wants to take voice technology even further, without the dystopian consequences.
In an interview with TechCrunch, Conneau explained his aim to sidestep the grim vision of Her. The film depicts a world where people form intimate bonds with AI systems at the expense of real human connections. “It’s a dystopia, right? It’s not the future we want,” Conneau said. “We want to bring this technology to the world in a way that benefits people. We want to avoid the pitfalls shown in that movie.”
On Monday, Conneau officially launched WaveForms AI, an audio-centric AI startup building its own large language models (LLMs). The company plans to roll out products by 2025 that will compete with AI audio offerings from OpenAI and Google. The startup is backed by $40 million in seed funding from Andreessen Horowitz, which gives it the support of the firm’s co-founder Marc Andreessen, a vocal advocate for integrating AI into daily life.
Conneau’s Her-inspired vision has caused friction before. Earlier this year, Scarlett Johansson threatened legal action against OpenAI over a ChatGPT voice that closely resembled Samantha, the AI character she voiced in Her. OpenAI ultimately pulled the voice, though the company denied that it had intentionally imitated her.
The once-futuristic scenario presented in Her no longer seems far-fetched: platforms like Character.AI already draw millions of users who chat with AI companions. Conneau, however, is wary of the AI companionship market and wants WaveForms AI to take a broader, more versatile approach. He envisions people using AI productively, for example by having a 20-minute conversation with their car’s AI to learn something new during a drive.
WaveForms AI aims to make technology more “emotionally intelligent,” facilitating natural interactions with devices like cars and computers. “We’re not trying to replace human-to-human interaction,” Conneau emphasized. “AI should complement human relationships, not compete with them.”
He also wants to learn from the mistakes of social media. Rather than encouraging addictive behavior or maximizing the time people spend in its products, WaveForms AI will prioritize user well-being, he says. Conneau considers this ethical approach essential: “It’s the most important work you could do.”
Conneau says the name “Advanced Voice Mode” undersells the technological leap behind it. The older voice mode converted speech to text, processed that text, and then converted the response back into speech. Advanced Voice Mode instead uses GPT-4o to process speech directly: audio is broken down into tokens and run through a specialized transformer model, which makes interactions quicker and more natural.
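For readers curious what “breaking audio into tokens” can mean in practice, here is a minimal Python sketch of one classic technique, mu-law quantization (the scheme WaveNet used), which maps each audio sample to one of 256 integer tokens. This is an illustrative assumption only: OpenAI has not published GPT-4o’s audio tokenizer, and modern speech models typically rely on learned neural audio codecs rather than per-sample quantization. The function names below are hypothetical.

```python
import math

# Hypothetical helpers illustrating mu-law quantization (as used in WaveNet).
# This is NOT OpenAI's audio tokenizer, which has not been published.

def mu_law_tokenize(samples, mu=255):
    """Map float samples in [-1.0, 1.0] to integer tokens in [0, mu]."""
    tokens = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip out-of-range samples
        # Companding: compress amplitude logarithmically so quiet sounds
        # get finer resolution than loud ones.
        y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
        # Quantize the companded value from [-1, 1] onto integer bins 0..mu.
        tokens.append(int(round((y + 1) / 2 * mu)))
    return tokens

def mu_law_detokenize(tokens, mu=255):
    """Approximately invert mu_law_tokenize, returning floats in [-1.0, 1.0]."""
    samples = []
    for t in tokens:
        y = 2 * t / mu - 1  # map the bin index back to [-1, 1]
        # Expand: undo the logarithmic compression.
        samples.append(math.copysign(((1 + mu) ** abs(y) - 1) / mu, y))
    return samples

if __name__ == "__main__":
    # A short 440 Hz sine wave sampled at 16 kHz stands in for recorded speech.
    rate = 16_000
    wave = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(32)]
    tokens = mu_law_tokenize(wave)
    restored = mu_law_detokenize(tokens)
    print(tokens[:8])
    print(max(abs(a - b) for a, b in zip(wave, restored)))
```

Because this scheme emits one token per sample, a one-second clip at 16 kHz becomes 16,000 tokens, which is one reason learned codecs that compress audio into far fewer tokens per second are generally preferred for feeding speech into a transformer.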
With WaveForms AI, Conneau aims to deliver voice technology that feels personal, intelligent, and, crucially, aligned with human interests. The challenge? Achieving all that without replicating Her’s unsettling future.