WellSaid Labs research takes synthetic speech from seconds-long clips to hours

Millions of homes have voice-enabled devices, but when was the last time you heard a piece of synthesized speech longer than a handful of seconds? WellSaid Labs has pushed the field ahead with a voice engine that can easily and quickly generate hours of voice content that sounds as good as or better than the snippets we hear every day from Siri and Alexa.

The company has been working since its public debut last year to advance its tech from impressive demo to commercial product, and in the process found a lucrative niche that it can build on.

CTO Michael Petrochuk explained that early on, the company had essentially based its technology on prior research — Google’s Tacotron project, which established a new standard for realism in artificial speech.

“Despite being released two years ago, Tacotron 2 is still state of the art. But it has a couple issues,” explained Petrochuk. “One, it’s not fast — it takes three minutes to produce one second of audio. And it’s built to model 15 seconds of audio. Imagine that in a workflow where you’re generating 10 minutes of content — it’s orders of magnitude off where we want to be.”

WellSaid completely rebuilt its model with a focus on speed, quality and length. That may sound like “focusing” on everything at once, but there are always plenty more parameters to optimize for. The result is a model that can generate extremely high-quality speech with any of 15 voices (and several languages) in a bit more than half of real time, so a minute-long clip would take about 36 seconds to generate instead of a couple of hours.
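To put those numbers side by side, here is a rough back-of-the-envelope sketch. It uses only the two figures quoted above (three minutes of compute per second of audio for Tacotron 2, and about 36 seconds per minute of audio for WellSaid) and assumes generation time scales linearly with clip length, which the article does not confirm. The constant and function names are made up for illustration.

```python
# Back-of-the-envelope generation times from the figures quoted in the article.
# Assumption: cost scales linearly with clip length. All names are hypothetical.

TACOTRON2_SEC_PER_AUDIO_SEC = 3 * 60   # "three minutes to produce one second of audio"
WELLSAID_SEC_PER_AUDIO_SEC = 36 / 60   # "a minute-long clip would take about 36 seconds"


def generation_seconds(audio_seconds: float, cost_per_audio_second: float) -> float:
    """Estimate compute time needed to synthesize a clip of the given length."""
    return audio_seconds * cost_per_audio_second


def describe(seconds: float) -> str:
    """Format a duration in the largest convenient unit."""
    if seconds >= 3600:
        return f"{seconds / 3600:.1f} h"
    if seconds >= 60:
        return f"{seconds / 60:.1f} min"
    return f"{seconds:.0f} s"


if __name__ == "__main__":
    for label, audio_seconds in [("1-minute clip", 60), ("10 minutes of content", 600)]:
        slow = generation_seconds(audio_seconds, TACOTRON2_SEC_PER_AUDIO_SEC)
        fast = generation_seconds(audio_seconds, WELLSAID_SEC_PER_AUDIO_SEC)
        print(f"{label}: Tacotron 2 ~{describe(slow)}, WellSaid ~{describe(fast)}")
```

Under those assumptions, the 10 minutes of content Petrochuk mentions would take roughly 30 hours at the Tacotron 2 rate and about six minutes at WellSaid's.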

This seemingly basic capability has plenty of benefits. Not only is it faster, but it makes working with the results simpler and easier. As a producer of audio content, you can just drop in a script hundreds of words long, listen to what it puts out, then tweak its pronunciation or cadence with a few keystrokes. Tacotron changed the synthetic speech space, but it has never really been a product. WellSaid builds on Tacotron’s advances with its own to create both a usable piece of software and, arguably, a better speech system overall.

As evidence, clips generated by the model (15-second ones, so they can compete with Tacotron and others) reached the milestone of being rated as highly as human voices in tests organized by WellSaid. There’s no objective measure for this kind of thing, but asking lots of humans to weigh in on how human something sounds is a good place to start.

As part of the team’s work to achieve “human parity” under these conditions, they also released a number of audio clips demonstrating how the model can produce much more demanding content.

It generated plausible-sounding speech in Spanish, French and German (I’m not a native speaker of any of them, so can’t say more than that), showed off its facility with complex and linguistically difficult words (like stoichiometry and halogenation), words that differ depending on context (buffet, desert), and so on. The crowning achievement must be a continuous 8-hour reading of the entirety of Mary Shelley’s “Frankenstein.”

But audiobooks aren’t the industry that WellSaid is using as a stepladder to further advances. Instead, they’re making a bundle working in the tremendously boring but necessary field of corporate training. You know, the sorts of videos that explain policies, document the use of internal tools and explain best practices for sales, management, development tools and so on.

Corporate learning material is generally unique or at least tailored to each company, and can involve hours of audio. It is an alternative to saying, “Here, read this packet” or gathering everyone in a room to watch a decades-old DVD on office conduct. It’s not the most exciting place to put such a powerful technology to work, but the truth with startups is that no matter how transformative you think your tech is, if you don’t make any money, you’re sunk.

A screenshot of WellSaid Labs' synthetic speech interface.
Image Credits: WellSaid Labs

“We found a sweet spot in the corporate training field, but for product development it has helped us build these foundational elements for a bigger and greater space,” explained Head of Growth Martín Ramírez. “Voice is everywhere, but we have to be pragmatic about who we build for today. Eventually we’ll deliver the infrastructure where any voice can be created and distributed.”

At first that may look like expanding the corporate offerings slowly, in directions like other languages — WellSaid’s system doesn’t have English “baked in,” and given training data in other languages should perform equally well in them. So that’s an easy way forward. But other industries could use improved voice capability as well: podcasting, games, radio shows, advertising, governance.

One significant limitation of the company’s approach is that the system is meant to be operated by a person and used for, essentially, recording a virtual voice actor. This means it’s not yet useful to some of the groups for whom an improved synthetic voice is most desirable: many people with disabilities that affect their own voice, blind people who use voice-based interfaces all day long, or even people traveling in a foreign country and using real-time translation tools.

“I see WellSaid servicing that use-case in the near future,” said Ramírez, though he and the others were careful not to make any promises. “But today, the way it’s built, we truly believe a human producer should be interacting with the engine, to render it at a natural human-parity level. The dynamic rendering scenario is approaching quite fast, and we want to be prepared for it, but we’re not ready to do it today.”

The company has “plenty of runway and customers” and is growing fast — so no need for funding just now, thank you, venture capital firms.
