At TEDWomen, I introduced the audience to VocaliD, a project aimed at designing personalized synthetic voices so that people with severe speech impairments can use a voice that fits their body and personality. I have been working on this project for several years, along with my students and my collaborator, Dr. Tim Bunnell of Nemours Alfred I. duPont Hospital for Children. Together, we have developed algorithms to build unique voices for those unable to speak without computer assistance.
Rupal Patel: Synthetic voices, as unique as fingerprints

We have conducted experiments to iteratively improve our techniques, which rely on combining the recipient’s vocal identity features with the speech clarity features of a matched voice donor. In early 2013, we reconnected with a young woman named Samantha, whom I had met years before, when she was 9. We had been painstakingly working toward the perfect (at least by scientific measures) voice for her and didn’t want to share it until it was absolutely ready. But through working with Samantha, we’ve come to understand that what she wanted wasn’t a perfect voice … she just wanted her voice.
At the end of my talk at TEDWomen, I invited the audience (both in San Francisco and around the world) to visit our website, VocaliD.ai, where you can not only request a personalized voice, but also find out more about how to donate your own voice. The response was overwhelming. We have received 50+ requests for voices and nearly 400 people have signed up to help — and those numbers continue to grow.
To meet the demand, we’re working hard to raise funds and build the infrastructure to gather and store all the donor voices. We are calling this effort The Human Voicebank Initiative. Our goal is to collect one million voice samples by 2020 to create the world’s largest repository of voices. This corpus would allow us to generate unique vocal identities for hundreds of recipients for whom we do not yet have matching donors. Recipients like Troy, 24, who uses the same voice as Stephen Hawking; Maryam, 19, who refuses to use a device because it does not sound like her; Sylvia, 53, whose voice is no longer supported on her new device; and Dale, who at just 5 years old has tried several voices but is still hoping to find his own.
Until now, voice donors have had to visit my laboratory — or Dr. Bunnell’s — to record 2-3 hours of speech (around 3,200 sentences) in a professional sound studio. This rather complicated process allows us to collect high-fidelity audio that we can use to create a high-quality voice. The drawback, however, is that it limits our ability to reach a vast audience, which is critical to making a real difference for the hundreds of people already waiting for voices and for the even larger set of people who may want voices in the future. These people aren’t limited to a single age group or a particular educational or technical background. They are as diverse as humanity itself, and thus we need a similarly broad group of donors. We cannot achieve that with the sporadic, resource-intensive visits to the lab that we now rely on.
But replacing our model will be tricky and will require creative solutions. The software we use runs on desktop computers, and it’s not particularly engaging. Instead, we envision an alternative that can run on tablets and mobile phones, which have surprisingly good microphones. To engage children and less technical donors, we need to design a fun, simple game that will capture their attention and make them want to play again and again.
We are committed to advancing our initiative and we need your help. Without you, we simply can’t give voices to the people who need them—let alone perfect ones.
If you’re interested in donating your own voice, you probably have some questions. Here’s what you need to know:
Q: What do I need to do?
A: You need to be able to read or repeat short sentences that, together, cover all the combinations of sounds that occur in our language. The more of your speech we have, the better a voice we can create.
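To give a feel for what "covering all the combinations" means, here is a minimal sketch of one common way to choose sentences for coverage. It is an illustration only, not our actual selection method: the function names `sound_pairs` and `pick_sentences` are hypothetical, and adjacent letter pairs stand in for real sound combinations, which would require a pronunciation dictionary.

```python
def sound_pairs(sentence):
    """Return adjacent letter pairs within each word.

    Assumption: letter pairs are a crude stand-in for sound combinations.
    """
    pairs = set()
    for word in sentence.lower().split():
        letters = [c for c in word if c.isalpha()]
        pairs.update(a + b for a, b in zip(letters, letters[1:]))
    return pairs


def pick_sentences(candidates):
    """Greedily pick sentences until no candidate adds a new pair."""
    covered, chosen = set(), []
    remaining = list(candidates)
    while remaining:
        # Take the sentence that contributes the most pairs not yet covered.
        best = max(remaining, key=lambda s: len(sound_pairs(s) - covered))
        gain = sound_pairs(best) - covered
        if not gain:
            break  # Nothing left adds coverage, so stop.
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered


chosen, covered = pick_sentences(["the cat sat", "the cat", "a dog ran"])
print(chosen)
```

In this toy example, "the cat" is skipped because "the cat sat" already contains every pair it would add. The same idea explains why a well-chosen set of short sentences can cover a lot of ground.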
Q: How long does it take?
A: We need about 2-3 hours of speech from each donor. (Though even an hour of speech can go a long way.) You don’t have to do this all at once. You can take your time and break it up into small sessions of around 15-20 minutes, so that you can record your best voice. That’s why we need a simple website or app — so you can record whenever you want. All we’d ask is that you record in a quiet place. The better your recordings, the better the voice we can create.
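If you are wondering how many sittings that adds up to, the arithmetic can be sketched as follows. This is a rough estimate that assumes every minute of a session yields usable speech; the helper name `sessions_needed` is made up for this example.

```python
import math


def sessions_needed(total_hours, session_minutes):
    """Number of sessions needed to reach a total amount of speech.

    Assumption: each session yields its full length in usable speech.
    """
    return math.ceil(total_hours * 60 / session_minutes)


for hours in (1, 2, 3):
    fewest = sessions_needed(hours, 20)  # longer 20-minute sessions
    most = sessions_needed(hours, 15)    # shorter 15-minute sessions
    print(f"{hours} h of speech: {fewest} to {most} sessions")
```

Under that assumption, 2-3 hours of speech works out to somewhere between 6 and 12 short sessions, and a single hour to 3 or 4.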
Q: Do I need to sound like a radio announcer?
A: No. We want and need all types of voices. Each person has a unique voice, which can help this project in its own way.
Q: Will others recognize me in someone’s voice?
A: The new voice will have elements of your voice blended with the recipient’s voice, so it is possible, but very unlikely, that others will recognize you. Unless, of course, you have a famous or well-known voice ;)
Q: Why should I do this?
A: There are so many reasons! First of all, you can help give someone a voice, and that’s powerful. But in the process, you can also learn something about your own voice just by banking it. Most of us rarely give our voice much thought, but the process of recording can be made educational and reflective. In fact, for K-12 donors, we hope to develop a curriculum that will supplement the voice donation process.
But really, there are more reasons than this. If you bank your voice, it may be possible to re-create it should you ever lose it in the future. And your voice may help researchers learn more about the human voice in general. Finally, your recordings could help us not only design better synthetic voices, but also apply what we learn to health diagnostics, bioengineering, and other related fields.
Q: When can I start?
A: As I mentioned, we are working hard to raise funds and create a team to launch this exciting effort. If you want to be part of The Human Voicebank Initiative, please visit www.vocaliD.ai and sign up to donate your voice, time, expertise, or financial support.