One of the things we tend to take for granted in our lives is the power of speech. So much gets conveyed when we have conversations with others, not just in the words, but through intonation, stress, and pitch. Predictably, one of the biggest challenges in converting text to speech has been getting computers to sound natural when they speak. Polly is Amazon’s entry into this field, and it is pretty amazing what she can already do.
What is Amazon Polly?
Polly is a deep learning service hosted by Amazon that reads text and plays it back as audio.
Polly goes one step further in that it will read each sentence, analyze it, and determine the best (most “natural”) way to read it. For instance, if I were to type “The temperature in WA is 75°F,” Polly would read that as “The temperature in Washington is seventy-five degrees Fahrenheit.” This is important because it shows that Polly is contextually aware of the words when forming and reading those sentences, which lets her tailor her responses to each situation.
In addition to making Polly contextually aware, Amazon integrated over 40 different voices and accents into her system, covering over 20 different languages. This allows users to target specific accents and speech patterns, including word choice, tone, pitch, and stress, to provide a more natural experience for each listener. What they have been able to accomplish is truly amazing.
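To make this a little more concrete, here is a minimal sketch of how an application might ask Polly to read a sentence aloud using boto3, the AWS SDK for Python. The helper names `build_polly_request` and `synthesize_to_file` are hypothetical and chosen only for illustration, the voice "Joanna" is simply one of Polly's English voices, and the call to Polly assumes AWS credentials are already configured. This is not how Telzio's integration is built, just an illustration of the basic request.

```python
from typing import Any, Dict


def build_polly_request(text: str, voice_id: str = "Joanna",
                        output_format: str = "mp3") -> Dict[str, Any]:
    """Assemble the keyword arguments for Polly's synthesize_speech call.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "Text": text,                # the sentence Polly should read
        "TextType": "text",          # plain text rather than SSML markup
        "VoiceId": voice_id,         # which of Polly's voices to use
        "OutputFormat": output_format,
    }


def synthesize_to_file(text: str, path: str, voice_id: str = "Joanna") -> None:
    """Send the text to Polly and save the returned audio to a file.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported here so the helper above works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(text, voice_id))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

Calling `synthesize_to_file("The temperature in WA is 75°F", "weather.mp3")` would then save the spoken sentence as an MP3 file, and passing a different `voice_id` would switch to another voice or accent.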
Text-to-Speech Phone Greetings
We see advanced voice technologies like Amazon Polly as an opportunity to differentiate the Telzio platform from other business phone systems. Our development team integrated Polly into our infrastructure and built GUI workflows that introduce the new features in a seamless, intuitive way.
We’ve implemented Polly into the Telzio platform in a number of areas, such as:
- Phone Menus,
- Hold Messages,
- Voicemail, and
- our future AI Attendant.
For example, any Phone Menu greeting you need to record, you can now type and have Polly read it for you. While you can also record a greeting through your browser or upload an audio file, the Text-to-Speech feature makes testing and setting up your phone system convenient. As a matter of fact, upon creating an account with Telzio, the system will create a personalized phone menu and greeting with your company name to get you started.
Start a free trial with Telzio to hear the custom company greeting we created for you with #AmazonPolly.
Who else is using Amazon Polly?
Amazon has made great strides in making Polly sound “natural” and has opened Polly up to numerous companies that want to provide text-to-speech capability in their own services. For instance, Duolingo, a language learning platform and the largest language app in the world, uses Polly to deliver its content to language learners around the world. To most, this would seem counterintuitive (after all, wouldn’t a native speaker be the best choice?). For Duolingo, though, Polly made it possible to create content and deliver it all around the world, and as Polly improved, all of that content benefited from the improvement, whereas human recordings would have needed to be re-recorded.
More recently, in the world of entertainment, Dan Brown, author of The Da Vinci Code, utilized Polly to launch a new novel, Origin. The #DanBrownOrigin campaign was touted as the world’s first virtual book signing. It allowed you to cast a vote to select a cover design, then witness Dan Brown signing a copy of his new book for you, all after being greeted by name. Of course, Polly didn’t shoulder this burden alone. The team behind the campaign also worked extensively with Dan Brown to create a font out of his handwriting, and created software to replicate the weight of the actual writing in the book to complete the illusion. Pretty astounding if you ask me.
Amazon is definitely bringing its A-game when it comes to leveraging AI and Deep Learning. I am sure we will see Polly do amazing things as time passes. Of course, the goal of pure, indistinguishable natural speech still eludes us, but we are definitely closer and it doesn’t seem quite as elusive anymore.
Lata is part of the marketing team at Telzio, and uses her in-depth understanding of business VoIP to contribute education about VoIP phone systems to the Telzio blog. Her experience includes creating technical documentation and guides on complex telecommunication systems for Avaya and IBM. Currently based in Barcelona, Lata has a Bachelor's degree in computer science and Master's in English.