Google upgrades its speech APIs with improved features
For many developers, the addition of 17 new WaveNet-based voices for a variety of languages will be the highlight of today’s update.
WaveNet is Google’s machine-learning technology for generating natural-sounding voices in text-to-speech.
Text-to-Speech now supports a total of 30 standard voices and 26 WaveNet voices across 14 languages. A demo of the new voices, using your own text, can be found here.
Among the new features is the addition of ‘audio profiles’ to customise the output for the speaker being used. For example, output for headphones, sound bars, or the phone’s built-in speaker will all sound best with custom tuning.
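To make the idea concrete, here is a minimal sketch of how a developer might select an audio profile when building a Text-to-Speech request body. It only builds the JSON payload and makes no network call. The field name `effectsProfileId`, the profile IDs, and the voice name `en-US-Wavenet-D` are assumptions based on our understanding of the REST API and should be checked against Google's documentation. The `DEVICE_PROFILES` mapping and `build_synthesis_request` helper are hypothetical names used for illustration.

```python
import json

# Hypothetical mapping from playback device to audio profile ID.
# The profile IDs are assumptions; verify them against Google's docs.
DEVICE_PROFILES = {
    "headphones": "headphone-class-device",
    "sound_bar": "large-home-entertainment-class-device",
    "phone_speaker": "handset-class-device",
}


def build_synthesis_request(text, device, voice_name="en-US-Wavenet-D"):
    """Build a JSON-serialisable body for a speech synthesis request."""
    # Derive the language code ("en-US") from the first two parts of the voice name.
    language_code = "-".join(voice_name.split("-")[:2])
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Assumed field: a list of profiles applied to the synthesised audio.
            "effectsProfileId": [DEVICE_PROFILES[device]],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_synthesis_request("Hello there", "headphones"), indent=2))
```

Keeping the device-to-profile mapping in one place means an app could switch tuning when the user plugs in headphones without touching the rest of the request.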
On the flip side, Speech-to-Text has also received significant improvements.
The most impressive feature is the ability to recognise multiple speakers in a voice recording for automatic transcriptions. However, the number of speakers must be provided beforehand.
Along with the support for additional Text-to-Speech languages, Google is also supporting more languages in Speech-to-Text. Once the developer has selected up to four languages, the API will automatically determine which one is being spoken.
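The two constraints described above (a speaker count supplied up front, and no more than four candidate languages) can be sketched as a small configuration builder. This is an illustration under stated assumptions, not Google's client library: the field names `alternativeLanguageCodes`, `enableSpeakerDiarization` and `diarizationSpeakerCount` reflect our understanding of the beta REST API at the time and may differ in current versions, and `build_recognition_config` is a hypothetical helper.

```python
def build_recognition_config(primary_language, alternatives=(), speaker_count=None):
    """Build a recognition config dict for a Speech-to-Text request (sketch)."""
    languages = [primary_language, *alternatives]
    # The article states that up to four languages can be selected in total.
    if len(languages) > 4:
        raise ValueError("at most four candidate languages are supported")

    config = {"languageCode": primary_language}
    if alternatives:
        # Assumed field name for the additional candidate languages.
        config["alternativeLanguageCodes"] = list(alternatives)
    if speaker_count is not None:
        # The number of speakers has to be provided beforehand.
        config["enableSpeakerDiarization"] = True
        config["diarizationSpeakerCount"] = speaker_count
    return config


if __name__ == "__main__":
    print(build_recognition_config("en-GB", ["fr-FR", "de-DE"], speaker_count=2))
```

Validating the language count before sending anything keeps the failure local to the app rather than surfacing as an API error.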
Finally, the addition of a ‘word confidence score’ helps to ensure accuracy.
With each query, the Speech-to-Text API will return a score indicating how confident it is that it heard each word correctly, so the app can check before acting on it. If a low confidence is returned, and it’s important to get it right, the developer may choose to prompt the user to repeat.
“For example, if a user inputs ‘please set up a meeting with John for tomorrow at 2PM’ into your app, you can decide to prompt the user to repeat ‘John’ or ‘2PM,’ if either have low confidence, but not to reprompt for ‘please’ even if it has low confidence since it’s not critical to that particular sentence,” the team explains.
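The team's example can be sketched in a few lines of Python. The word list below is made-up sample data in a simplified shape (a word plus a confidence between 0 and 1), not real API output, and the 0.8 threshold, the `critical` set and the `words_to_reprompt` helper are all assumptions chosen for illustration.

```python
def words_to_reprompt(words, critical, threshold=0.8):
    """Return the critical words whose confidence falls below the threshold."""
    return [
        w["word"]
        for w in words
        # Only words that matter to the action are worth asking the user to repeat.
        if w["word"].lower() in critical and w["confidence"] < threshold
    ]


if __name__ == "__main__":
    # Illustrative sample data only; the confidence values are invented.
    heard = [
        {"word": "please", "confidence": 0.42},
        {"word": "meeting", "confidence": 0.95},
        {"word": "John", "confidence": 0.55},
        {"word": "2PM", "confidence": 0.91},
    ]
    print(words_to_reprompt(heard, critical={"john", "2pm"}))
```

With this sample data, only ‘John’ would be queried: ‘please’ scores lower but is not in the critical set, and ‘2PM’ clears the threshold.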
Considering the difficulty some voice recognition services have with my accent, that last feature could help to reduce awkward errors.