
If your application needs a way to convert text to speech programmatically to interact with users, AWS has a managed service, Polly, that uses machine learning to create lifelike, believable voices that can significantly improve your user experience.

Neural Based Text-to-Speech Is So Much Better

We can’t overstate this: neural text-to-speech (TTS) sounds fluid and human, much like Siri or Alexa, while standard TTS sounds robotic in comparison (though, admittedly, still quite acceptable). Listen to this example using standard TTS. Now listen to this example using neural TTS.

Hear the difference? The transitions between words are much smoother than what can be achieved programmatically. Which one do you want to put in front of users? With Polly, robotic TTS is a thing of the past.

Like most AWS services, you’re charged based on usage. The going rate for neural TTS is $16 per million characters of text. If you’re building a conversational application, the responses will usually be fairly short, which cuts down on cost.

AWS Polly also supports standard TTS, which is four times cheaper and is also used as a fallback for certain languages that don’t have neural support yet. It’s still quite good, though not quite on the level of the neural engine.

You can also provide Polly with custom lexicons, which let you change the pronunciation of certain words, either to customize the response you get or to fix errors in the text-to-speech engine. You can also use Speech Synthesis Markup Language (SSML) as input, which gives fine control over the output.

To get started, head over to the Polly Console. The service is extremely simple: just give Polly the text you want to convert, select a language, and select the voice you wish to use. You can press the “Listen To Speech” button to preview the results. You can download the file as an MP3 from here, or save it to S3. If you’re converting more than 3,000 characters, you’ll have to save the input file to S3.

Of course, using a service like this from the console isn’t that useful. You’re far more likely to want to access it programmatically using the AWS API or the CLI. We’ll cover the CLI here, but you can read the API documentation for Polly for reference on how to set that up.

The aws polly command contains all of the controls for working with Polly. You can get a list of all supported voices with describe-voices, which you’ll likely want to pass to jq: aws polly describe-voices | jq '.
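If you would rather filter the describe-voices output in a script than in jq, a minimal Python sketch could look like the following. It assumes the JSON has a top-level `Voices` list whose entries carry `Id`, `LanguageCode`, and `SupportedEngines` fields. The sample data and the `voice_ids` helper are made up for illustration and are not real CLI output.

```python
import json

# Illustrative sample shaped like `aws polly describe-voices` output.
# The entries are invented for this sketch, not copied from a real response.
SAMPLE_RESPONSE = """
{
  "Voices": [
    {"Id": "Joanna", "LanguageCode": "en-US", "SupportedEngines": ["neural", "standard"]},
    {"Id": "Matthew", "LanguageCode": "en-US", "SupportedEngines": ["neural", "standard"]},
    {"Id": "Hans", "LanguageCode": "de-DE", "SupportedEngines": ["standard"]}
  ]
}
"""


def voice_ids(response, language_code, engine):
    """Return sorted voice IDs for one language that support the given engine."""
    return sorted(
        voice["Id"]
        for voice in response["Voices"]
        if voice["LanguageCode"] == language_code
        and engine in voice.get("SupportedEngines", [])
    )


if __name__ == "__main__":
    voices = json.loads(SAMPLE_RESPONSE)
    print(voice_ids(voices, "en-US", "neural"))
```

In practice you would save the CLI output to a file, or pipe it in on standard input, and pass it through `json.loads` instead of using the hard-coded sample.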

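To make the pricing quoted above concrete, here is a rough cost estimator in Python. It is only a sketch: it uses the $16 per million characters figure for neural TTS and treats standard TTS as a quarter of that, as stated in the text. It also assumes every character in the string is billed, which ignores any finer details of how AWS counts billable characters. The `estimate_cost_usd` name is hypothetical.

```python
# Rates taken from the figures quoted in the article; check current AWS pricing.
NEURAL_USD_PER_MILLION_CHARS = 16.0
STANDARD_USD_PER_MILLION_CHARS = NEURAL_USD_PER_MILLION_CHARS / 4  # "four times cheaper"

RATES = {
    "neural": NEURAL_USD_PER_MILLION_CHARS,
    "standard": STANDARD_USD_PER_MILLION_CHARS,
}


def estimate_cost_usd(text, engine="neural"):
    """Estimate the cost of synthesizing `text`, assuming every character is billed."""
    if engine not in RATES:
        raise ValueError(f"unknown engine: {engine!r}")
    return len(text) * RATES[engine] / 1_000_000


if __name__ == "__main__":
    reply = "Your order has shipped and should arrive on Thursday."
    print(f"neural:   ${estimate_cost_usd(reply, 'neural'):.6f}")
    print(f"standard: ${estimate_cost_usd(reply, 'standard'):.6f}")
```

A short conversational reply like the one above costs a small fraction of a cent with either engine, which is why short responses keep the bill low.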