Text-to-speech API reference (REST) - Speech service - Azure Cognitive Services


The Speech service allows you to convert text into synthesized speech and get a list of supported voices for a region by using a REST API. In this article, you'll learn about authorization options, query options, how to structure a request, and how to interpret a response.

Tip

Use cases for the text-to-speech REST API are limited. Use it only in cases where you can't use the Speech SDK. For example, with the Speech SDK you can subscribe to events for more insights about the text-to-speech processing and results.

The text-to-speech REST API supports neural text-to-speech voices, which support specific languages and dialects that are identified by locale. Each available endpoint is associated with a region. A Speech resource key for the endpoint or region that you plan to use is required. Here are links to more information:

  • For a complete list of voices, see Language and voice support for the Speech service.
  • For information about regional availability, see Speech service supported regions.
  • For Azure Government and Azure China endpoints, see this article about sovereign clouds.

Important

Costs vary for prebuilt neural voices (called Neural on the pricing page) and custom neural voices (called Custom Neural on the pricing page). For more information, see Speech service pricing.

Before you use the text-to-speech REST API, understand that you need to complete a token exchange as part of authentication to access the service. For more information, see Authentication.


Get a list of voices

You can use the tts.speech.microsoft.com/cognitiveservices/voices/list endpoint to get a full list of voices for a specific region or endpoint. Prefix the voices list endpoint with a region to get a list of voices for that region. For example, to get a list of voices for the westus region, use the https://westus.tts.speech.microsoft.com/cognitiveservices/voices/list endpoint. For a list of all supported regions, see the regions documentation.

Note

Voices and styles in preview are only available in three service regions: East US, West Europe, and Southeast Asia.

Request headers

This table lists required and optional headers for text-to-speech requests:

Header | Description | Required or optional
Ocp-Apim-Subscription-Key | Your Speech resource key. | Either this header or Authorization is required.
Authorization | An authorization token preceded by the word Bearer. For more information, see Authentication. | Either this header or Ocp-Apim-Subscription-Key is required.

Request body

A body isn't required for GET requests to this endpoint.

Sample request

This request requires only the Ocp-Apim-Subscription-Key header:

GET /cognitiveservices/voices/list HTTP/1.1
Host: westus.tts.speech.microsoft.com
Ocp-Apim-Subscription-Key: YOUR_RESOURCE_KEY

Here's an example curl command:

curl --location --request GET 'https://YOUR_RESOURCE_REGION.tts.speech.microsoft.com/cognitiveservices/voices/list' \
--header 'Ocp-Apim-Subscription-Key: YOUR_RESOURCE_KEY'

Sample response

You should receive a response with a JSON body that includes all supported locales, voices, gender, styles, and other details. This JSON example shows partial results to illustrate the structure of a response:

[
    // Redacted for brevity
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (en-US, JennyNeural)",
        "DisplayName": "Jenny",
        "LocalName": "Jenny",
        "ShortName": "en-US-JennyNeural",
        "Gender": "Female",
        "Locale": "en-US",
        "LocaleName": "English (United States)",
        "StyleList": [
            "assistant", "chat", "customerservice", "newscast", "angry", "cheerful",
            "sad", "excited", "friendly", "terrified", "shouting", "unfriendly",
            "whispering", "hopeful"
        ],
        "SampleRateHertz": "24000",
        "VoiceType": "Neural",
        "Status": "GA",
        "ExtendedPropertyMap": {
            "IsHighQuality48K": "True"
        },
        "WordsPerMinute": "152"
    },
    // Redacted for brevity
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (en-US, JennyMultilingualNeural)",
        "DisplayName": "Jenny Multilingual",
        "LocalName": "Jenny Multilingual",
        "ShortName": "en-US-JennyMultilingualNeural",
        "Gender": "Female",
        "Locale": "en-US",
        "LocaleName": "English (United States)",
        "SecondaryLocaleList": [
            "de-DE", "en-AU", "en-CA", "en-GB", "es-ES", "es-MX", "fr-CA", "fr-FR",
            "it-IT", "ja-JP", "ko-KR", "pt-BR", "zh-CN"
        ],
        "SampleRateHertz": "24000",
        "VoiceType": "Neural",
        "Status": "GA",
        "WordsPerMinute": "190"
    },
    // Redacted for brevity
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (ga-IE, OrlaNeural)",
        "DisplayName": "Orla",
        "LocalName": "Orla",
        "ShortName": "ga-IE-OrlaNeural",
        "Gender": "Female",
        "Locale": "ga-IE",
        "LocaleName": "Irish (Ireland)",
        "SampleRateHertz": "24000",
        "VoiceType": "Neural",
        "Status": "GA",
        "WordsPerMinute": "139"
    },
    // Redacted for brevity
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (zh-CN, YunxiNeural)",
        "DisplayName": "Yunxi",
        "LocalName": "云希",
        "ShortName": "zh-CN-YunxiNeural",
        "Gender": "Male",
        "Locale": "zh-CN",
        "LocaleName": "Chinese (Mandarin, Simplified)",
        "StyleList": [
            "narration-relaxed", "embarrassed", "fearful", "cheerful", "disgruntled",
            "serious", "angry", "sad", "depressed", "chat", "assistant", "newscast"
        ],
        "SampleRateHertz": "24000",
        "VoiceType": "Neural",
        "Status": "GA",
        "RolePlayList": [
            "Narrator", "YoungAdultMale", "Boy"
        ],
        "WordsPerMinute": "293"
    }
    // Redacted for brevity
]
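As an illustration of consuming this response, here's a minimal Python sketch (using the third-party requests package; the region and key values are placeholders) that fetches the list and prints the en-US voices with their styles:

# A sketch, not part of the official samples: list en-US voices and their styles.
import requests

region = "YOUR_RESOURCE_REGION"  # for example, "westus"
url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"
headers = {"Ocp-Apim-Subscription-Key": "YOUR_RESOURCE_KEY"}

response = requests.get(url, headers=headers)
response.raise_for_status()

for voice in response.json():
    if voice["Locale"] == "en-US":
        # StyleList is only present for voices that support styles.
        print(voice["ShortName"], voice.get("StyleList", []))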

HTTP status codes

The HTTP status code for each response indicates success or common errors.

HTTP status code | Description | Possible reason
200 | OK | The request was successful.
400 | Bad request | A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common reason is a header that's too long.
401 | Unauthorized | The request is not authorized. Make sure your resource key or token is valid and in the correct region.
429 | Too many requests | You have exceeded the quota or rate of requests allowed for your resource.
502 | Bad gateway | There's a network or server-side problem. This status might also indicate invalid headers.
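If you call the endpoint from code, it's worth handling 429 with a short backoff and retry. Here's a minimal, hedged Python sketch; the retry count and backoff schedule are illustrative choices, not documented service behavior:

# A sketch: retry GET requests that return 429 (Too many requests).
import time
import requests

def get_with_retry(url, headers, attempts=3):
    for attempt in range(attempts):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)  # 1 s, 2 s, 4 s: illustrative backoff only
    return response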

Convert text to speech

The cognitiveservices/v1 endpoint allows you to convert text to speech by using Speech Synthesis Markup Language (SSML).


Regions and endpoints

These regions are supported for text-to-speech through the REST API. Be sure to select the endpoint that matches your Speech resource region.

Prebuilt neural voices

Use this table to determine availability of neural voices by region or endpoint:

Region | Endpoint
Australia East | https://australiaeast.tts.speech.microsoft.com/cognitiveservices/v1
Brazil South | https://brazilsouth.tts.speech.microsoft.com/cognitiveservices/v1
Canada Central | https://canadacentral.tts.speech.microsoft.com/cognitiveservices/v1
Central US | https://centralus.tts.speech.microsoft.com/cognitiveservices/v1
East Asia | https://eastasia.tts.speech.microsoft.com/cognitiveservices/v1
East US | https://eastus.tts.speech.microsoft.com/cognitiveservices/v1
East US 2 | https://eastus2.tts.speech.microsoft.com/cognitiveservices/v1
France Central | https://francecentral.tts.speech.microsoft.com/cognitiveservices/v1
Germany West Central | https://germanywestcentral.tts.speech.microsoft.com/cognitiveservices/v1
India Central | https://centralindia.tts.speech.microsoft.com/cognitiveservices/v1
Japan East | https://japaneast.tts.speech.microsoft.com/cognitiveservices/v1
Japan West | https://japanwest.tts.speech.microsoft.com/cognitiveservices/v1
Jio India West | https://jioindiawest.tts.speech.microsoft.com/cognitiveservices/v1
Korea Central | https://koreacentral.tts.speech.microsoft.com/cognitiveservices/v1
North Central US | https://northcentralus.tts.speech.microsoft.com/cognitiveservices/v1
North Europe | https://northeurope.tts.speech.microsoft.com/cognitiveservices/v1
Norway East | https://norwayeast.tts.speech.microsoft.com/cognitiveservices/v1
South Central US | https://southcentralus.tts.speech.microsoft.com/cognitiveservices/v1
Southeast Asia | https://southeastasia.tts.speech.microsoft.com/cognitiveservices/v1
Sweden Central | https://swedencentral.tts.speech.microsoft.com/cognitiveservices/v1
Switzerland North | https://switzerlandnorth.tts.speech.microsoft.com/cognitiveservices/v1
Switzerland West | https://switzerlandwest.tts.speech.microsoft.com/cognitiveservices/v1
UAE North | https://uaenorth.tts.speech.microsoft.com/cognitiveservices/v1
US Gov Arizona | https://usgovarizona.tts.speech.azure.us/cognitiveservices/v1
US Gov Virginia | https://usgovvirginia.tts.speech.azure.us/cognitiveservices/v1
UK South | https://uksouth.tts.speech.microsoft.com/cognitiveservices/v1
West Central US | https://westcentralus.tts.speech.microsoft.com/cognitiveservices/v1
West Europe | https://westeurope.tts.speech.microsoft.com/cognitiveservices/v1
West US | https://westus.tts.speech.microsoft.com/cognitiveservices/v1
West US 2 | https://westus2.tts.speech.microsoft.com/cognitiveservices/v1
West US 3 | https://westus3.tts.speech.microsoft.com/cognitiveservices/v1

Tip

Voices in preview are available in only these three regions: East US, West Europe, and Southeast Asia.

Custom neural voices

If you've created a custom neural voice font, use the endpoint that you've created. You can also use the following endpoints. Replace {deploymentId} with the deployment ID for your neural voice model.

Region | Training | Deployment | Endpoint
Australia East | Yes | Yes | https://australiaeast.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Brazil South | No | Yes | https://brazilsouth.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Canada Central | No | Yes | https://canadacentral.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Central US | No | Yes | https://centralus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
East Asia | No | Yes | https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
East US | Yes | Yes | https://eastus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
East US 2 | Yes | Yes | https://eastus2.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
France Central | No | Yes | https://francecentral.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Germany West Central | No | Yes | https://germanywestcentral.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
India Central | Yes | Yes | https://centralindia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Japan East | Yes | Yes | https://japaneast.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Japan West | No | Yes | https://japanwest.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Jio India West | No | Yes | https://jioindiawest.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Korea Central | Yes | Yes | https://koreacentral.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
North Central US | No | Yes | https://northcentralus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
North Europe | Yes | Yes | https://northeurope.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Norway East | No | Yes | https://norwayeast.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
South Africa North | No | Yes | https://southafricanorth.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
South Central US | Yes | Yes | https://southcentralus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Southeast Asia | Yes | Yes | https://southeastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Switzerland North | No | Yes | https://switzerlandnorth.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
Switzerland West | No | Yes | https://switzerlandwest.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
UAE North | No | Yes | https://uaenorth.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
UK South | Yes | Yes | https://uksouth.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
West Central US | No | Yes | https://westcentralus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
West Europe | Yes | Yes | https://westeurope.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
West US | Yes | Yes | https://westus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
West US 2 | Yes | Yes | https://westus2.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}
West US 3 | No | Yes | https://westus3.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}

Note

The preceding regions are available for neural voice model hosting and real-time synthesis. Custom neural voice training is available only in some regions, but you can easily copy a neural voice model from those regions to other regions in the preceding list.
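To illustrate the URL shape, here's a minimal Python sketch that builds a custom neural voice endpoint. The region and deployment ID are placeholders for your own values:

# A sketch: build the endpoint URL for a deployed custom neural voice model.
region = "westeurope"  # placeholder: a region where your model is deployed
deployment_id = "YOUR_DEPLOYMENT_ID"  # placeholder: your model's deployment ID
endpoint = (
    f"https://{region}.voice.speech.microsoft.com/"
    f"cognitiveservices/v1?deploymentId={deployment_id}"
)
# POST SSML to this endpoint with the same headers used for prebuilt voices.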

Long Audio API

The Long Audio API is available in multiple regions with unique endpoints:

Region | Endpoint
Australia East | https://australiaeast.customvoice.api.speech.microsoft.com
East US | https://eastus.customvoice.api.speech.microsoft.com
India Central | https://centralindia.customvoice.api.speech.microsoft.com
South Central US | https://southcentralus.customvoice.api.speech.microsoft.com
Southeast Asia | https://southeastasia.customvoice.api.speech.microsoft.com
UK South | https://uksouth.customvoice.api.speech.microsoft.com
West Europe | https://westeurope.customvoice.api.speech.microsoft.com

Request headers

This table lists required and optional headers for text-to-speech requests:


Header | Description | Required or optional
Authorization | An authorization token preceded by the word Bearer. For more information, see Authentication. | Required
Content-Type | Specifies the content type for the provided text. Accepted value: application/ssml+xml. | Required
X-Microsoft-OutputFormat | Specifies the audio output format. For a complete list of accepted values, see Audio outputs. | Required
User-Agent | The application name. The provided value must be fewer than 255 characters. | Required

Request body

If you're using a custom neural voice, the body of a request can be sent as plain text (ASCII or UTF-8). Otherwise, the body of each POST request is sent as SSML. SSML allows you to choose the voice and language of the synthesized speech that the text-to-speech feature returns. For a complete list of supported voices, see Language and voice support for the Speech service.

Sample request

This HTTP request uses SSML to specify the voice and language. If the body is long and the resulting audio exceeds 10 minutes, the audio is truncated to 10 minutes; a single request can't return more than 10 minutes of audio.

POST /cognitiveservices/v1 HTTP/1.1
X-Microsoft-OutputFormat: riff-24khz-16bit-mono-pcm
Content-Type: application/ssml+xml
Host: westus.tts.speech.microsoft.com
Content-Length: <Length>
Authorization: Bearer [Base64 access_token]
User-Agent: <Your application name>

<speak version='1.0' xml:lang='en-US'>
  <voice xml:lang='en-US' xml:gender='Male' name='en-US-ChristopherNeural'>
    Microsoft Speech Service Text-to-Speech API
  </voice>
</speak>

* For the Content-Length, you should use your own content length. In most cases, this value is calculated automatically.
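Here's a minimal Python sketch of the same request, using the requests package and the Ocp-Apim-Subscription-Key header rather than a bearer token (both options are supported; see Authentication). The key, region, and application name are placeholders:

# A sketch: synthesize speech from SSML and save the audio to a file.
import requests

region = "YOUR_RESOURCE_REGION"  # for example, "westus"
url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
headers = {
    "Ocp-Apim-Subscription-Key": "YOUR_RESOURCE_KEY",
    "Content-Type": "application/ssml+xml",
    "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
    "User-Agent": "YOUR_APP_NAME",  # must be fewer than 255 characters
}
ssml = (
    "<speak version='1.0' xml:lang='en-US'>"
    "<voice xml:lang='en-US' xml:gender='Male' name='en-US-ChristopherNeural'>"
    "Microsoft Speech Service Text-to-Speech API"
    "</voice></speak>"
)

response = requests.post(url, headers=headers, data=ssml.encode("utf-8"))
response.raise_for_status()

# The response body is the audio in the requested format (here, RIFF/WAV).
with open("output.wav", "wb") as audio_file:
    audio_file.write(response.content)

Because the requested output format is RIFF (WAV), the saved file can be played directly.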

HTTP status codes

The HTTP status code for each response indicates success or common errors:

HTTP status code | Description | Possible reason
200 | OK | The request was successful. The response body is an audio file.
400 | Bad request | A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common reason is a header that's too long.
401 | Unauthorized | The request is not authorized. Make sure your Speech resource key or token is valid and in the correct region.
415 | Unsupported media type | It's possible that the wrong Content-Type value was provided. Content-Type should be set to application/ssml+xml.
429 | Too many requests | You have exceeded the quota or rate of requests allowed for your resource.
502 | Bad gateway | There's a network or server-side problem. This status might also indicate invalid headers.

If the HTTP status is 200 OK, the body of the response contains an audio file in the requested format. This file can be played as it's transferred, saved to a buffer, or saved to a file.
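To start consuming the audio while it's still being transferred, you can read the response body in chunks. Here's a minimal Python sketch, with the same placeholder values as the previous sketch; the chunk size is arbitrary:

# A sketch: stream the synthesized audio to disk as it arrives.
import requests

region = "YOUR_RESOURCE_REGION"
url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
headers = {
    "Ocp-Apim-Subscription-Key": "YOUR_RESOURCE_KEY",
    "Content-Type": "application/ssml+xml",
    "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
    "User-Agent": "YOUR_APP_NAME",
}
ssml = (
    "<speak version='1.0' xml:lang='en-US'>"
    "<voice name='en-US-JennyNeural'>Hello from the Speech service.</voice>"
    "</speak>"
)

with requests.post(url, headers=headers, data=ssml.encode("utf-8"), stream=True) as response:
    response.raise_for_status()
    with open("output.wav", "wb") as audio_file:
        for chunk in response.iter_content(chunk_size=4096):  # 4096 is arbitrary
            audio_file.write(chunk)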

Audio outputs

The requested audio format is sent in each request as the X-Microsoft-OutputFormat header, and both streaming and non-streaming formats are supported. Each format incorporates a bit rate and encoding type. The Speech service supports 48-kHz, 24-kHz, 16-kHz, and 8-kHz audio outputs. Each prebuilt neural voice model is available at 24 kHz and high-fidelity 48 kHz.

The streaming formats are:

  • amr-wb-16000hz
  • audio-16khz-16bit-32kbps-mono-opus
  • audio-16khz-32kbitrate-mono-mp3
  • audio-16khz-64kbitrate-mono-mp3
  • audio-16khz-128kbitrate-mono-mp3
  • audio-24khz-16bit-24kbps-mono-opus
  • audio-24khz-16bit-48kbps-mono-opus
  • audio-24khz-48kbitrate-mono-mp3
  • audio-24khz-96kbitrate-mono-mp3
  • audio-24khz-160kbitrate-mono-mp3
  • audio-48khz-96kbitrate-mono-mp3
  • audio-48khz-192kbitrate-mono-mp3
  • ogg-16khz-16bit-mono-opus
  • ogg-24khz-16bit-mono-opus
  • ogg-48khz-16bit-mono-opus
  • raw-8khz-8bit-mono-alaw
  • raw-8khz-8bit-mono-mulaw
  • raw-8khz-16bit-mono-pcm
  • raw-16khz-16bit-mono-pcm
  • raw-16khz-16bit-mono-truesilk
  • raw-22050hz-16bit-mono-pcm
  • raw-24khz-16bit-mono-pcm
  • raw-24khz-16bit-mono-truesilk
  • raw-44100hz-16bit-mono-pcm
  • raw-48khz-16bit-mono-pcm
  • webm-16khz-16bit-mono-opus
  • webm-24khz-16bit-24kbps-mono-opus
  • webm-24khz-16bit-mono-opus

Note

If you select a 48-kHz output format, the high-fidelity 48-kHz voice model is invoked. Sample rates other than 24 kHz and 48 kHz are obtained through upsampling or downsampling during synthesis; for example, 44.1 kHz is downsampled from 48 kHz.

If your selected voice and output format have different bit rates, the audio is resampled as necessary. You can decode the ogg-24khz-16bit-mono-opus format by using the Opus codec.


Authentication

Each request requires an authorization header. This table illustrates which headers are supported for each feature:

Supported authorization header | Speech-to-text | Text-to-speech
Ocp-Apim-Subscription-Key | Yes | Yes
Authorization: Bearer | Yes | Yes

When you're using the Ocp-Apim-Subscription-Key header, you're only required to provide your resource key. For example:

'Ocp-Apim-Subscription-Key': 'YOUR_SUBSCRIPTION_KEY'

When you're using the Authorization: Bearer header, you're required to make a request to the issueToken endpoint. In this request, you exchange your resource key for an access token that's valid for 10 minutes.

How to get an access token

To get an access token, you need to make a request to the issueToken endpoint by using Ocp-Apim-Subscription-Key and your resource key.

The issueToken endpoint has this format:

https://<REGION_IDENTIFIER>.api.cognitive.microsoft.com/sts/v1.0/issueToken

Replace <REGION_IDENTIFIER> with the identifier that matches the region of your subscription.

Use the following samples to create your access token request.

HTTP sample

This example is a simple HTTP request to get a token. Replace YOUR_SUBSCRIPTION_KEY with your resource key for the Speech service. If your subscription isn't in the East US region, replace the Host header with your region's host name.

POST /sts/v1.0/issueToken HTTP/1.1
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
Host: eastus.api.cognitive.microsoft.com
Content-type: application/x-www-form-urlencoded
Content-Length: 0

The body of the response contains the access token in JSON Web Token (JWT) format.

PowerShell sample

This example is a simple PowerShell script to get an access token. Replace YOUR_SUBSCRIPTION_KEY with your resource key for the Speech service. Make sure to use the correct endpoint for the region that matches your subscription. This example is currently set to East US.

$FetchTokenHeader = @{
    'Content-type'='application/x-www-form-urlencoded';
    'Content-Length'= '0';
    'Ocp-Apim-Subscription-Key' = 'YOUR_SUBSCRIPTION_KEY'
}
$OAuthToken = Invoke-RestMethod -Method POST -Uri https://eastus.api.cognitive.microsoft.com/sts/v1.0/issueToken -Headers $FetchTokenHeader
# show the token received
$OAuthToken

cURL sample

cURL is a command-line tool available in Linux (and in the Windows Subsystem for Linux). This cURL command illustrates how to get an access token. Replace YOUR_SUBSCRIPTION_KEY with your resource key for the Speech service. Make sure to use the correct endpoint for the region that matches your subscription. This example is currently set to East US.


curl -v -X POST \
  "https://eastus.api.cognitive.microsoft.com/sts/v1.0/issueToken" \
  -H "Content-type: application/x-www-form-urlencoded" \
  -H "Content-Length: 0" \
  -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY"

C# sample

This C# class illustrates how to get an access token. Pass your resource key for the Speech service when you instantiate the class. If your subscription isn't in the East US region, change the value of FetchTokenUri to match the region for your subscription.

using System;
using System.Net.Http;
using System.Threading.Tasks;

public class Authentication
{
    public static readonly string FetchTokenUri =
        "https://eastus.api.cognitive.microsoft.com/sts/v1.0/issueToken";
    private string subscriptionKey;
    private string token;

    public Authentication(string subscriptionKey)
    {
        this.subscriptionKey = subscriptionKey;
        this.token = FetchTokenAsync(FetchTokenUri, subscriptionKey).Result;
    }

    public string GetAccessToken()
    {
        return this.token;
    }

    // Exchange the resource key for an access token.
    private async Task<string> FetchTokenAsync(string fetchUri, string subscriptionKey)
    {
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            UriBuilder uriBuilder = new UriBuilder(fetchUri);

            var result = await client.PostAsync(uriBuilder.Uri.AbsoluteUri, null);
            Console.WriteLine("Token Uri: {0}", uriBuilder.Uri.AbsoluteUri);
            return await result.Content.ReadAsStringAsync();
        }
    }
}

Python sample

# The requests module must be installed.
# Run pip install requests if necessary.
import requests

subscription_key = 'REPLACE_WITH_YOUR_KEY'

def get_token(subscription_key):
    fetch_token_url = 'https://eastus.api.cognitive.microsoft.com/sts/v1.0/issueToken'
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key
    }
    response = requests.post(fetch_token_url, headers=headers)
    access_token = str(response.text)
    print(access_token)
    # Return the token so it can be used in an Authorization header.
    return access_token

How to use an access token

The access token should be sent to the service as the Authorization: Bearer <TOKEN> header. Each access token is valid for 10 minutes. You can get a new token at any time, but to minimize network traffic and latency, we recommend using the same token for nine minutes.
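One way to follow that recommendation is to cache the token and refresh it shortly before expiry. Here's a minimal Python sketch; the module-level cache is an illustrative pattern, and the nine-minute refresh window comes from the guidance above:

# A sketch: cache the access token and refresh it after nine minutes.
import time
import requests

TOKEN_URL = "https://eastus.api.cognitive.microsoft.com/sts/v1.0/issueToken"
RESOURCE_KEY = "YOUR_SUBSCRIPTION_KEY"  # placeholder

_cached_token = None
_token_fetched_at = 0.0

def get_cached_token():
    """Return a cached access token, refreshing it after nine minutes."""
    global _cached_token, _token_fetched_at
    if _cached_token is None or time.monotonic() - _token_fetched_at > 9 * 60:
        response = requests.post(
            TOKEN_URL, headers={"Ocp-Apim-Subscription-Key": RESOURCE_KEY}
        )
        response.raise_for_status()
        _cached_token = response.text
        _token_fetched_at = time.monotonic()
    return _cached_token

# Usage: send the token as "Authorization: Bearer <TOKEN>".
# headers = {"Authorization": "Bearer " + get_cached_token()}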

Here's a sample HTTP request to the speech-to-text REST API for short audio:

POST /cognitiveservices/v1 HTTP/1.1
Authorization: Bearer YOUR_ACCESS_TOKEN
Host: westus.stt.speech.microsoft.com
Content-type: application/ssml+xml
Content-Length: 199
Connection: Keep-Alive

// Message body here...

Next steps

  • Create a free Azure account
  • Get started with custom neural voice
  • Batch synthesis
