Speech to Text: start task
POST /api/technology/speech-to-text
Start a Speech to Text task for a media file.
Speech to Text features
- Multi-channel audio files are supported.
- Channel id is included in individual transcription segments.
- The built-in vocabulary can be extended using the config field of multipart/form-data. The value of config is a string in JSON format.
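As a rough illustration of the last point, the sketch below assembles a multipart/form-data body by hand with the Python standard library. The config field name comes from the text above, and the file field name is taken from the error examples on this page. The helper name encode_multipart, the preferred_phrases key and the sample bytes are assumptions made for illustration only. An HTTP client library would normally perform this encoding.

```python
import json
import uuid


def encode_multipart(file_name, file_bytes, config=None):
    """Return (body, content_type) for a multipart/form-data request.

    `config`, if given, is serialized to a JSON string, because the
    value of the `config` form field is a string in JSON format.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="file"; filename="'
        + file_name.encode("utf-8") + b'"',
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
    ]
    if config is not None:
        parts += [
            delimiter,
            b'Content-Disposition: form-data; name="config"',
            b"",
            json.dumps(config).encode("utf-8"),
        ]
    # Closing delimiter followed by a final line break.
    parts += [delimiter + b"--", b""]
    body = b"\r\n".join(parts)
    return body, "multipart/form-data; boundary=" + boundary


# Hypothetical input: fake audio bytes and an assumed config key.
body, content_type = encode_multipart(
    "call.wav", b"RIFF....", {"preferred_phrases": ["flower shop"]}
)
```

The returned content type carries the boundary, so it has to be sent as the Content-Type header together with the body.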
Fine-tuning the transcription
It is possible to gain finer control over the Speech to Text transcription in two ways. Firstly, you may specify an array of preferred phrases, which will be prioritized in ambiguous cases. Secondly, you can extend the built-in vocabulary by providing an array of additional words. These two options can be used either in tandem or independently.
Preferred phrases
In case of unclear speech, the Speech to Text technology usually prefers the word that makes more sense in the given context. For example, it might be impossible to determine whether a speaker said "I'm going to cell my car" or "I'm going to sell my car". However, the context suggests that the speaker is probably talking about selling their car. In other cases, though, it might not be as clear. Consider the following sentence: "He bought flour in the shop". In the given context, "flour" might very well be "flower".
Preferred phrases allow you to leverage your unique knowledge of the recording's context and prompt the technology with utterances that are expected to appear in the speech. This is especially helpful for transcribing predictable or domain-specific conversations.
Additional words
You may use this configuration option to specify the pronunciation of words that occur in preferred phrases. If the words from preferred phrases don't have an explicitly specified pronunciation, one of two things can happen: either the word is in the technology's vocabulary and a built-in (and thus precise) pronunciation is used, or the word is unknown and the technology does its best to generate a default pronunciation based on the word's written form. Note that this may result in incorrect pronunciations, especially for words foreign to the recording's language. Therefore, relying on auto-generated pronunciations is discouraged.
Alternatively, this option can be used to extend the technology's built-in vocabulary with new words. This is especially useful for industry-specific terms, foreign words, slang or neologisms. Another use case is adding region- or country-specific pronunciations to otherwise known words. Again, it is not mandatory to provide a pronunciation for new words, but the same limitations as described above apply.
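A minimal sketch of a configuration that combines both options is shown below. Only additional_words is named in the request body description on this page. The keys preferred_phrases, spelling and pronunciations are assumed names, build_config is a hypothetical helper, and the phoneme string is an invented placeholder rather than real phonemes of any supported language.

```python
import json


def build_config(preferred_phrases=(), additional_words=()):
    """Build the JSON string for the `config` form field.

    The two options are independent, so an empty one is left out.
    Key names other than `additional_words` are assumptions.
    """
    config = {}
    if preferred_phrases:
        config["preferred_phrases"] = list(preferred_phrases)
    if additional_words:
        for word in additional_words:
            # Only single words (without spaces) are accepted.
            if " " in word["spelling"]:
                raise ValueError("not a single word: " + word["spelling"])
        config["additional_words"] = list(additional_words)
    return json.dumps(config)


config_json = build_config(
    preferred_phrases=["he bought a flower in the shop"],
    additional_words=[
        # Placeholder pronunciation: phonemes separated by spaces.
        {"spelling": "flower", "pronunciations": ["f l a u @"]},
    ],
)
```

Passing only one of the two arguments gives a configuration that uses that option on its own.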
Request
Query Parameters
A string specifying the language for Speech to Text Phonexia. The value should follow RFC 5646. It can consist of the "language", "region", and "privateuse" subtags. Refer to supported languages for a complete list of supported language tags.
Possible values: [split, mix]
Default value: split
A string enumeration value representing the channel mode for conversion. This value indicates how the audio channels should be processed during conversion. Only the channels with the specified indices (channels parameter) will be processed, and others will be ignored.
A string of integers separated by commas (without spaces), representing the channels that should be kept during conversion. If specified, only the channels with the specified indices will be processed, and others will be ignored. If empty, all channels in the audio data will be processed. Note that channels is 0-based.
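The channels format described above can be sketched as follows. The channels parameter name comes from this page. The language parameter name, the "en" tag and both helper names are assumptions for illustration, and leaving the parameter out is assumed to be equivalent to sending it empty.

```python
from urllib.parse import urlencode


def format_channels(indices):
    """Join 0-based channel indices with commas and no spaces."""
    for index in indices:
        if not isinstance(index, int) or index < 0:
            raise ValueError("channel indices are 0-based integers")
    return ",".join(str(index) for index in indices)


def build_query(language, channels=()):
    """Build a query string; the `language` parameter name is assumed."""
    params = {"language": language}
    if channels:
        # With no channels given the parameter is omitted, on the
        # assumption that this processes all channels like an empty value.
        params["channels"] = format_channels(channels)
    # safe="," keeps the separator as a literal comma instead of %2C.
    return urlencode(params, safe=",")


query = build_query("en", channels=[0, 2])  # "language=en&channels=0,2"
```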
Header Parameters
Correlation ID is a special type of request ID which is unique over a series of requests and responses, identifying a transaction in a distributed system. Correlation ID will be generated if not provided.
In a distributed system architecture (microservices architecture), the request ID is a unique ID of a request and response combination throughout all components of the system. Request ID will be generated if not provided.
- multipart/form-data
Body required
required
Array [
]
Input media file.
config
object
Optional configuration for Speech to Text.
Possible values: non-empty
Default value: ``
Array of phrases (made of one or many words) that are expected to appear in the media file. Phrases provided here are preferred over other variants in the transcription.
additional_words
object[]
Array of words, optionally with explicitly defined pronunciations. Words provided here extend the underlying language model for the given request. This is useful for adding names, slang, foreign or jargon terms or local pronunciations of common words. These are typically not included in the model's built-in vocabulary. This can also be used for specifying pronunciations of words that occur in preferred phrases.
Possible values: non-empty
The grapheme representation of the word. If you only use graphemes of the given language, you don't have to specify a pronunciation. If the word is known by the technology, a built-in pronunciation will be used. A new one will be generated otherwise. However, note that generated pronunciations may be incorrect for abbreviations, foreign words and other words with unusual spelling. You can also use graphemes outside of the language's valid grapheme set. In that case, you have to specify at least one pronunciation, otherwise the word will be rejected. Take note that only single words (without spaces) are accepted.
Possible values: non-empty
Default value: ``
Array of the word's pronunciations. Each pronunciation must be specified using only supported phonemes of the given language. Unless the pronunciation is already known by the language model, it must contain at least three phonemes. Individual phonemes must be separated by a space. Take special care when using phonemes that contain the backslash character (\), as it has special meaning in JSON. You need to escape the backslash with another backslash to suppress its special meaning.
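The escaping rule can be left to a JSON serializer rather than applied by hand. The short sketch below uses Python's json module. The phoneme containing a backslash and the key names spelling and pronunciations are hypothetical and only serve to show the doubling.

```python
import json

# Hypothetical phoneme containing one backslash character.
# In Python source the backslash is itself written as "\\".
pronunciation = "r\\ e d"

config_json = json.dumps(
    {"additional_words": [{"spelling": "red", "pronunciations": [pronunciation]}]}
)

# In the serialized JSON text the backslash appears doubled ...
assert "r\\\\ e d" in config_json
# ... and parsing the JSON gives back the single backslash.
restored = json.loads(config_json)
assert restored["additional_words"][0]["pronunciations"][0] == pronunciation
```

Writing the JSON string manually works as well, as long as every backslash in a phoneme is typed twice.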
Responses
- 202
- 400
- 403
- 413
- 422
- 429
- 507
Speech to Text task was accepted. Follow the X-Location header to poll for the task state.
Response Headers
X-Location
string
Example: /api/technology/speech-to-text/123e4567-e89b-12d3-a456-426614174000
A URL the client should poll for task state and result.
- application/json
- Schema
- Example (from schema)
Schema
task
object
required
Possible values: [pending, running, rejected, failed, done]
{
  "task": {
    "task_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "state": "pending"
  }
}
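The polling described for the 202 response could look roughly like the sketch below. It deliberately takes a caller-supplied fetch function instead of performing HTTP itself. It is an assumption that the polled URL returns a body with the same task object as the 202 response and that rejected, failed and done are the states after which polling can stop. The names poll_task and TERMINAL_STATES are hypothetical.

```python
import time

# States after which polling stops (assumed to be final).
TERMINAL_STATES = {"rejected", "failed", "done"}


def poll_task(fetch, interval=1.0, max_attempts=60, sleep=time.sleep):
    """Call `fetch()` until the task reaches a terminal state.

    `fetch` is a caller-supplied function that requests the URL from
    the X-Location header and returns the parsed JSON body.
    """
    for _ in range(max_attempts):
        body = fetch()
        state = body["task"]["state"]
        if state in TERMINAL_STATES:
            return body
        sleep(interval)  # still "pending" or "running"
    raise TimeoutError("task did not finish in time")


# Usage with a fake fetcher that replays three canned responses.
responses = iter(
    {"task": {"task_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "state": s}}
    for s in ["pending", "running", "done"]
)
result = poll_task(lambda: next(responses), sleep=lambda seconds: None)
```

Injecting fetch and sleep keeps the loop independent of any particular HTTP client and makes it easy to try out without a server.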
Request payload data was invalid and could not be parsed.
- application/json
- Schema
- Example (from schema)
- request.invalid
Schema
Possible values: [internal, resource.not-found, method.invalid, request.forbidden, request.invalid, request.validation-error, request.rate-limit-exceeded, request.size-limit-exceeded, storage.capacity-exceeded]
Machine-readable error type.
Human-readable summary of the error.
detail
object[]
Optional higher level of detail. It is intended for better understanding of the error or advanced error handling.
location
object[]
required
Location of the error.
anyOf
integer
string
Human-readable summary of the error.
Machine-readable error type.
context
object
Optional key-value object with additional context
property name*
object
anyOf
string
integer
number
boolean
anyOf
string
integer
number
boolean
{
  "type": "internal",
  "message": "string",
  "detail": [
    {
      "location": [
        0,
        "string"
      ],
      "message": "string",
      "type": "string",
      "context": {}
    }
  ]
}
Invalid request.
{
  "type": "request.invalid",
  "message": "Invalid request.",
  "detail": []
}
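Since every error response on this page shares the same object shape, a client can handle them with one small routine. The sketch below flattens an error object into readable lines. The function name summarize_error and the output format are assumptions, and the sample input is the request.forbidden example from this page.

```python
def summarize_error(error):
    """Flatten an error object into human-readable lines."""
    lines = [f'{error["type"]}: {error["message"]}']
    for item in error.get("detail") or []:
        # `location` mixes strings and integers, e.g. ["body", "file"].
        where = ".".join(str(part) for part in item["location"])
        lines.append(f'  at {where}: {item["message"]} ({item["type"]})')
    return lines


example = {
    "type": "request.forbidden",
    "message": "Request is forbidden.",
    "detail": [
        {
            "location": ["license"],
            "message": "Licensed processing capacity exceeded.",
            "type": "licensing.capacity-exceeded",
            "context": {},
        }
    ],
}

for line in summarize_error(example):
    print(line)
```

The top-level type is the value to branch on in code, while the detail entries are mainly useful for logging and troubleshooting.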
Request is forbidden.
- application/json
- request.forbidden
Processing capacity allowed for the operation was exceeded.
{
  "type": "request.forbidden",
  "message": "Request is forbidden.",
  "detail": [
    {
      "location": [
        "license"
      ],
      "message": "Licensed processing capacity exceeded.",
      "type": "licensing.capacity-exceeded",
      "context": {}
    }
  ]
}
The request entity (payload) size exceeds the allowed limit.
- application/json
- request.size-limit-exceeded
Request size limit exceeded.
{
  "type": "request.size-limit-exceeded",
  "message": "Request size limit exceeded.",
  "detail": [
    {
      "location": [
        "body",
        "file"
      ],
      "message": "Input media file too large.",
      "type": "media.too-large",
      "context": {
        "file_size": 1048576000,
        "max_file_size": 524288000,
        "size_unit": "bytes"
      }
    }
  ]
}
An error occurred during validation of the request payload data.
- application/json
- request validation error
Request validation error.
{
  "type": "request.validation-error",
  "message": "Request validation error.",
  "detail": []
}
Request rate limit exceeded.
The request may be retried after a while. The following response headers may be checked for details: retry-after, x-ratelimit-limit, x-ratelimit-remaining, x-ratelimit-reset.
Response Headers
retry-after
number
Header indicates how long the user agent should wait before making a follow-up request.
x-ratelimit-limit
number
Size of the current rate limiting window.
x-ratelimit-remaining
number
Remaining number of requests in the current rate limiting window.
x-ratelimit-reset
number
Time at which the current rate limiting window resets (in UTC epoch).
- application/json
- request.rate-limit-exceeded
Rate limit exceeded.
{
  "type": "request.rate-limit-exceeded",
  "message": "Rate limit exceeded: 1 per 5 second.",
  "detail": []
}
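One possible way to act on the rate-limit headers is sketched below. It prefers retry-after and falls back to x-ratelimit-reset, which is described above as a UTC epoch time. Treating the retry-after value as a number of seconds is an assumption, and the helper name seconds_to_wait is hypothetical.

```python
import time


def seconds_to_wait(headers, now=None):
    """Estimate how long to wait before retrying a 429 response.

    Prefers `retry-after` (assumed to be in seconds); otherwise falls
    back to `x-ratelimit-reset`, a UTC epoch timestamp. Returns 0.0
    if neither header is present.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "retry-after" in lowered:
        return max(0.0, float(lowered["retry-after"]))
    if "x-ratelimit-reset" in lowered:
        now = time.time() if now is None else now
        return max(0.0, float(lowered["x-ratelimit-reset"]) - now)
    return 0.0


wait = seconds_to_wait({"Retry-After": "5"})
```

The optional now argument exists only so the fallback branch can be exercised with a fixed clock.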
The storage is full and cannot accept any data.
- application/json
- insufficient storage
Storage capacity exceeded.
{
  "type": "storage.capacity-exceeded",
  "message": "Storage capacity exceeded.",
  "detail": []
}