Dynamically Adding Words to the Language Model
Adding words to the Speech to Text (STT) language model on-the-fly is possible in SPE 3.45 or newer, as part of the preferred phrases feature.
The POST /technologies/stt or POST /technologies/stt/input_stream API calls serve two purposes:
- Specifying the preferred phrases (in the phrases section).
- Adding words to the STT language model (in the dictionary section).
Each part can be used independently. You can specify only preferred phrases, add words to the dictionary, or use both features simultaneously.
Here is an example of input for starting transcription, specifying two preferred phrases and two words to be added (one with an explicitly specified pronunciation):
{
  "preferred_phrases": {
    "phrases": [
      {
        "phrase": "this is a preferred phrase"
      },
      {
        "phrase": "some other phrase"
      }
    ],
    "dictionary": [
      {
        "word": "preferred"
      },
      {
        "word": "phrase",
        "pronunciations": [
          {
            "phonemes": "f r ey z"
          }
        ]
      }
    ]
  }
}
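When the phrases and words come from application data, the request body can be assembled programmatically. The following is a minimal Python sketch, not part of the product: the helper name build_preferred_phrases and its arguments are hypothetical, and only the JSON layout is taken from the example above. How the body is sent (base URL, authentication, query parameters) is not covered here.

```python
import json


def build_preferred_phrases(phrases, words):
    """Build a request body with a 'phrases' and/or a 'dictionary' section.

    phrases: list of phrase strings (may be empty)
    words:   dict mapping word -> list of pronunciation strings (may be empty)
    """
    section = {}
    if phrases:
        section["phrases"] = [{"phrase": p} for p in phrases]
    if words:
        dictionary = []
        for word, pronunciations in words.items():
            entry = {"word": word}
            # Leave "pronunciations" out entirely when none is given,
            # so that a default pronunciation is generated for the word.
            if pronunciations:
                entry["pronunciations"] = [{"phonemes": p} for p in pronunciations]
            dictionary.append(entry)
        section["dictionary"] = dictionary
    return {"preferred_phrases": section}


body = build_preferred_phrases(
    ["this is a preferred phrase", "some other phrase"],
    {"preferred": [], "phrase": ["f r ey z"]},
)
print(json.dumps(body, indent=2))
```

Because each section is added only when it has content, the same helper covers all three cases: phrases only, dictionary only, or both.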
Words and pronunciations
Words to be added to the language model can be specified without a pronunciation. In such cases, the system will automatically generate a default pronunciation based on the word's letters, following internal linguistic rules for the given STT language. However, the automatically generated pronunciation may not always align with expectations, especially for foreign words due to differences between the word's native language and the STT language. Therefore, it is recommended to define pronunciations explicitly to prevent mis-transcriptions caused by incorrectly generated default pronunciations.
It is also possible to define multiple pronunciations, which can be particularly useful for uncommon or foreign words, slang terms, etc., that people might mispronounce.
Allowed characters
Generally, words should use only the letters (graphemes) allowed in the given STT language. You can use GET /technologies/stt/graphemes to retrieve the list of allowed graphemes. However, it is also permissible to use letters from different alphabets (e.g., a German word like “grüßen” in a Czech transcription) or different writing scripts (such as Cyrillic or Japanese Kana). In such cases, the word's pronunciation MUST be explicitly specified.
The pronunciation must use only phonemes supported by the STT language (use GET /technologies/stt/phonemes to retrieve the list of allowed phonemes). If a word is specified using disallowed characters without an accompanying pronunciation, that word will be ignored during transcription (see the warning_message parameter below).
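These rules can be checked on the client side before a transcription is started. The sketch below is illustrative only: the function check_entry is hypothetical, and it assumes the allowed graphemes and phonemes have already been retrieved and converted to Python sets (the response format of the two GET calls is not described here). The sample sets are small placeholders, not the real inventory of any STT language.

```python
def check_entry(entry, allowed_graphemes, allowed_phonemes):
    """Return a list of problems found in one dictionary entry (empty if none)."""
    problems = []
    word = entry["word"]
    pronunciations = entry.get("pronunciations", [])

    # Characters of the word that are not graphemes of the STT language.
    foreign = sorted({ch for ch in word if ch not in allowed_graphemes})
    if foreign and not pronunciations:
        problems.append(
            f"'{word}' uses {foreign} and needs an explicit pronunciation"
        )

    for pron in pronunciations:
        # Phonemes are separated by spaces, as in "f r ey z".
        unknown = [p for p in pron["phonemes"].split() if p not in allowed_phonemes]
        if unknown:
            problems.append(f"'{word}' uses unsupported phonemes {unknown}")
    return problems


# Placeholder sets for illustration only.
graphemes = set("abcdefghijklmnopqrstuvwxyz")
phonemes = {"f", "r", "ey", "z"}

ok = check_entry(
    {"word": "phrase", "pronunciations": [{"phonemes": "f r ey z"}]},
    graphemes, phonemes,
)
missing = check_entry({"word": "grüßen"}, graphemes, phonemes)
print(ok, missing)
```

A word with foreign letters passes the check as soon as it carries at least one pronunciation made of supported phonemes, which mirrors the rule stated above.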
Transcription results
If preferred phrases and/or words were specified when starting the transcription, the result will contain the same phrases and dictionary structures used as input for the transcription task.
The dictionary structure is enriched with the following:
- The pronunciations part is automatically generated for words that did not have pronunciations specified in the input.
- The out_of_vocabulary parameter indicates whether the word exists in the internal vocabulary.
- The class parameter contains the name of the word class to which the word belongs, if applicable.
- The warning_message parameter contains any warning messages (if a warning message is present, the corresponding word/pronunciation was ignored and not used during transcription).
The example below shows the transcription result for a transcription started with the input example provided earlier. The added parts are the phrases and dictionary sections:
{
  "result": {
    "version": 5,
    "name": "SpeechRecognitionResult",
    "file": "/test.wav",
    "model": "EN_US_6",
    "phrases": [
      {
        "phrase": "this is a preferred phrase"
      },
      {
        "phrase": "some other phrase"
      }
    ],
    "dictionary": [
      {
        "word": "preferred",
        "pronunciations": [
          {
            "phonemes": "p r ih f er d",
            "out_of_vocabulary": false,
            "class": "",
            "warning_message": ""
          }
        ]
      },
      {
        "word": "phrase",
        "pronunciations": [
          {
            "phonemes": "f r ey z",
            "out_of_vocabulary": false,
            "class": "",
            "warning_message": ""
          }
        ]
      }
    ]
  }
}
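Since a non-empty warning_message means the word or pronunciation was not used, it can be worth scanning the result for such entries. The following Python sketch assumes the result layout of the example above, with warning_message inside each pronunciations item; the additional check for a warning placed directly on the word is an assumption, as are the function name ignored_entries and the warning text in the sample data.

```python
def ignored_entries(response):
    """Return (word, phonemes, warning) tuples for entries that carry a warning."""
    ignored = []
    for entry in response["result"].get("dictionary", []):
        # Assumption: a warning might also be attached to the word itself.
        if entry.get("warning_message"):
            ignored.append((entry["word"], None, entry["warning_message"]))
        for pron in entry.get("pronunciations", []):
            if pron.get("warning_message"):
                ignored.append(
                    (entry["word"], pron.get("phonemes"), pron["warning_message"])
                )
    return ignored


# Sample data: the first entry comes from the example result,
# the second entry and its warning text are made up for illustration.
response = {
    "result": {
        "dictionary": [
            {
                "word": "preferred",
                "pronunciations": [
                    {
                        "phonemes": "p r ih f er d",
                        "out_of_vocabulary": False,
                        "class": "",
                        "warning_message": "",
                    }
                ],
            },
            {
                "word": "grüßen",
                "pronunciations": [
                    {"phonemes": "", "warning_message": "hypothetical warning text"}
                ],
            },
        ]
    }
}

for word, phones, warning in ignored_entries(response):
    print(f"{word!r} was ignored: {warning}")
```

An empty list means every word and pronunciation supplied in the input was used during transcription.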