Enhanced Speech Translation Built on Whisper
Speech translation uses Enhanced Speech to Text Built on Whisper to translate speech in various languages into English. It converts spoken language into English text while preserving meaning and context.
Speech translation supports multi-channel audio files. The resulting translation contains details about the processed channels, the detected source languages, and the timestamps of the utterances. The source languages can also be specified manually. Only licensed languages can be used for translation. A GPU is required to achieve reasonable translation speeds.
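To make the shape of such a result more concrete, the following Python sketch models a translated utterance with the three pieces of information mentioned above (channel, detected source language, timestamps) and groups utterances by channel. The class name `TranslatedSegment`, its field names, and the helper `group_by_channel` are assumptions made for illustration only; they are not the product's actual output schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class TranslatedSegment:
    """Hypothetical record for one translated utterance (not the real schema)."""
    channel: int           # index of the audio channel the utterance came from
    source_language: str   # detected or manually specified source language code
    start: float           # utterance start time in seconds
    end: float             # utterance end time in seconds
    text: str              # English translation of the utterance


def group_by_channel(segments: Iterable[TranslatedSegment]) -> Dict[int, List[TranslatedSegment]]:
    """Group segments per channel, ordered by start time within each channel."""
    grouped: Dict[int, List[TranslatedSegment]] = defaultdict(list)
    for seg in sorted(segments, key=lambda s: (s.channel, s.start)):
        grouped[seg.channel].append(seg)
    return dict(grouped)
```

With a two-channel recording, `group_by_channel` would return one time-ordered list of utterances per channel, which is a convenient form for rendering a per-speaker transcript.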
Language switching
In the default auto-detect mode, the first 30 seconds of the recording are used to detect the source language, and that language is then assumed when translating the whole recording into English. This can degrade the translation if parts of the recording after the 30-second mark are in a different language. With the optional language switching parameter, source languages are instead detected separately for each 30-second segment. More details about language switching, including its limitations, can be found in the Enhanced Speech to Text Built on Whisper article.
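The difference between the two modes can be sketched in Python as follows. This is a simplified illustration of the behavior described above, not the actual implementation: the function names, the `language_switching` flag, and the `detect_language` callback (standing in for whatever language detector is used) are all assumptions.

```python
from typing import Callable, List, Tuple

WINDOW_SECONDS = 30.0  # segment length taken from the description above


def split_into_windows(duration: float, window: float = WINDOW_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) pairs covering the recording in fixed-size windows."""
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        windows.append((start, end))
        start = end
    return windows


def plan_source_languages(
    duration: float,
    detect_language: Callable[[float, float], str],  # hypothetical detector callback
    language_switching: bool = False,
) -> List[Tuple[float, float, str]]:
    """Assign a source language to every 30-second window of a recording."""
    windows = split_into_windows(duration)
    if not windows:
        return []
    if not language_switching:
        # Default mode: detect once on the first window, reuse it everywhere.
        first = detect_language(*windows[0])
        return [(start, end, first) for start, end in windows]
    # Language switching: run detection separately on each window.
    return [(start, end, detect_language(start, end)) for start, end in windows]
```

For a 75-second recording that switches language after the first 30 seconds, the default mode would label all three windows with the first language, while the language switching mode would label the second and third windows with the language actually spoken there.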