Version: 3.4.0

Speech Translation

With a portfolio of more than 60 languages, Phonexia Speech Translation translates audio containing speech in any of the supported languages into English text.

Supported languages

Phonexia Speech Translation supports the same set of languages as Enhanced Speech to Text Built on Whisper. Have a look at the complete language portfolio.

How it works

Upload your files

The first step is to select the language spoken in the recordings. If you are unsure of the language, use auto-detect mode, which identifies the language automatically and then proceeds with the translation.

If you don't have your own files, you can use the provided Phonexia examples to explore how speech translation works.

Results

After processing, the English translations will appear in the right panel. Please note that the translation process may take a while.

caution

Cancelling the process or deleting the recording while you wait for the translation does not affect the translation speed, as the translation process continues uninterrupted on the server. Leaving the page while waiting for the translation may interrupt the process.

Once the recordings are processed, you can play the original audio file while viewing the corresponding English text, with the spoken segments highlighted in real time.
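Highlighting the spoken segment during playback comes down to finding the segment whose time range contains the current playback position. The Python sketch below shows one way to do this with a binary search over segment start times. It is an illustration, not Phonexia's implementation: the `active_segment` function and the `(start, end, text)` tuple layout are hypothetical, and the sketch assumes segments are sorted by start time and do not overlap.

```python
from bisect import bisect_right

# Hypothetical (start, end, text) tuples in seconds, taken from the example exports.
SEGMENTS = [
    (0.34, 5.78, "Hello Andreas, I just want to let you know that we decided with my brother-in-law"),
    (5.78, 9.16, "that we'd rather go to Vienna this weekend,"),
    (9.16, 15.18, "because there is an exhibition in the Nature Science Museum on the topic of crystals"),
]


def active_segment(segments, position):
    """Return the index of the segment playing at `position` seconds, or None.

    Assumes `segments` is sorted by start time and segments do not overlap.
    """
    starts = [start for start, _, _ in segments]
    # Index of the last segment whose start time is <= position.
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < segments[i][1]:
        return i
    # Position is before the first segment, after the last one, or in a pause.
    return None
```

A player could call `active_segment` on every time update and highlight the returned segment; `None` means no segment is currently being spoken.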

Export formats

Whether you export results in a bulk action or individually, you have the option to select from various export formats.

Plain text

This format provides plain text without timestamps or any additional metadata. The text merges the translations of all speech without distinguishing individual channels.

Hello Andreas, I just want to let you know that we decided with my brother-in-law
that we'd rather go to Vienna this weekend,
because there is an exhibition in the Nature Science Museum on the topic of crystals
and you know that my nephew is back to minerals.
So we'll come to Berlin later
and I hope it will fit you at the end of the month at some point.
Please let me know.
Okay, bye.
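Merging translated segments into plain text can be sketched as below. The `Segment` type and the `to_plain_text` function are hypothetical. The sketch assumes that segments from all channels are ordered by start time and written one per line, as the example above suggests, and it simply drops the channel number.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    channel: int
    start: float  # seconds
    end: float    # seconds
    text: str


def to_plain_text(segments):
    """Merge segments from all channels into plain text.

    Assumption: segments are ordered by start time (ties broken by channel)
    and written one per line; channel numbers are not included.
    """
    ordered = sorted(segments, key=lambda s: (s.start, s.channel))
    return "\n".join(s.text for s in ordered)
```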

Text with timestamps

This format contains two types of data: timestamps and text. The text merges the translations of all speech without distinguishing individual channels.

00:00:00 Hello Andreas, I just want to let you know that we decided with my brother-in-law
00:00:05 that we'd rather go to Vienna this weekend,
00:00:09 because there is an exhibition in the Nature Science Museum on the topic of crystals
00:00:15 and you know that my nephew is back to minerals.
00:00:18 So we'll come to Berlin later
00:00:22 and I hope it will fit you at the end of the month at some point.
00:00:27 Please let me know.
00:00:29 Okay, bye.
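The timestamps above match the segment start times in the CSV and JSON examples further down (for example, 18.97 s appears as 00:00:18), which suggests they are truncated to whole seconds. The sketch below reproduces this format under that assumption; `format_hms` and `to_timestamped_text` are hypothetical helper names.

```python
def format_hms(seconds):
    """Format a non-negative time in seconds as HH:MM:SS, dropping the fraction."""
    total = int(seconds)  # truncate: 18.97 -> 18, matching the example
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_timestamped_text(segments):
    """Render (start_seconds, text) pairs as 'HH:MM:SS text' lines."""
    return "\n".join(f"{format_hms(start)} {text}" for start, text in segments)
```

The same `format_hms` conversion applies to the HH:MM:SS timestamps used in the XLSX export, if those are also truncated.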

CSV and XLSX formats

Both formats contain the translated text and the same metadata: transcription technology, language code, source language code, detected source language code, channel, segment start and end times, and confidence score.

info
  • Language code is the target language of the output text, which is English (en) for speech translation.
  • Source language code is the code of the original language of the audio as specified by the user. If the user doesn't specify it, the system sets it to the detected language or, if the detected language isn't available for translation in the user's language portfolio, to the closest available language.
  • Detected source language code is the code of the original language of the audio as identified by the system.
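The rule for filling in the source language code can be summarized as a small function. This is a sketch only: how the system picks the "closest available language" is not described here, so the `closest` argument is a hypothetical placeholder that the caller must supply, and `resolve_source_language` is likewise a hypothetical name.

```python
def resolve_source_language(user_choice, detected, available, closest):
    """Return the source language code recorded in the export.

    user_choice: code chosen by the user, or None when auto-detect is used.
    detected:    code identified by the system.
    available:   set of codes in the user's language portfolio.
    closest:     hypothetical callable (code, available) -> nearest available
                 code; the actual selection rule is not documented.
    """
    if user_choice is not None:
        return user_choice
    if detected in available:
        return detected
    return closest(detected, available)
```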

The CSV format is well suited to users who work with large datasets, as it facilitates automated processing and filtering based on specific metadata. In this format, the start and end times of each segment are given in seconds.

Transcription technology,Language code,Source language code,Detected source language code,Channel,Start time,End time,Confidence score,Transcription
Built on Whisper,en,de,de,0,0.34,5.78,,"Hello Andreas, I just want to let you know that we decided with my brother-in-law"
Built on Whisper,en,de,de,0,5.78,9.16,,"that we'd rather go to Vienna this weekend,"
Built on Whisper,en,de,de,0,9.16,15.18,,because there is an exhibition in the Nature Science Museum on the topic of crystals
Built on Whisper,en,de,de,0,15.18,18.97,,and you know that my nephew is back to minerals.
Built on Whisper,en,de,de,0,18.97,22.22,,So we'll come to Berlin later
Built on Whisper,en,de,de,0,22.22,27.37,,and I hope it will fit you at the end of the month at some point.
Built on Whisper,en,de,de,0,27.37,29.17,,Please let me know.
Built on Whisper,en,de,de,0,29.17,30.49,,"Okay, bye."
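Because the CSV export is ordinary comma-separated text with quoted fields where needed, it can be processed with standard tools. The Python sketch below uses the standard `csv` module to load an export and filter segments by time range and channel. The function names are hypothetical and the column names are taken from the example above; the empty "Confidence score" column in the example is left as a string rather than converted to a number.

```python
import csv
import io

# Excerpt of the example export above.
SAMPLE = """Transcription technology,Language code,Source language code,Detected source language code,Channel,Start time,End time,Confidence score,Transcription
Built on Whisper,en,de,de,0,0.34,5.78,,"Hello Andreas, I just want to let you know that we decided with my brother-in-law"
Built on Whisper,en,de,de,0,5.78,9.16,,"that we'd rather go to Vienna this weekend,"
Built on Whisper,en,de,de,0,27.37,29.17,,Please let me know.
"""


def load_segments(text):
    """Parse exported CSV text into dicts with numeric channel and timestamps."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Channel"] = int(row["Channel"])
        row["Start time"] = float(row["Start time"])
        row["End time"] = float(row["End time"])
        # "Confidence score" is empty in the example, so it stays a string.
        rows.append(row)
    return rows


def segments_between(rows, start, end, channel=None):
    """Return rows lying fully inside [start, end], optionally for one channel."""
    return [
        r for r in rows
        if r["Start time"] >= start and r["End time"] <= end
        and (channel is None or r["Channel"] == channel)
    ]
```

To process an exported file, read it with `open(path, newline="", encoding="utf-8")` (assuming the export is UTF-8 encoded) and pass its contents to `load_segments`.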

The XLSX format provides a clear, comprehensive, and human-readable overview of the metadata and text, for users who prefer a more graphical representation of the data. In this format, timestamps use the HH:MM:SS format.

Excel Export Format

JSON format

This format presents machine-readable metadata equivalent to that provided in the CSV and XLSX formats.

{
  "one_best": {
    "segments": {
      "segments": [
        {
          "channel_number": 0,
          "start_time": 0.34,
          "end_time": 5.78,
          "language": "en",
          "text": "Hello Andreas, I just want to let you know that we decided with my brother-in-law",
          "source_language": "de",
          "detected_source_language": "de"
        },
        {
          "channel_number": 0,
          "start_time": 5.78,
          "end_time": 9.16,
          "language": "en",
          "text": "that we'd rather go to Vienna this weekend,",
          "source_language": "de",
          "detected_source_language": "de"
        },
        {
          "channel_number": 0,
          "start_time": 9.16,
          "end_time": 15.18,
          "language": "en",
          "text": "because there is an exhibition in the Nature Science Museum on the topic of crystals",
          "source_language": "de",
          "detected_source_language": "de"
        }
      ]
    }
  }
}
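The JSON export can be read with Python's standard `json` module. The sketch below follows the nesting shown in the example (`one_best`, then `segments`, then `segments`), groups the translated text by channel, and lists segments whose detected source language differs from the recorded source language. The function names are hypothetical, and UTF-8 encoding of the exported file is an assumption.

```python
import json
from collections import defaultdict


def load_export(path):
    """Load a JSON export from disk (assumes UTF-8 encoding)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def get_segments(document):
    """Return the segment list, following the nesting shown in the example."""
    return document["one_best"]["segments"]["segments"]


def text_by_channel(segments):
    """Join translated text per channel, ordered by segment start time."""
    grouped = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s["start_time"]):
        grouped[seg["channel_number"]].append(seg["text"])
    return {channel: " ".join(texts) for channel, texts in grouped.items()}


def language_mismatches(segments):
    """Segments whose detected source language differs from the recorded one."""
    return [
        s for s in segments
        if s["detected_source_language"] != s["source_language"]
    ]
```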