Speech Translation
Phonexia Speech Translation supports more than 60 languages and translates audio containing speech in any of them into English text.
It supports the same language set as Enhanced Speech to Text Built on Whisper. Have a look at the complete language portfolio.
How it works
Upload your files
The first step is to select the language spoken in the recordings. If you are unsure of the language, use auto-detect mode, which identifies the language and then proceeds with the translation.
If you don't have your own files, you can use the provided Phonexia examples to explore how speech translation works.
Results
After processing, the English translations will appear in the right panel. Please note that the translation process may take a while.
Cancelling the process or deleting a recording while you wait does not affect the translation speed, because the translation continues uninterrupted on the server. Leaving the page while you wait, however, may interrupt the process.
Once the recordings are processed, you can play the original audio file while viewing the corresponding English text, with the spoken segments highlighted in real time.
Export formats
Whether you export results in bulk or individually, you can choose from several export formats.
Plain text
This format provides plain text without timestamps or any additional metadata. The translations of all speech are merged into a single text, without distinguishing individual channels.
Hello Andreas, I just want to let you know that we decided with my brother-in-law
that we'd rather go to Vienna this weekend,
because there is an exhibition in the Nature Science Museum on the topic of crystals
and you know that my nephew is back to minerals.
So we'll come to Berlin later
and I hope it will fit you at the end of the month at some point.
Please let me know.
Okay, bye.
Text with timestamps
This format contains two types of data: timestamps and text. The translations of all speech are merged into a single text, without distinguishing individual channels.
00:00:00 Hello Andreas, I just want to let you know that we decided with my brother-in-law
00:00:05 that we'd rather go to Vienna this weekend,
00:00:09 because there is an exhibition in the Nature Science Museum on the topic of crystals
00:00:15 and you know that my nephew is back to minerals.
00:00:18 So we'll come to Berlin later
00:00:22 and I hope it will fit you at the end of the month at some point.
00:00:27 Please let me know.
00:00:29 Okay, bye.
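If you need to process this export programmatically, each line can be split into its timestamp and its text. The following is a minimal sketch, assuming every line starts with an HH:MM:SS timestamp followed by a single space, as in the sample above; the function name `parse_timestamped_line` is hypothetical and not part of any Phonexia tool.

```python
from datetime import timedelta
from typing import Tuple


def parse_timestamped_line(line: str) -> Tuple[timedelta, str]:
    """Split one 'HH:MM:SS text' line into a time offset and the text.

    Assumes the layout shown in the sample export: a timestamp,
    one space, then the translated text.
    """
    stamp, _, text = line.partition(" ")
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    offset = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return offset, text


# Two lines copied from the sample export above.
sample_lines = [
    "00:00:05 that we'd rather go to Vienna this weekend,",
    "00:00:09 because there is an exhibition in the Nature Science Museum on the topic of crystals",
]

for raw in sample_lines:
    offset, text = parse_timestamped_line(raw)
    print(int(offset.total_seconds()), text)
```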
CSV and XLSX formats
Both formats contain the translated text and identical metadata: translation technology, language code, source language code, detected source language code, channel tags, segment timestamps, and confidence score.
- Language code refers to the target language of the translation.
- Source language code is the code of the original language of the audio specified by the user. If the user doesn’t specify this, the system will either set it to the detected language or, if the detected language isn’t available in the user's language portfolio for translation, the system will select the closest available language.
- Detected source language code is the code of the original language of the audio as identified by the system.
The .CSV format is well suited to users who work with large datasets, as it facilitates automated processing and filtering based on specific metadata. In this format, the start time and end time of each segment are given in seconds.
Transcription technology,Language code,Source language code,Detected source language code,Channel,Start time,End time,Confidence score,Transcription
Built on Whisper,en,de,de,0,0.34,5.78,,"Hello Andreas, I just want to let you know that we decided with my brother-in-law"
Built on Whisper,en,de,de,0,5.78,9.16,,"that we'd rather go to Vienna this weekend,"
Built on Whisper,en,de,de,0,9.16,15.18,,because there is an exhibition in the Nature Science Museum on the topic of crystals
Built on Whisper,en,de,de,0,15.18,18.97,,and you know that my nephew is back to minerals.
Built on Whisper,en,de,de,0,18.97,22.22,,So we'll come to Berlin later
Built on Whisper,en,de,de,0,22.22,27.37,,and I hope it will fit you at the end of the month at some point.
Built on Whisper,en,de,de,0,27.37,29.17,,Please let me know.
Built on Whisper,en,de,de,0,29.17,30.49,,"Okay, bye."
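As an illustration of the automated processing mentioned above, the CSV export can be read with Python's standard `csv` module and filtered by its metadata. This is only a sketch: it assumes the column headers are exactly those shown in the sample, and the helper names `load_segments` and `segments_between` are hypothetical.

```python
import csv
import io
from typing import Dict, List

# The first rows of the sample export above, embedded so the sketch is self-contained.
SAMPLE_CSV = """Transcription technology,Language code,Source language code,Detected source language code,Channel,Start time,End time,Confidence score,Transcription
Built on Whisper,en,de,de,0,0.34,5.78,,"Hello Andreas, I just want to let you know that we decided with my brother-in-law"
Built on Whisper,en,de,de,0,5.78,9.16,,"that we'd rather go to Vienna this weekend,"
Built on Whisper,en,de,de,0,9.16,15.18,,because there is an exhibition in the Nature Science Museum on the topic of crystals
"""


def load_segments(csv_text: str) -> List[Dict]:
    """Read the export and convert the numeric columns to numbers."""
    segments = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        segments.append({
            "channel": int(row["Channel"]),
            "start": float(row["Start time"]),  # seconds
            "end": float(row["End time"]),      # seconds
            "source_language": row["Source language code"],
            "text": row["Transcription"],
        })
    return segments


def segments_between(segments: List[Dict], start: float, end: float) -> List[Dict]:
    """Keep only the segments that lie entirely inside [start, end] seconds."""
    return [s for s in segments if s["start"] >= start and s["end"] <= end]


segments = load_segments(SAMPLE_CSV)
for segment in segments_between(segments, 5.0, 16.0):
    print(segment["start"], segment["end"], segment["text"])
```

For a real export, replace `io.StringIO(csv_text)` with a file opened using `newline=""`, as the `csv` module documentation recommends.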
The .XLSX format provides a clear, comprehensive, and human-readable overview of the metadata and textual content, catering to users who prefer working with a more graphical data representation. In this format, timestamps are presented in the format: HH:MM:SS.
JSON format
This format presents machine-readable metadata equivalent to that provided in the CSV and XLSX formats.
{
  "one_best": {
    "segments": {
      "segments": [
        {
          "channel_number": 0,
          "start_time": 0.34,
          "end_time": 5.78,
          "language": "en",
          "text": "Hello Andreas, I just want to let you know that we decided with my brother-in-law",
          "source_language": "de",
          "detected_source_language": "de"
        },
        {
          "channel_number": 0,
          "start_time": 5.78,
          "end_time": 9.16,
          "language": "en",
          "text": "that we'd rather go to Vienna this weekend,",
          "source_language": "de",
          "detected_source_language": "de"
        },
        {
          "channel_number": 0,
          "start_time": 9.16,
          "end_time": 15.18,
          "language": "en",
          "text": "because there is an exhibition in the Nature Science Museum on the topic of crystals",
          "source_language": "de",
          "detected_source_language": "de"
        }
      ]
    }
  }
}
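The JSON export can be turned into timestamped text lines with a few lines of Python. The sketch below assumes the nesting shown in the sample (`one_best`, then `segments`, then `segments`) and truncates start times to whole seconds, which appears to match the timestamps in the text export sample; the function names `format_hms` and `to_timestamped_text` are hypothetical.

```python
import json
from typing import List

# Two segments from the sample export above, embedded so the sketch is self-contained.
SAMPLE_JSON = """
{
  "one_best": {
    "segments": {
      "segments": [
        {"channel_number": 0, "start_time": 0.34, "end_time": 5.78, "language": "en",
         "text": "Hello Andreas, I just want to let you know that we decided with my brother-in-law",
         "source_language": "de", "detected_source_language": "de"},
        {"channel_number": 0, "start_time": 5.78, "end_time": 9.16, "language": "en",
         "text": "that we'd rather go to Vienna this weekend,",
         "source_language": "de", "detected_source_language": "de"}
      ]
    }
  }
}
"""


def format_hms(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS, dropping the fractional part."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_timestamped_text(export: dict) -> List[str]:
    """Build one 'HH:MM:SS text' line per segment, ordered by start time."""
    segments = export["one_best"]["segments"]["segments"]
    ordered = sorted(segments, key=lambda s: s["start_time"])
    return [f"{format_hms(s['start_time'])} {s['text']}" for s in ordered]


for line in to_timestamped_text(json.loads(SAMPLE_JSON)):
    print(line)
```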