Audio Converter
SPE natively supports only a limited set of audio formats (codecs and containers); see the Input Audio Quality article. Other audio formats must be converted using external tools.
This conversion can be done either completely outside of SPE, before passing the files to SPE, or you can configure SPE to convert the files automatically. Depending on the capabilities of the conversion tool, you can upload nearly any audio or even video file to SPE, and it will be automatically converted to an audio format supported natively by SPE.
Automatic conversion occurs only when audio files are uploaded to SPE; it does not happen during file registration. For more information about uploading and registering audio files, refer to the Speech Engine home directory article.
Converter installation
As the first step, install the external converter of your choice on your operating system.
We recommend either FFmpeg (ffmpeg.org) or SoX (sox.sourceforge.net). However, you can use any other command-line tool or script that accepts an input file name and an output file name as parameters (e.g., my_converter <inputfile> <outputfile>).
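As an illustration of such a custom tool, here is a minimal sketch of a wrapper script in Python. The script name my_converter.py and the helper names build_command and convert are hypothetical, and the sketch assumes that ffmpeg is available on the PATH; it is not a converter shipped with SPE.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper script: my_converter.py <inputfile> <outputfile>"""
import subprocess
import sys


def build_command(input_file, output_file):
    # -y overwrites an existing output file; -i names the input file.
    # FFmpeg picks the output container from the output file's extension.
    return ["ffmpeg", "-y", "-i", input_file, output_file]


def convert(input_file, output_file):
    # Return the converter's exit code so the caller can detect failures.
    return subprocess.run(build_command(input_file, output_file)).returncode


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(convert(sys.argv[1], sys.argv[2]))
    print("usage: my_converter.py <inputfile> <outputfile>", file=sys.stderr)
```

Keeping the command construction in its own function makes it easy to add further options (for example a fixed sample rate) without touching the rest of the script.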
Linux
Use your distribution's package manager to install the converter (e.g., sudo apt install ffmpeg on Ubuntu, sudo yum install ffmpeg on CentOS, or similar, depending on your OS, version, and configuration).
Windows
The simplest approach is to download the converter distribution package from its official site, then copy the executable (ffmpeg.exe, sox.exe, etc.) and all necessary accompanying DLLs into the SPE installation directory, alongside phxspe.exe.
(FFmpeg is the somewhat cleaner choice on Windows, since it is also available as a single-executable static build, unlike SoX, whose 10+ DLLs clutter the SPE directory.)
SPE configuration
As the next step, enable and set up the converter in the SPE configuration file (settings/phxspe.properties).
- Set audio_converter.enabled to true to enable the converter.
- Set audio_converter.command to the actual command to be executed for the conversion. Use %1 as a placeholder for the input file and %2 as a placeholder for the output file.
The SPE configuration file contains ready-to-use example commands for both FFmpeg and SoX. For both tools, you only need to specify the input and output files; the converter then makes its best guess about what exactly needs to be done.
After changing the configuration, restart SPE. Automatic audio conversion is then ready for use.
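To make the placeholder mechanism concrete, the following Python sketch shows one way a command template containing %1 and %2 could be expanded into an argument list. It is an illustration only: the expand_command helper and the template string are hypothetical, and SPE's actual substitution and quoting rules may differ.

```python
import re
import shlex


def expand_command(template, input_file, output_file):
    """Split a command template and substitute %1 / %2 in each argument."""
    values = {"1": input_file, "2": output_file}
    # A single regex pass avoids re-substituting text that came from a file name.
    return [
        re.sub(r"%([12])", lambda m: values[m.group(1)], arg)
        for arg in shlex.split(template)
    ]


# Hypothetical template; the commands recommended for FFmpeg and SoX
# are the examples shipped in settings/phxspe.properties.
template = "ffmpeg -i %1 %2"
print(expand_command(template, "/tmp/in file.mp3", "/tmp/out.wav"))
# prints ['ffmpeg', '-i', '/tmp/in file.mp3', '/tmp/out.wav']
```

Substituting after splitting keeps a file name that contains spaces as a single argument.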
How the audio conversion works
After uploading a file, the internal subsystem checks its format to determine if it is supported natively. The outcome can be one of the following:
- If the file is in a natively supported format, SPE continues processing without any conversion.
- If the file is in an "internally recognized" but not natively supported format (e.g., MP3 audio):
  - If the converter is enabled, SPE attempts to convert the file.
  - If the converter is disabled, the upload results in an error.
- If the file is in an "internally unrecognized" format:
  - If the converter is enabled, SPE attempts to convert the file.
  - If the converter is disabled, the upload results in an error.
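The cases above can be summarized as a small decision function. This is a sketch of the described behavior, not SPE's internal code; the FormatStatus enum and the upload_action function are hypothetical names introduced for the example.

```python
from enum import Enum


class FormatStatus(Enum):
    NATIVE = "natively supported"
    RECOGNIZED = "internally recognized, not natively supported"
    UNRECOGNIZED = "internally unrecognized"


def upload_action(status, converter_enabled):
    """Return what happens to an uploaded file in each of the cases above."""
    if status is FormatStatus.NATIVE:
        return "process"  # no conversion needed
    # Recognized-but-unsupported and unrecognized formats are handled alike:
    # the outcome depends only on whether the converter is enabled.
    return "convert" if converter_enabled else "error"
```

For example, upload_action(FormatStatus.UNRECOGNIZED, False) returns "error", which corresponds to the second upload in the example below.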
Example
Below is an example of the third case, an "internally unrecognized" format (here the G.723.1 codec, although the specific codec is not relevant).
First, the file was uploaded to SPE with the converter enabled. The "Corrupted WAVE file format" BSAPI exception might seem confusing, but it is actually a harmless error indicating that the format detection failed. However, because the converter is enabled, SPE automatically calls the converter, and the file is successfully converted and recognized afterward. The response then contains the attributes of the converted file.
# SPE log
==================================
2021-01-30 20:49:26 [Debug] server: Incoming request: [RID=2] from=127.0.0.1:52762, method=POST, URI=/audiofile?path=%2Ftest1.wav&format=json
2021-01-30 20:49:26 [Trace] ConverterSubsystem: Request stream saved to temporary file: C:\TMP\tmp9408aaaaaa
2021-01-30 20:49:26 [Error] ConverterSubsystem: Error during detecting file format 'C:\TMP\tmp9408aaaaaa': BsapiException: SWaveFileI(1751): Corrupted WAVE file format: 'C:\TMP\tmp9408aaaaaa'.
2021-01-30 20:49:26 [Trace] ConverterSubsystem: Converting C:\TMP\tmp9408aaaaaa -> C:\TMP\tmp9408baaaaa.wav
2021-01-30 20:49:27 [Debug] ConverterSubsystem: File C:\TMP\tmp9408aaaaaa has been converted.
2021-01-30 20:49:27 [Trace] ConverterSubsystem: Removed temporary file: C:\TMP\tmp9408aaaaaa
2021-01-30 20:49:27 [Trace] Data: Moving: 'C:\TMP\tmp9408baaaaa.wav' -> 'D:\SPE\home\admin\storage\test1.wav'
2021-01-30 20:49:27 [Trace] Data: Moved: 'C:\TMP\tmp9408baaaaa.wav' -> 'D:\SPE\home\admin\storage\test1.wav'
2021-01-30 20:49:27 [Trace] Data: File '/test1.wav' registered in database
2021-01-30 20:49:27 [Trace] Rest.Object.AudioFile: [RID=2] Response HTTP: 200
# JSON response (all OK)
==================================
{
  "result" : {
    "version" : 3,
    "name" : "AudioFileInfoResult",
    "info" : {
      "name" : "test1.wav",
      "last_modified" : "2021-01-30T19:49:27Z",
      "created" : "2021-09-27T18:16:59Z",
      "size" : 12800718,
      "is_directory" : false,
      "is_registered" : true,
      "frequency" : 8000,
      "length" : 400.02,
      "n_channels" : 2,
      "format" : "lin16"
    }
  }
}
The second time, the file was uploaded to SPE with the converter disabled. The subsystem again failed to detect the format (BSAPI exception), and SPE could not call the converter because it was disabled. Consequently, the upload fails with an Unsupported audio format error response.
# SPE log
==================================
2021-01-30 20:59:52 [Debug] server: Incoming request: [RID=2] from=127.0.0.1:51052, method=POST, URI=/audiofile?path=%2Ftest1.wav&format=json
2021-01-30 20:59:52 [Trace] ConverterSubsystem: Request stream saved to temporary file: C:\TMP\tmp11452aaaaaa
2021-01-30 20:59:52 [Error] ConverterSubsystem: Error during detecting file format 'C:\TMP\tmp11452aaaaaa': BsapiException: SWaveFileI(1751): Corrupted WAVE file format: 'C:\TMP\tmp11452aaaaaa'.
2021-01-30 20:59:52 [Error] ConverterSubsystem: Audio can't be converted: Converter is disabled
2021-01-30 20:59:52 [Trace] ConverterSubsystem: Removed temporary file: C:\TMP\tmp11452aaaaaa
2021-01-30 20:59:52 [Error] Rest.Object.AudioFile: [RID=2] REST error: (1007) Unsupported audio format
2021-01-30 20:59:52 [Trace] Rest.Object.AudioFile: [RID=2] Response HTTP: 415 RESTError: 1007
# JSON response (error)
==================================
{
  "result" : {
    "version" : 2,
    "name" : "ErrorResult",
    "code" : 1007,
    "message" : "(1007) Unsupported audio format"
  }
}
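On the client side, the two outcomes can be told apart by the name field of the result. The following Python sketch builds the upload URL seen in the logs and summarizes either kind of response body. The helper names upload_url and summarize, as well as the base URL in the example, are assumptions made for illustration; the field names are taken from the responses shown above.

```python
import json
from urllib.parse import urlencode


def upload_url(base_url, path):
    # Mirrors the request in the log: /audiofile?path=%2Ftest1.wav&format=json
    return f"{base_url}/audiofile?" + urlencode({"path": path, "format": "json"})


def summarize(body):
    """Return a one-line summary of an SPE JSON response body."""
    result = json.loads(body)["result"]
    if result["name"] == "ErrorResult":
        return f"error {result['code']}: {result['message']}"
    info = result["info"]
    return f"{info['name']}: {info['format']}, {info['frequency']} Hz, {info['n_channels']} ch"


# Placeholder host and port; substitute the address of your SPE server.
print(upload_url("http://localhost:8600", "/test1.wav"))
```

Applied to the error response above, summarize returns "error 1007: (1007) Unsupported audio format"; applied to the successful response, it returns "test1.wav: lin16, 8000 Hz, 2 ch".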