Audio Converter

SPE directly supports only a limited set of audio formats (codecs and containers); see the Input Audio Quality article. Other audio formats must be converted using external tools.

You can either convert the files completely outside of SPE before passing them to SPE, or configure SPE to convert them automatically. Depending on the capabilities of the conversion tool, you can upload nearly any audio or even video file to SPE, and it will be automatically converted to an audio format that SPE supports natively.

caution

Automatic conversion occurs only when audio files are uploaded to SPE; it does not happen during file registration. For more information about uploading and registering audio files, refer to the Speech Engine home directory article.

Converter installation

As the first step, it's necessary to install the external converter of your choice on your operating system.

tip

We recommend using either FFmpeg (ffmpeg.org) or SoX (sox.sourceforge.net). However, you can use any other command-line tool or script that accepts an input file name and an output file name as parameters (e.g., my_converter <inputfile> <outputfile>).
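As a rough illustration of the "any other command-line tool or script" option, the following Python sketch wraps FFmpeg behind the two-parameter interface described above. The script name my_converter.py, the 8 kHz sample rate, and the 16-bit PCM codec are assumptions chosen for illustration, not values required by SPE.

```python
"""Hypothetical wrapper script: my_converter.py <inputfile> <outputfile>."""
import subprocess
import sys


def build_command(input_path, output_path, sample_rate=8000):
    # -y overwrites an existing output file, -ar sets the sample rate,
    # -c:a pcm_s16le selects 16-bit linear PCM audio.
    return [
        "ffmpeg", "-y",
        "-i", input_path,
        "-ar", str(sample_rate),
        "-c:a", "pcm_s16le",
        output_path,
    ]


def main(argv):
    if len(argv) != 3:
        print("usage: my_converter.py <inputfile> <outputfile>", file=sys.stderr)
        return 2
    # Return the converter's exit code so the caller can detect failures.
    return subprocess.run(build_command(argv[1], argv[2])).returncode
```

To use this as a standalone script, call sys.exit(main(sys.argv)) at the end of the file. Adjust the sample rate and codec to whatever target format suits your deployment.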

Linux

Use your distribution's package manager to install the converter (e.g., sudo apt install ffmpeg on Ubuntu, sudo yum install ffmpeg on CentOS, or a similar command depending on your OS, version, and configuration).

Windows

The simplest approach is to download the converter distribution package from its official site, then copy the executable (ffmpeg.exe, sox.exe, etc.) and all necessary accompanying DLLs into the SPE installation directory, alongside phxspe.exe.

(FFmpeg is a somewhat "cleaner" choice on Windows because it is also available as a single-executable static build, whereas SoX comes with more than 10 DLLs that clutter the SPE directory.)

SPE configuration

As the next step, you need to enable and set up the converter in the SPE configuration file (settings/phxspe.properties).

  1. Set audio_converter.enabled to true to enable the converter.
  2. Set audio_converter.command to the actual command that SPE will execute to run the converter. Use %1 as a placeholder for the input file and %2 as a placeholder for the output file.

The SPE configuration file contains ready-to-use example commands for both FFmpeg and SoX. For both tools, you only need to specify the input and output files; the converter then makes its best guess at what exactly needs to be done.
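To illustrate how the %1 and %2 placeholders behave, here is a small Python sketch that expands a command template into an argument list. It only mimics the substitution described above; how SPE performs it internally is not covered here. The expand_command helper and both template strings are assumptions for illustration, so check the examples shipped in phxspe.properties for the actual commands.

```python
import re
import shlex


def expand_command(template, input_path, output_path):
    paths = {"1": input_path, "2": output_path}
    args = []
    # Split the template into arguments first, then substitute, so that
    # a path containing spaces stays a single argument.
    for token in shlex.split(template):
        args.append(re.sub(r"%([12])", lambda m: paths[m.group(1)], token))
    return args


# Hypothetical templates, not copied from the SPE configuration file.
FFMPEG_TEMPLATE = "ffmpeg -i %1 %2"
SOX_TEMPLATE = "sox %1 %2"

print(expand_command(FFMPEG_TEMPLATE, "in file.mp3", "out.wav"))
print(expand_command(SOX_TEMPLATE, "in.ogg", "out.wav"))
```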

After changing the configuration, restart SPE. Automatic audio conversion is then ready for use.

How the audio conversion works

After uploading a file, the internal subsystem checks its format to determine if it is supported natively. The outcome can be one of the following:

  • If the file is in a natively supported format, SPE continues processing without any conversion.
  • If the file is in an "internally recognized" but not natively supported format (e.g., MP3 audio):
    • If the converter is enabled, SPE attempts to convert the file.
    • If the converter is disabled, the upload results in an error.
  • If the file is in an "internally unrecognized" format:
    • If the converter is enabled, SPE attempts to convert the file.
    • If the converter is disabled, the upload results in an error.
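The decision list above can be sketched in Python as follows. This is a simplified model of the described behavior, not SPE's implementation; the Outcome enum and the upload_outcome function are hypothetical names.

```python
from enum import Enum


class Outcome(Enum):
    PROCESS = "process without conversion"
    CONVERT = "attempt conversion"
    ERROR = "upload error"


def upload_outcome(natively_supported, converter_enabled):
    # Natively supported files are never converted.
    if natively_supported:
        return Outcome.PROCESS
    # "Internally recognized" but unsupported formats and "internally
    # unrecognized" formats are handled the same way: the result depends
    # only on whether the converter is enabled.
    return Outcome.CONVERT if converter_enabled else Outcome.ERROR
```

Note that the recognized and unrecognized cases lead to the same outcome; they differ only in what the format detection reports in the log.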

Example

Below is an example of the third case, the "internally unrecognized" format (here the file used the G.723.1 codec, but the specific codec is not relevant).

First, the file was uploaded to SPE with the converter enabled. The "Corrupted WAVE file format" BSAPI exception might seem confusing, but it is actually a harmless error indicating that the format detection failed. However, because the converter is enabled, SPE automatically calls the converter, and the file is successfully converted and recognized afterward. The response then contains the attributes of the converted file.

# SPE log
==================================
2021-01-30 20:49:26 [Debug] server: Incoming request: [RID=2] from=127.0.0.1:52762, method=POST, URI=/audiofile?path=%2Ftest1.wav&format=json
2021-01-30 20:49:26 [Trace] ConverterSubsystem: Request stream saved to temporary file: C:\TMP\tmp9408aaaaaa
2021-01-30 20:49:26 [Error] ConverterSubsystem: Error during detecting file format 'C:\TMP\tmp9408aaaaaa': BsapiException: SWaveFileI(1751): Corrupted WAVE file format: 'C:\TMP\tmp9408aaaaaa'.
2021-01-30 20:49:26 [Trace] ConverterSubsystem: Converting C:\TMP\tmp9408aaaaaa -> C:\TMP\tmp9408baaaaa.wav
2021-01-30 20:49:27 [Debug] ConverterSubsystem: File C:\TMP\tmp9408aaaaaa has been converted.
2021-01-30 20:49:27 [Trace] ConverterSubsystem: Removed temporary file: C:\TMP\tmp9408aaaaaa
2021-01-30 20:49:27 [Trace] Data: Moving: 'C:\TMP\tmp9408baaaaa.wav' -> 'D:\SPE\home\admin\storage\test1.wav'
2021-01-30 20:49:27 [Trace] Data: Moved: 'C:\TMP\tmp9408baaaaa.wav' -> 'D:\SPE\home\admin\storage\test1.wav'
2021-01-30 20:49:27 [Trace] Data: File '/test1.wav' registered in database
2021-01-30 20:49:27 [Trace] Rest.Object.AudioFile: [RID=2] Response HTTP: 200

# JSON response (all OK)
==================================
{
  "result" : {
    "version" : 3,
    "name" : "AudioFileInfoResult",
    "info" : {
      "name" : "test1.wav",
      "last_modified" : "2021-01-30T19:49:27Z",
      "created" : "2021-09-27T18:16:59Z",
      "size" : 12800718,
      "is_directory" : false,
      "is_registered" : true,
      "frequency" : 8000,
      "length" : 400.02,
      "n_channels" : 2,
      "format" : "lin16"
    }
  }
}
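The upload request seen in the log above is a POST to /audiofile with path and format query parameters. A minimal Python sketch of building such a request follows. The server address http://localhost:8600 and the build_upload_request helper are assumptions, and any authentication or content-type headers your SPE instance requires are omitted.

```python
import urllib.parse
import urllib.request


def build_upload_request(base_url, remote_path, audio_bytes):
    # Query parameters as seen in the log: path=%2Ftest1.wav&format=json
    query = urllib.parse.urlencode({"path": remote_path, "format": "json"})
    url = f"{base_url}/audiofile?{query}"
    return urllib.request.Request(url, data=audio_bytes, method="POST")


# Hypothetical server address and placeholder payload.
request = build_upload_request("http://localhost:8600", "/test1.wav", b"audio data")
print(request.get_method(), request.full_url)
```

The request is only constructed here; passing it to urllib.request.urlopen would send it to a running server.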

The second time, the file was uploaded to SPE with the converter disabled. The subsystem again failed to detect the format (BSAPI exception), and SPE couldn't call the converter because it was disabled. Consequently, the upload fails with an Unsupported audio format error response.

# SPE log
==================================
2021-01-30 20:59:52 [Debug] server: Incoming request: [RID=2] from=127.0.0.1:51052, method=POST, URI=/audiofile?path=%2Ftest1.wav&format=json
2021-01-30 20:59:52 [Trace] ConverterSubsystem: Request stream saved to temporary file: C:\TMP\tmp11452aaaaaa
2021-01-30 20:59:52 [Error] ConverterSubsystem: Error during detecting file format 'C:\TMP\tmp11452aaaaaa': BsapiException: SWaveFileI(1751): Corrupted WAVE file format: 'C:\TMP\tmp11452aaaaaa'.
2021-01-30 20:59:52 [Error] ConverterSubsystem: Audio can't be converted: Converter is disabled
2021-01-30 20:59:52 [Trace] ConverterSubsystem: Removed temporary file: C:\TMP\tmp11452aaaaaa
2021-01-30 20:59:52 [Error] Rest.Object.AudioFile: [RID=2] REST error: (1007) Unsupported audio format
2021-01-30 20:59:52 [Trace] Rest.Object.AudioFile: [RID=2] Response HTTP: 415 RESTError: 1007

# JSON response (error)
==================================
{
  "result" : {
    "version" : 2,
    "name" : "ErrorResult",
    "code" : 1007,
    "message" : "(1007) Unsupported audio format"
  }
}
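On the client side, the two responses shown in this example can be told apart by the name field of the result. The following Python sketch assumes the JSON shapes shown above; the check_upload_response helper and the keys of the dictionary it returns are hypothetical.

```python
import json


def check_upload_response(body):
    result = json.loads(body)["result"]
    if result.get("name") == "ErrorResult":
        # Code 1007 is the "Unsupported audio format" error shown above.
        return {
            "ok": False,
            "unsupported_format": result.get("code") == 1007,
            "message": result.get("message"),
        }
    # Successful uploads carry the file attributes in "info".
    return {"ok": True, "info": result.get("info")}


error_body = (
    '{"result": {"version": 2, "name": "ErrorResult", '
    '"code": 1007, "message": "(1007) Unsupported audio format"}}'
)
print(check_upload_response(error_body))
```

If the unsupported-format flag is set, a client could convert the file itself before retrying, or report that the converter needs to be enabled in SPE.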