Supported Audio Formats
This page describes the audio file formats and encodings supported by the microservices.
Quick Reference
| File Format | Extensions | Containers | Common Encodings |
|---|---|---|---|
| WAV | .wav | RIFF, RIFX, AIFF, RF64, W64 | PCM, IEEE float, A-law, µ-law, ADPCM |
| FLAC | .flac | Native FLAC, Ogg/FLAC | PCM (16/24/32-bit) |
| RAW | varies | None (headerless) | PCM, IEEE float, A-law, µ-law |
All microservices require mono audio input, with one exception: the Time Analysis microservice supports both mono and stereo audio.
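A client can check the channel count before submitting a file. The following is a minimal sketch that uses Python's standard `wave` module, which reads only PCM data in a RIFF container; the helper names `channel_count`, `is_mono`, and `make_silent_wav` are illustrative and are not part of the microservices' API.

```python
import io
import wave


def channel_count(wav_bytes: bytes) -> int:
    """Return the number of channels in a PCM RIFF/WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnchannels()


def is_mono(wav_bytes: bytes) -> bool:
    """True when the file has exactly one channel."""
    return channel_count(wav_bytes) == 1


def make_silent_wav(channels: int, sample_rate: int = 16000, frames: int = 160) -> bytes:
    """Build a short silent 16-bit PCM WAV file, used here only as sample input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = signed 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * channels * frames)
    return buffer.getvalue()


if __name__ == "__main__":
    print(is_mono(make_silent_wav(channels=1)))
    print(is_mono(make_silent_wav(channels=2)))
```

Other containers and encodings listed on this page (for example RF64, W64, or A-law data) would need a different reader than the `wave` module.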
Detailed Format Specifications
WAV
WAV input is supported in several container variants and a wide range of sample encodings.
Supported Containers
| Container | Description |
|---|---|
| RIFF | Standard WAV format (little-endian) |
| RIFX | Big-endian WAV variant |
| AIFF | Audio Interchange File Format |
| RF64 | Extended WAV format for files larger than 4 GB |
| W64 | Sony Wave64 format for large files |
Supported Encodings
| Encoding | Description |
|---|---|
| Unsigned 8-bit PCM | 8-bit unsigned integer samples |
| Signed 12-bit PCM | 12-bit signed integer samples |
| Signed 16-bit PCM | 16-bit signed integer samples (CD quality) |
| Signed 24-bit PCM | 24-bit signed integer samples (studio quality) |
| Signed 32-bit PCM | 32-bit signed integer samples |
| IEEE 32-bit float | 32-bit floating-point samples |
| IEEE 64-bit float | 64-bit floating-point samples |
| A-law | Logarithmic companding (G.711; telephony in Europe and most other regions) |
| µ-law (u-law) | Logarithmic companding (G.711; telephony in North America and Japan) |
| Microsoft ADPCM | Adaptive differential pulse-code modulation |
| IMA ADPCM | Also known as DVI ADPCM (WAV format code 0x11) |
ADPCM encodings (Microsoft ADPCM and IMA ADPCM) are not supported in the AIFF container.
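The container variants can usually be told apart by the first bytes of the file. The sketch below assumes the commonly documented signatures for each container (a four-byte chunk ID followed by a form type at offset 8, and a 16-byte GUID for Wave64); the function name `sniff_wav_container` is hypothetical, and this is not necessarily how the microservices detect the container.

```python
from typing import Optional

# Wave64 files start with a 16-byte "riff" GUID instead of a four-byte chunk ID.
W64_RIFF_GUID = bytes.fromhex("726966662e91cf11a5d628db04c10000")


def sniff_wav_container(header: bytes) -> Optional[str]:
    """Guess the container from the first 16 bytes of a file, or return None."""
    if header[:16] == W64_RIFF_GUID:
        return "W64"

    chunk_id = header[:4]    # container signature
    form_type = header[8:12]  # follows the 4-byte size field

    if form_type == b"WAVE":
        if chunk_id == b"RIFF":
            return "RIFF"
        if chunk_id == b"RIFX":
            return "RIFX"
        if chunk_id == b"RF64":
            return "RF64"
    if chunk_id == b"FORM" and form_type in (b"AIFF", b"AIFC"):
        return "AIFF"
    return None


if __name__ == "__main__":
    with open("example.wav", "rb") as f:  # hypothetical file name
        print(sniff_wav_container(f.read(16)))
```

A signature match identifies only the container; the sample encoding still has to be read from the format chunk.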
FLAC
FLAC (Free Lossless Audio Codec) compresses audio losslessly: the decoded samples are identical to the original.
Supported Containers
| Container | Description |
|---|---|
| Native FLAC | Standard FLAC format |
| Ogg/FLAC | FLAC audio in Ogg container |
Supported Encodings
| Encoding | Description |
|---|---|
| Signed 16-bit PCM | 16-bit signed integer samples |
| Signed 24-bit PCM | 24-bit signed integer samples |
| Signed 32-bit PCM | 32-bit signed integer samples |
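The two FLAC containers can also be distinguished by their leading bytes. This sketch assumes the standard signatures: native FLAC begins with `fLaC`, and an Ogg/FLAC stream begins with an `OggS` page whose first packet starts with the byte `0x7F` followed by `FLAC`. The function name `sniff_flac_container` is hypothetical.

```python
from typing import Optional

OGG_PAGE_HEADER_SIZE = 27  # fixed part of an Ogg page header, before the segment table


def sniff_flac_container(header: bytes) -> Optional[str]:
    """Guess the FLAC container from the first bytes of a file, or return None."""
    if header[:4] == b"fLaC":
        return "Native FLAC"

    if header[:4] == b"OggS" and len(header) > 26:
        # Byte 26 holds the number of entries in the segment table;
        # the first packet starts right after that table.
        segment_count = header[26]
        packet_start = OGG_PAGE_HEADER_SIZE + segment_count
        if header[packet_start:packet_start + 5] == b"\x7fFLAC":
            return "Ogg/FLAC"

    return None
```

Reading the first 64 bytes of a file is enough for this check, because the first Ogg page of an Ogg/FLAC stream normally carries a single short packet.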
RAW Audio Stream
For raw (headerless) audio data, the following encodings are supported:
| Encoding | Description |
|---|---|
| Signed 16-bit PCM | 16-bit signed integer samples |
| Signed 32-bit PCM | 32-bit signed integer samples |
| IEEE 32-bit float | 32-bit floating-point samples |
| A-law | Logarithmic compression |
| µ-law (u-law) | Logarithmic compression |
Raw audio streams carry no header, so you must explicitly specify the sample rate, the sample format, and any other audio parameters.
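Because a raw stream carries no metadata, even its duration can only be derived from parameters supplied by the caller. The sketch below illustrates this with a hypothetical `RawAudioSpec` type; the field names and encoding labels are assumptions made for the example and are not the microservices' actual parameter names.

```python
from dataclasses import dataclass

# Bytes per sample for each raw encoding listed in the table above.
# The keys are illustrative labels, not official parameter values.
BYTES_PER_SAMPLE = {
    "pcm_s16": 2,
    "pcm_s32": 4,
    "float32": 4,
    "alaw": 1,
    "mulaw": 1,
}


@dataclass(frozen=True)
class RawAudioSpec:
    sample_rate: int   # samples per second, per channel
    encoding: str      # one of the BYTES_PER_SAMPLE keys
    channels: int = 1  # most microservices expect mono input

    def frame_size(self) -> int:
        """Bytes occupied by one sample across all channels."""
        return BYTES_PER_SAMPLE[self.encoding] * self.channels

    def duration_seconds(self, byte_count: int) -> float:
        """Duration of a raw buffer of the given size under this spec."""
        if byte_count % self.frame_size() != 0:
            raise ValueError("buffer does not contain a whole number of frames")
        return byte_count / self.frame_size() / self.sample_rate


if __name__ == "__main__":
    spec = RawAudioSpec(sample_rate=8000, encoding="mulaw")
    print(spec.duration_seconds(8000))  # one byte per sample at 8 kHz
```

The same 8000 bytes interpreted as 16-bit PCM at 16 kHz would last a quarter of a second, which is why a wrong parameter typically yields distorted or mis-timed audio rather than an explicit error.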
Sample Rate
The sample rate is not strictly limited. For best speech processing performance, however, use a sample rate of 8 kHz or higher. Common sample rates include:
- 8 kHz – Telephony quality
- 16 kHz – Wideband speech
- 44.1 kHz – CD quality
- 48 kHz – Professional audio/video
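The practical difference between these rates is the audio bandwidth they can represent: a signal sampled at a given rate can only contain frequencies up to half that rate (the Nyquist limit). A small sketch, with hypothetical helper names, that computes this limit and checks a rate against the 8 kHz recommendation:

```python
RECOMMENDED_MIN_SAMPLE_RATE_HZ = 8_000  # recommendation stated above


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2


def meets_recommendation(sample_rate_hz: int) -> bool:
    """True when the rate is at least the recommended 8 kHz."""
    return sample_rate_hz >= RECOMMENDED_MIN_SAMPLE_RATE_HZ


if __name__ == "__main__":
    for rate in (8_000, 16_000, 44_100, 48_000):
        print(rate, nyquist_hz(rate), meets_recommendation(rate))
```

For example, 8 kHz telephony audio carries content only up to 4 kHz, while 16 kHz wideband speech extends to 8 kHz.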