Version: 3.4.0

Model Adaptation

Phonexia Language Identification technology provides an adaptation mechanism that allows users to improve the model's performance on existing languages or add support for new languages based on custom data. This adaptation process uses supervised learning: new annotated data is supplied to the system, enabling it to recognize new languages or improve its accuracy for a specific language or dialect.

Why adaptation matters

  1. Improved accuracy for specific languages: As new recordings or speech patterns are collected for a language, they can be fed into the system to fine-tune the model. This adaptation results in better recognition for languages that may have been underrepresented in the original training data.

  2. Support for new languages: Adaptation enables the addition of previously unsupported languages. For instance, if the system doesn't recognize a rare or regional language, users can upload annotated recordings of that language, and the system can learn to identify it.

  3. Handling accents and dialects: Many languages feature regional dialects that differ significantly from one another. Through adaptation, the system can be trained to differentiate between dialects or group them based on user needs.

The adaptation workflow: how it works

The Language Identification gRPC API includes a dedicated method for language adaptation and supports the entire workflow: audio processing, adaptation, and identification. The adaptation process can be broken down into the following steps:

1. Extract languageprints

Languageprints are extracted from user-provided audio data using the Extract method. A languageprint represents the characteristic patterns of a particular language and is stored for further processing.

User-provided audio data requirements:

  • Minimum 10 seconds of speech per recording.
  • Mono channel recording only.
  • Only one voice (language) per recording.
  • Avoid recordings encoded with highly compressed, low-bitrate codecs.
  • Avoid duplicate recordings.

For more information, see the Extract method documentation.
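
As a rough pre-flight check on WAV recordings before extraction, the sketch below verifies the mono-channel and minimum-length requirements with Python's standard wave module. It measures total recording length rather than net speech, so it is only a coarse filter, and the file name is just an example.

```python
# Rough pre-flight check for WAV recordings: mono channel and a minimum
# duration (total length is only a proxy for the 10 s of actual speech).
import wave

MIN_DURATION_SECONDS = 10.0


def check_recording(path):
    """Return a list of problems found in a WAV recording, empty if none."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        duration = wav.getnframes() / wav.getframerate()

    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if duration < MIN_DURATION_SECONDS:
        problems.append(f"only {duration:.1f} s of audio in total")
    return problems


for issue in check_recording("recording.wav"):  # example file name
    print("WARNING:", issue)
```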

tip

Because the extraction is a computationally intensive operation, it is a good idea to save languageprints for later use.
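mid
The request and response messages are defined by the published .proto files of the service, so the Python sketch below is only an illustration of this step: the generated modules (languageid_pb2, languageid_pb2_grpc), the LanguageIdentificationStub, the ExtractRequest message, the languageprint field, and the assumption of a client-streaming call are all placeholders for the real generated code.

```python
# Hypothetical sketch: extract a languageprint over gRPC and store it for reuse.
# The module, stub, message, and field names below are placeholders; use the
# classes generated from the actual Language Identification .proto files.
import grpc

import languageid_pb2 as lid_pb2        # hypothetical generated module
import languageid_pb2_grpc as lid_grpc  # hypothetical generated module


def audio_chunks(path, chunk_size=64 * 1024):
    """Yield request messages carrying the recording in chunks (assumed streaming call)."""
    with open(path, "rb") as audio:
        while chunk := audio.read(chunk_size):
            yield lid_pb2.ExtractRequest(audio=chunk)


with grpc.insecure_channel("localhost:8080") as channel:
    stub = lid_grpc.LanguageIdentificationStub(channel)
    response = stub.Extract(audio_chunks("recording.wav"))

    # Extraction is expensive, so persist the languageprint for later adaptation.
    with open("recording.lp", "wb") as lp_file:
        lp_file.write(response.languageprint)
```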

2. Supervised adaptation

The extracted languageprints, annotated with language labels, are used as input for the Adapt method. The system can adapt existing languages or add new ones. When the adaptation is finished, an adaptation profile is returned.

For more information, see the Adapt method documentation.

warning

The adaptation profile is a small binary (around 100 kB) that is intended to be stored in a file or database. The profile is tied to the technology model it was created with and cannot be used with a different model.
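
The Python sketch below outlines this step under the same caveat: AdaptRequest, AnnotatedLanguageprint, the adaptation_profile field, and the example labels and file names are assumptions standing in for the real generated classes and your own annotated data.

```python
# Hypothetical sketch: adapt the model with annotated languageprints.
# Message and field names are placeholders for the real generated classes;
# the labels and file names are example data.
import grpc

import languageid_pb2 as lid_pb2        # hypothetical generated module
import languageid_pb2_grpc as lid_grpc  # hypothetical generated module

# Each stored languageprint is paired with the label it should be learned under.
ANNOTATED_PRINTS = [
    ("welsh", "welsh_001.lp"),
    ("welsh", "welsh_002.lp"),
    ("spanish", "spanish_dialect_001.lp"),
]


def load_languageprint(path):
    with open(path, "rb") as lp_file:
        return lp_file.read()


with grpc.insecure_channel("localhost:8080") as channel:
    stub = lid_grpc.LanguageIdentificationStub(channel)
    request = lid_pb2.AdaptRequest(
        languageprints=[
            lid_pb2.AnnotatedLanguageprint(
                label=label, languageprint=load_languageprint(path)
            )
            for label, path in ANNOTATED_PRINTS
        ]
    )
    response = stub.Adapt(request)

    # The adaptation profile is a small binary (~100 kB) tied to the model it
    # was created with; store it in a file or database for the Identify step.
    with open("adaptation.profile", "wb") as profile_file:
        profile_file.write(response.adaptation_profile)
```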

3. Language identification with adaptation

Once the model has been adapted, users can perform language identification using the Identify method with the new adaptation profile. The system will now include the adapted languages and provide more accurate results based on the newly supplied data.

For more information, see the Identify method documentation.
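
A minimal Python sketch of this final step, again with hypothetical message and field names (IdentifyRequest, adaptation_profile, scores); the actual shape of the result is defined by the generated stubs.

```python
# Hypothetical sketch: identify the language of a stored languageprint using
# the adaptation profile produced by the Adapt step. Message and field names
# are placeholders for the real generated classes.
import grpc

import languageid_pb2 as lid_pb2        # hypothetical generated module
import languageid_pb2_grpc as lid_grpc  # hypothetical generated module

with open("adaptation.profile", "rb") as profile_file:
    profile = profile_file.read()

with open("recording.lp", "rb") as lp_file:
    languageprint = lp_file.read()

with grpc.insecure_channel("localhost:8080") as channel:
    stub = lid_grpc.LanguageIdentificationStub(channel)
    request = lid_pb2.IdentifyRequest(
        languageprint=languageprint,
        adaptation_profile=profile,  # makes the adapted languages part of the result
    )
    response = stub.Identify(request)

    # The result is assumed to be a list of per-language scores.
    for result in response.scores:
        print(f"{result.language}: {result.score:.3f}")
```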

FAQ

How many hours of audio are required to adapt an existing language or add a new one?

The total number of audio hours is not the most important factor; a relatively small amount is enough, e.g. about half an hour for adapting an existing language and 1-2 hours for adding a new one. What matters most is the number of recordings with different speakers in different situations (environments). We recommend at least 100 recordings for good results.

How long does the adaptation take?

The adaptation itself (the Adapt method) is very fast, and the overhead of identification with an adaptation profile is very small. The most expensive operation is languageprint extraction.