Model Adaptation
Phonexia Language Identification technology provides an adaptation mechanism that allows users to enhance the performance of existing languages or add new languages based on custom data. This adaptation process uses supervised learning techniques, where new annotated data is supplied to the system, enabling it to recognize new languages or improve its accuracy for a specific language or dialect.
Why adaptation matters
- Improved accuracy for specific languages: As new recordings or speech patterns are collected for a language, they can be fed into the system to fine-tune the model. This adaptation results in better recognition for languages that may have been underrepresented in the original training data.
- Support for new languages: Adaptation enables the addition of previously unsupported languages. For instance, if the system doesn't recognize a rare or regional language, users can upload annotated recordings of that language, and the system can learn to identify it.
- Handling accents and dialects: Many languages feature regional dialects that differ significantly from one another. Through adaptation, the system can be trained to differentiate between dialects or group them based on user needs.
The adaptation workflow: how it works
The Language Identification gRPC API includes a dedicated method for performing language adaptation. The API supports the entire adaptation workflow: audio processing, adaptation, and identification. The process can be broken down into the following steps:
1. Extract languageprints
Languageprints are extracted from user-provided audio data using the Extract method. A languageprint represents the characteristic patterns of a particular language, which are stored for further analysis.
User-provided audio data requirements:
- Minimum 10 seconds of speech per recording.
- Mono channel recording only.
- Only one voice (language) per recording.
- Avoid recordings encoded with high-compression, low-bitrate codecs.
- Avoid duplicate recordings.
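Some of these requirements can be checked automatically before extraction. The following is a minimal sketch, assuming the recordings are WAV files readable by Python's standard `wave` module; the function names `check_wav` and `find_duplicates` are illustrative and not part of the Phonexia API. Note that it measures total audio duration, which is only an upper bound on the amount of speech, and that it detects only byte-identical duplicates.

```python
import hashlib
import wave

MIN_SECONDS = 10.0  # minimum length taken from the requirements above


def check_wav(path):
    """Return a list of problems found in a WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        # Total audio duration; the amount of speech may be lower.
        duration = wav.getnframes() / wav.getframerate()
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if duration < MIN_SECONDS:
        problems.append(
            f"only {duration:.1f} s of audio, need at least {MIN_SECONDS:.0f} s"
        )
    return problems


def find_duplicates(paths):
    """Return (duplicate, first_seen) pairs for byte-identical files."""
    seen = {}
    duplicates = []
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
    return duplicates
```

Checks such as "one language per recording" or codec quality cannot be verified this way and still need manual review.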
For more information, see the Extract method documentation.
Because extraction is computationally intensive, it is a good idea to save the languageprints for later use.
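One way to avoid repeated extraction is a simple on-disk cache keyed by a hash of the audio. The sketch below assumes a languageprint can be handled as raw bytes; `extract_fn` is a hypothetical callable that stands in for your own wrapper around the Extract method, and the `.lp` file extension is arbitrary.

```python
import hashlib
from pathlib import Path


def cached_languageprint(audio_path, cache_dir, extract_fn):
    """Return the languageprint for audio_path, extracting only on a cache miss.

    extract_fn is a hypothetical callable (audio bytes -> languageprint bytes)
    standing in for a call to the Extract method.
    """
    audio = Path(audio_path).read_bytes()
    key = hashlib.sha256(audio).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.lp"

    if cache_file.exists():
        # Cache hit: skip the expensive extraction.
        return cache_file.read_bytes()

    languageprint = extract_fn(audio)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(languageprint)
    return languageprint
```

Keying the cache by content hash rather than file name means that renamed or duplicated recordings reuse the same stored languageprint.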
2. Supervised adaptation
The extracted languageprints, annotated with language labels, are used as input for the Adapt method. The system can adapt existing languages or add new ones.
When adaptation finishes, the method returns an adaptation profile.
For more information, see the Adapt method documentation.
The adaptation profile is a small binary object (around 100 kB) that is intended to be stored in a file or database. A profile is tied to the technology model it was created with and cannot be used with a different model.
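Because a profile is only valid for one model, it can help to record the model alongside the stored profile and check it on load. The following is a minimal sketch, assuming the profile is available as bytes; the `model_name` string and the `.json` sidecar file are conventions invented for this example, not something the API prescribes.

```python
import json
from pathlib import Path


def save_profile(path, profile, model_name):
    """Store profile bytes plus a sidecar file naming the model used."""
    Path(path).write_bytes(profile)
    Path(str(path) + ".json").write_text(json.dumps({"model": model_name}))


def load_profile(path, model_name):
    """Load a profile, refusing one that was created with another model."""
    meta = json.loads(Path(str(path) + ".json").read_text())
    if meta["model"] != model_name:
        raise ValueError(
            f"profile was created with model {meta['model']!r}, "
            f"not {model_name!r}"
        )
    return Path(path).read_bytes()
```

The same idea carries over to a database: keep the model identifier in a column next to the profile blob and filter on it when loading.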
3. Language identification with adaptation
Once the model has been adapted, users can perform language identification using the Identify method with the new adaptation profile. The system will now include the adapted languages and provide more accurate results based on the newly supplied data.
For more information, see the Identify method documentation.
FAQ
To adapt or add a new language, how many hours of audio are required?
The total number of audio hours alone is not that important. A relatively small amount can be enough, e.g. half an hour for adapting an existing language and 1-2 hours for adding a new one. What is crucial is the number of recordings with different speakers in different situations (environments). We recommend at least 100 recordings for good results.
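These guidelines can be turned into a quick sanity check on a dataset before adaptation. The sketch below uses the figures from the answer above as thresholds (100 recordings, half an hour for adaptation, one hour as the lower bound for a new language); the function name and the choice to return warnings rather than raise errors are assumptions made for illustration. It does not measure speaker or environment variety, which still has to be judged from your own annotations.

```python
MIN_RECORDINGS = 100      # recommended number of recordings
MIN_HOURS_ADAPT = 0.5     # guideline for adapting an existing language
MIN_HOURS_NEW = 1.0       # lower end of the 1-2 hour guideline for a new language


def dataset_warnings(durations_seconds, new_language=False):
    """Return warnings for a list of per-recording durations in seconds."""
    warnings = []
    hours = sum(durations_seconds) / 3600
    needed = MIN_HOURS_NEW if new_language else MIN_HOURS_ADAPT

    if len(durations_seconds) < MIN_RECORDINGS:
        warnings.append(
            f"{len(durations_seconds)} recordings, "
            f"at least {MIN_RECORDINGS} recommended"
        )
    if hours < needed:
        warnings.append(
            f"{hours:.2f} h of audio, about {needed} h or more suggested"
        )
    return warnings
```

For example, 100 recordings of 20 seconds each pass the check for adapting an existing language but trigger the duration warning when `new_language=True`.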
How long does the adaptation take?
The adaptation itself (the Adapt method) is very fast, and the overhead of identifying with an adaptation profile is very small. The most expensive operation is languageprint extraction.