Model Adaptation
Phonexia Language Identification technology provides an adaptation mechanism that allows users to enhance the performance of existing languages or add new languages based on custom data. This adaptation process uses supervised learning techniques, where new annotated data is supplied to the system, enabling it to recognize new languages or improve its accuracy for a specific language or dialect.
Why adaptation matters
- Improved accuracy for specific languages: As new recordings or speech patterns are collected for a language, they can be fed into the system to fine-tune the model. This adaptation results in better recognition for languages that may have been underrepresented in the original training data.
- Support for new languages: Adaptation enables the addition of previously unsupported languages. For instance, if the system doesn't recognize a rare or regional language, users can upload annotated recordings of that language, and the system can learn to identify it.
- Handling accents and dialects: Many languages feature regional dialects that differ significantly from one another. Through adaptation, the system can be trained to differentiate between dialects or group them based on user needs.
The adaptation workflow: how it works
The Language Identification gRPC API includes a dedicated method for performing language adaptation and supports the entire adaptation workflow: audio processing, adaptation, and identification. The process can be broken down into the following steps:
1. Extract languageprints
Languageprints are extracted from user-provided audio data using the Extract method. A languageprint represents the characteristic patterns of a particular language and is stored for further analysis.
User-provided audio data requirements:
- Minimum 10 seconds of speech per recording.
- Mono channel recording only.
- Only one voice (language) per recording.
- Try to avoid recordings encoded with high-compression, low-bitrate codecs.
- Try to avoid duplicate recordings.
For more information, see the Extract method documentation.
Because extraction is a computationally intensive operation, it is a good idea to save languageprints for later use.
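Some of the audio requirements in this step can be checked before any audio is sent for extraction. The following is a minimal sketch, not part of the Phonexia client, and it assumes the recordings are uncompressed WAV files readable by Python's standard `wave` module. The function name `check_wav` is hypothetical. It measures total audio length, which is only an upper bound on the amount of speech, so a file that passes may still contain less than 10 seconds of speech.

```python
import wave

# From the requirement list: at least 10 seconds per recording.
MIN_DURATION_S = 10.0


def check_wav(path):
    """Return a list of problems found in a WAV file (empty list means OK).

    Hypothetical helper: it checks channel count and total duration only.
    Total duration is an upper bound on speech, not a speech measurement.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        duration = wav.getnframes() / wav.getframerate()
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if duration < MIN_DURATION_S:
        problems.append(
            f"only {duration:.1f} s of audio, need at least {MIN_DURATION_S:.0f} s"
        )
    return problems
```

Calling `check_wav("recording.wav")` on each file and skipping those with a non-empty result is one way to filter a dataset before the more expensive extraction step.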
2. Supervised adaptation
The extracted languageprints, annotated with language labels, are used as input for the Adapt method. Language adaptation is typically performed with many adaptation units (languageprints). Each languageprint must be assigned either to an existing language (code), to improve accuracy for that language on the new data, or to a new identifier, to create a new language. When the adaptation is finished, the adaptation profile is returned.
For more information, see the Adapt method documentation.
The adaptation profile is a small binary (around 100 kB) that is intended to be stored in a file or database. The profile is tied to the technology model it was created with and cannot be used with a different model.
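Because a profile only works with the model it was created for, it can help to record that association when the profile is stored. The sketch below is one possible approach and not part of the Phonexia API. The helper names, the sidecar JSON file, and the `model_name` string are all assumptions chosen for illustration, and the profile is treated as opaque bytes.

```python
import json
from pathlib import Path


def save_profile(profile_bytes, path, model_name):
    """Write the profile plus a sidecar JSON file naming the model it belongs to.

    Hypothetical helper: model_name is any label you use to tell models apart.
    """
    path = Path(path)
    path.write_bytes(profile_bytes)
    sidecar = path.with_name(path.name + ".json")
    sidecar.write_text(json.dumps({"model": model_name}))


def load_profile(path, model_name):
    """Read the profile back, refusing to return it for a different model."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".json")
    meta = json.loads(sidecar.read_text())
    if meta["model"] != model_name:
        raise ValueError(
            f"profile was created for model {meta['model']!r}, not {model_name!r}"
        )
    return path.read_bytes()
```

Storing the model label next to the profile turns an accidental mismatch into an explicit error at load time.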
3. Language identification with adaptation
Once the model has been adapted, users can perform language identification using the Identify method with the new adaptation profile. The system now includes the adapted languages and provides more accurate results based on the newly supplied data.
For more information, see the Identify method documentation.
FAQ
To adapt or add a new language, how many hours of audio are required?
The number of audio hours alone is not the deciding factor. A relatively small amount can be enough, e.g. half an hour for adaptation and 1-2 hours for adding a new language. What is crucial is the number of recordings with different speakers in different situations (environments). We recommend at least 100 recordings for good results.
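The guidance above can be turned into a quick check of a dataset before adaptation. This is a minimal sketch under stated assumptions: the input is a list of `(language_label, duration_seconds)` pairs that you assemble yourself, and the names `summarize` and `too_small` are hypothetical. It counts recordings and hours per label but cannot measure speaker or environment diversity, which the answer above identifies as the crucial factor.

```python
from collections import defaultdict

# Recommended minimum number of recordings, taken from the answer above.
MIN_RECORDINGS = 100


def summarize(recordings):
    """Count recordings and total hours per language label.

    recordings: iterable of (language_label, duration_seconds) pairs.
    """
    stats = defaultdict(lambda: {"count": 0, "hours": 0.0})
    for label, seconds in recordings:
        stats[label]["count"] += 1
        stats[label]["hours"] += seconds / 3600.0
    return dict(stats)


def too_small(stats):
    """Return the labels that have fewer recordings than the recommended minimum."""
    return sorted(
        label for label, s in stats.items() if s["count"] < MIN_RECORDINGS
    )
```

For example, `too_small(summarize(pairs))` lists the labels for which more recordings should be collected first.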
How long does the adaptation take?
The adaptation itself (the Adapt method) is very fast, and the overhead of identifying with an adaptation profile is very small. The most expensive operation is languageprint extraction.
Examples of LID adaptation via the Phonexia Python client
The easiest way to get started with testing is to use our simple Python client. To get it, follow the steps below:
- Install the Phonexia Language Identification client
pip install phonexia-language-identification-client
- Extract languageprints
language_identification_client -H ip:port extract --list \my_list_example_extract.txt -l info
- Adaptation example
language_identification_client -H ip:port adapt --list \en-int_adapt.txt -o \Adapt\en-int.AP -l info
- Identify with the adaptation profile
language_identification_client -H ip:port identify --adaptation_profile \Adapt\test.AP --list \my_list_example_identify_adapt.txt
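The three client commands above can also be chained from a script. The sketch below only assembles the same command lines shown in this section and runs them with Python's standard `subprocess` module. The function names, the file names passed in, and the `host` value are placeholders, and the sketch assumes the list files are already prepared in whatever format the client expects.

```python
import subprocess

CLIENT = "language_identification_client"


def extract_cmd(host, list_file):
    """Command line for the extraction step, mirroring the example above."""
    return [CLIENT, "-H", host, "extract", "--list", list_file, "-l", "info"]


def adapt_cmd(host, list_file, profile_out):
    """Command line for the adaptation step; profile_out is the output file."""
    return [CLIENT, "-H", host, "adapt", "--list", list_file,
            "-o", profile_out, "-l", "info"]


def identify_cmd(host, list_file, profile):
    """Command line for identification with an adaptation profile."""
    return [CLIENT, "-H", host, "identify",
            "--adaptation_profile", profile, "--list", list_file]


def run_workflow(host, extract_list, adapt_list, identify_list, profile):
    """Run extract, adapt, and identify in order, stopping at the first failure."""
    commands = (
        extract_cmd(host, extract_list),
        adapt_cmd(host, adapt_list, profile),
        identify_cmd(host, identify_list, profile),
    )
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Passing each command as a list of arguments avoids shell quoting problems with paths, and `check=True` prevents a later step from running after an earlier one has failed.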
For more information, please see the article: Language identification microservices.