Language Identification Overview
Phonexia Language Identification (LID) helps you distinguish spoken languages or dialects. It enables your system to automatically route valuable calls to language experts or forward them to other software for analysis.
Application areas
- Preprocessing audio for other speech recognition technologies (such as Speaker Identification or Speech to Text).
- Fast filtering of audio based on language.
Scoring and results
The LID language pack defines a set of recognizable languages (represented by language models). When identifying the language of an audio recording or of a "languageprint", LID follows these steps:
- Creates a languageprint of the recording (if the input is an audio recording).
- Compares that languageprint with each language model in the language pack.
- Calculates the probability that the languageprint and the language model represent the same language.
For explanations of the terms "languageprint," "language model," and "language pack," refer to the Terminology and Adaptation article.
The final scores are returned as natural logarithms of these probabilities, i.e., as values in the (-∞, 0) range, one for each language in the language pack. To convert a raw LID score to a percentage, use the formula e^(score) * 100.
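The conversion formula above can be sketched in a few lines of Python. The function name `score_to_percentage` is an illustrative assumption and is not part of any Phonexia API; the only thing taken from the text is the formula e^(score) * 100.

```python
import math


def score_to_percentage(raw_score: float) -> float:
    """Convert a raw LID score (a natural-log probability) to a percentage.

    Hypothetical helper for illustration; it only applies e^(score) * 100.
    """
    if raw_score > 0:
        # A log-probability cannot be positive, so this is likely a bad input.
        raise ValueError("raw LID score must not be greater than 0")
    return math.exp(raw_score) * 100.0


if __name__ == "__main__":
    # Raw score -0.258 is the Turkish score from the default-pack example.
    print(f"{score_to_percentage(-0.258):.1f}%")
```

For a raw score of -0.258 this gives roughly 77.3%, which matches the Turkish row in the example table up to rounding of the raw score.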
LID adaptation (custom language packs)
The scoring principle described above implies that the score is distributed among all the languages in a language pack. This means that every language receives a non-zero score, resulting in the scores being diluted as they are spread across many languages.
Additionally, if the language pack contains many unequally trained languages (i.e., languages trained on very different amounts of source audio), the entire system may be affected, leading to low scores even for matching languages.
Therefore, it is advisable to create a language pack containing only a limited number of languages, such as by excluding rare or irrelevant ones, or by retaining only the few languages expected in your use case.
This process of tailoring the language pack to specific needs is called language pack adaptation and is described in the Terminology and Adaptation article.
Example usages of custom language packs
- A law enforcement agency monitoring a network of criminals who use only a particular set of languages can keep only the languages expected to appear in the traffic. This can reduce the number of scored languages to as few as 3 or 5.
- A multilingual call center serving the European market can exclude languages that are highly unlikely to appear in its traffic, such as African languages (e.g., Afan, Hausa) or Asian languages (e.g., Chinese, Japanese), while retaining languages that are less likely but still possible. This can reduce the number of scored languages from around 80 (included in the default out-of-the-box language pack) to about 20 or even fewer.
In both cases, limiting the number of languages in the language pack results in the scores being distributed among fewer languages, leading to higher score values, clearer distinctions between languages, and a more pronounced gap between the highest-scoring language and the others.
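The dilution effect can be illustrated with a toy calculation. The sketch below assumes that per-language scores behave like a normalized probability distribution (a log-softmax over made-up log-likelihoods). This is an assumption for illustration only and does not describe how Phonexia LID computes or adapts its scores; real adaptation may also change the relative ranking of languages. The function name and all numbers are hypothetical.

```python
import math


def log_posteriors(log_likelihoods):
    """Normalize log-likelihoods so the resulting probabilities sum to 1."""
    peak = max(log_likelihoods.values())
    # Log-sum-exp with the maximum subtracted for numerical stability.
    log_total = peak + math.log(
        sum(math.exp(value - peak) for value in log_likelihoods.values())
    )
    return {lang: value - log_total for lang, value in log_likelihoods.items()}


# Made-up log-likelihoods, not real LID output.
toy = {
    "Turkish": 0.0,
    "Uzbek": -2.0,
    "Azerbaijani": -2.5,
    "Albanian": -4.0,
    "Hungarian": -4.5,
}

full = log_posteriors(toy)
limited = log_posteriors({lang: toy[lang] for lang in ("Turkish", "Albanian", "Hungarian")})

if __name__ == "__main__":
    for name, scores in (("full", full), ("limited", limited)):
        print(name, {lang: round(math.exp(s) * 100, 1) for lang, s in scores.items()})
```

In this toy setting, the probability mass that went to the removed languages is redistributed among the remaining ones, so the top language scores higher with the smaller set.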
The following example shows the identification of a Turkish phone call. The top score is much sharper with a language pack that contains only relevant languages (93.3%) than with the default language pack (77.3%):
Language (default pack, 60+ languages) | Raw Score | Percentage | Language (limited pack, 20 European languages) | Raw Score | Percentage |
---|---|---|---|---|---|
Turkish | -0.258 | 77.270% | Turkish | -0.069 | 93.326% |
Uzbek | -2.436 | 8.753% | Albanian | -4.347 | 1.294% |
Azerbaijani | -3.027 | 4.845% | Hungarian | -4.657 | 0.949% |
Dari | -4.432 | 1.190% | Ukrainian | -5.037 | 0.649% |
Albanian | -5.139 | 0.586% | Swedish | -5.088 | 0.617% |
Tibetan | -5.270 | 0.515% | French | -5.168 | 0.570% |
Georgian | -5.277 | 0.511% | English (British) | -5.316 | 0.491% |
Swedish | -5.384 | 0.459% | Macedonian | -5.443 | 0.433% |
Farsi | -5.737 | 0.323% | Greek | -5.698 | 0.335% |
Hungarian | -5.777 | 0.310% | Serbian | -6.002 | 0.247% |
... |
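To quantify the gap between the top-scoring language and the runner-up, the raw scores from the table can be compared directly. The sketch below uses only the first three rows of each language pack from the table; the function name `top_language_with_margin` is an illustrative assumption, not a Phonexia API.

```python
import math

# Raw scores (natural-log probabilities) copied from the first rows of the table.
default_pack = {"Turkish": -0.258, "Uzbek": -2.436, "Azerbaijani": -3.027}
limited_pack = {"Turkish": -0.069, "Albanian": -4.347, "Hungarian": -4.657}


def top_language_with_margin(scores):
    """Return (best language, its percentage, log-score gap to the runner-up).

    Expects at least two languages in `scores`.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    return best, math.exp(best_score) * 100.0, best_score - second_score


if __name__ == "__main__":
    for name, pack in (("default", default_pack), ("limited", limited_pack)):
        language, percentage, margin = top_language_with_margin(pack)
        print(f"{name}: {language} {percentage:.1f}% (gap to runner-up: {margin:.3f})")
```

With these values, the gap in raw score between Turkish and the runner-up grows from about 2.18 with the default pack to about 4.28 with the limited pack.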