Version: 3.3.0

Comparison of the 4th and 5th Generations of Language Identification

At Phonexia, we strive for continuous improvement, which is why we keep developing new generations of our technologies. This article compares the latest generation of Language Identification, XL5, with its predecessor, L4.

Supported languages

One of the improvements in the new generation is that the number of supported languages has almost doubled, with XL5 offering up to 140.

Added languages include Danish, Finnish, Estonian, Hebrew, additional varieties of Arabic, and many others. For a full list of languages supported by XL5 and L4, please refer to the table below.

Comparison of languages
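| L4 Code | L4 Language Name | XL5 Code | XL5 Language Name |
|---|---|---|---|
|  |  | ab-GE | Abkhaz |
|  |  | af | Afrikaans |
| sq-AL | Albanian | sq-AL | Albanian |
| am-ET | Amharic | am-ET | Amharic |
| ar-EG | Arabic (Egypt) | ar-EG | Arabic (Egypt) |
| ar-KW | Arabic (Gulf, Kuwait) | ar-KW | Arabic (Gulf) |
| ar-IQ | Arabic (Iraq) | ar-IQ | Arabic (Iraq) |
| ar-XL | Arabic (Levantine) | ar-XL | Arabic (Levantine) |
| ar-MA | Arabic (Maghrebi) | ar-MA | Arabic (Maghrebi) |
|  |  | ar-OM | Arabic (Oman) |
|  |  | ar-SA | Arabic (Saudi) |
|  |  | ar-TN | Arabic (Tunisia) |
|  |  | ar-YE | Arabic (Yemen) |
| arb | Arabic (MSA) | arb | Arabic (MSA) |
|  |  | hy-AM | Armenian |
| as-IN | Assamese | as-IN | Assamese |
|  |  | ast-ES | Asturian |
| az-AZ | Azerbaijani | az-AZ | Azerbaijani |
|  |  | ba-RU | Bashkir |
|  |  | eu | Basque |
| bn-BD | Bengali (Bangladesh) | bn | Bengali |
| be-BY | Belarusian | be-BY | Belarusian |
|  |  | br-FR | Breton |
| bg-BG | Bulgarian | bg-BG | Bulgarian |
| my-MM | Burmese | my-MM | Burmese |
|  |  | kea-CV | Cape Verdean Creole |
|  |  | ca-ES | Catalan |
| ceb-PH | Cebuano | ceb-PH | Cebuano |
| zh-HK | Chinese (Cantonese, Hong Kong) | zh-HK | Cantonese |
| zh-CN | Chinese (Mandarin, China) | zh-CN | Chinese (Mandarin) |
| nan-CN | Chinese (Min Nan) | min-CN | Chinese (Min) |
| wuu-CN | Chinese (Wu) | wuu-CN | Chinese (Wu) |
| cv-RU | Chuvash | cv-RU | Chuvash |
| cs-CZ | Czech | cs-CZ | Czech |
| fa-AF | Dari |  | covered in fa (Persian), see below |
|  |  | da-DK | Danish |
|  |  | luo-KE | Dholuo |
| nl | Dutch | nl | Dutch |
|  |  | en-AU | English (Australia) |
| en-IN | English (India) | en-IN | English (India) |
| en-GB | English (United Kingdom) | en-GB | English (UK) |
| en-US | English (United States) | en-US | English (US) |
|  |  | et-EE | Estonian |
|  |  | fo | Faroese |
|  |  | fi-FI | Finnish |
| fr | French | fr | French |
|  |  | gl-ES | Galician |
| ka-GE | Georgian | ka-GE | Georgian |
| de | German | de | German |
| el-GR | Greek | el-GR | Greek |
| gn | Guarani | gn | Guarani |
|  |  | gu-IN | Gujarati |
| ht-HT | Haitian Creole | ht-HT | Haitian Creole |
| ha | Hausa | ha | Hausa |
|  |  | haw-US | Hawaiian |
|  |  | he-IL | Hebrew |
| hi-IN | Hindi | hi-IN | Hindi |
| hu-HU | Hungarian | hu-HU | Hungarian |
|  |  | is-IS | Icelandic |
|  |  | ig-NG | Igbo |
| id-ID | Indonesian | id-ID | Indonesian |
|  |  | ga-IE | Irish |
| it | Italian | it-IT | Italian |
| ja-JP | Japanese | ja-JP | Japanese |
|  |  | jv-ID | Javanese |
|  |  | kam-KE | Kamba |
|  |  | kn-IN | Kannada |
| kk-KZ | Kazakh | kk-KZ | Kazakh |
| km | Khmer | km-KH | Khmer |
| rn-BI | Rundi | rn-BI | Kirundi |
| ko-KR | Korean | ko-KR | Korean |
| ku | Kurdish | ku | Kurdish |
|  |  | ky-KG | Kyrgyz |
| lo-LA | Lao | lo-LA | Lao |
|  |  | lv-LV | Latvian |
|  |  | ln | Lingala |
| lt-LT | Lithuanian | lt-LT | Lithuanian |
|  |  | lg-UG | Luganda |
| lb-LU | Luxembourgish | lb-LU | Luxembourgish |
| mk-MK | Macedonian | mk-MK | Macedonian |
|  |  | ms-MY | Malay |
|  |  | ml-IN | Malayalam |
|  |  | mg-MG | Malagasy |
|  |  | mt-MT | Maltese |
|  |  | mi-NZ | Māori |
|  |  | mr-IN | Marathi |
|  |  | mn-MN | Mongolian |
| nd-ZW | Ndebele | nd-ZW | Ndebele (North) |
|  |  | nr-ZA | Ndebele (South) |
|  |  | ne-NP | Nepali |
|  |  | no-NO | Norwegian |
|  |  | oc-FR | Occitan |
|  |  | or-IN | Odia |
| om | Oromo | om-ET | Oromo |
| ps | Pashto | ps | Pashto |
| fa-IR | Persian (Iran) | fa | Persian |
| pl-PL | Polish | pl-PL | Polish |
| pt | Portuguese | pt | Portuguese |
| ro-RO | Romanian | ro-RO | Romanian |
| pa | Punjabi | pa | Punjabi |
| ru-RU | Russian | ru-RU | Russian |
| sh | Serbo-Croat-Bosnian | hbs | Serbocroatian |
| st-ZA |  | st-ZA | Sesotho |
| sn | Shona | sn | Shona |
| si-LK |  | si-LK | Sinhala |
| sd |  | sd | Sindhi |
| sl-SI | Slovenian | sl-SI | Slovenian |
| sk-SK | Slovak | sk-SK | Slovak |
| so | Somali | so | Somali |
| es-XA | Spanish (America) | es-XA | Spanish (American) |
| es-ES | Spanish (Europe) | es-ES | Spanish (Spain) |
| su-ID |  | su-ID | Sundanese |
| sw | Swahili | sw | Swahili |
| ss-ZA |  | ss-ZA | Swazi |
| sv-SE | Swedish | sv-SE | Swedish |
|  |  | tg | Tajik |
| ta | Tamil | ta | Tamil |
|  |  | tt | Tatar |
| te-IN | Telugu | te-IN | Telugu |
| th-TH | Thai | th-TH | Thai |
| bo | Tibetan | bo | Tibetan |
| ti | Tigrinya | ti | Tigrinya |
| tpi-PG | Tok Pisin | tpi-PG | Tok Pisin |
|  |  | ts-ZA | Tsonga |
|  |  | tn-ZA | Tswana |
| tr-TR | Turkish | tr-TR | Turkish |
|  |  | tk | Turkmen |
| uk-UA | Ukrainian | uk-UA | Ukrainian |
|  |  | umb-AO | Umbundu |
| ur | Urdu | ur | Urdu |
| uz-UZ | Uzbek | uz-UZ | Uzbek |
|  |  | ve-ZA | Venda |
| vi-VN | Vietnamese | vi-VN | Vietnamese |
|  |  | cy-GB | Welsh |
|  |  | wo | Wolof |
|  |  | xh-ZA | Xhosa |
|  |  | yi | Yiddish |
|  |  | yo | Yoruba |
| zu | Zulu | zu | Zulu |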

GPU processing

Another difference is GPU processing, which is not available in L4. Processing audio on a GPU greatly improves system performance compared with processing the same amount of data on a CPU. For more information about the performance gain, refer to the example measurements.

Accuracy

The new generation is also a step forward in terms of accuracy. Below are three sample measurements.

| Dataset | Number of languages | L4 accuracy | XL5 accuracy |
|---|---|---|---|
| NIST LRE 15 | 13 | 0.597 | 0.626 |
| NIST LRE 17 | 17 | 0.659 | 0.713 |
| ROXSD | 14 | 0.859 | 0.953 |

The accuracy evaluation is done using the metric described in the Accuracy Evaluation section, where higher accuracy indicates better performance.

The accuracy can be further enhanced using subsets of languages or language groups.

Subsets of languages

Users can enhance the accuracy of language identification by configuring the system to detect only a subset of the available languages. These subsets are created based on the languages the user expects to be present within the audio pool to be analyzed. This is achieved by adjusting the prior weights so that only the relevant languages have non-zero weights, thereby excluding the others. This method minimizes confusion and increases the probability of correct identification by focusing on the expected languages.
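The effect of zeroing prior weights can be illustrated with a minimal Python sketch. It is not the XL5 API: the function name `restrict_to_subset`, the score values, and the assumption that priors are applied by multiplying per-language probabilities and renormalizing are all hypothetical and serve only to show the idea.

```python
def restrict_to_subset(scores, allowed):
    """Apply 0/1 prior weights to per-language probabilities and renormalize.

    `scores` maps language codes to probabilities (illustrative values only).
    `allowed` is the set of languages expected in the audio.
    This is an assumed, simplified model of how prior weights act.
    """
    unknown = set(allowed) - set(scores)
    if unknown:
        raise ValueError(f"languages not in the score set: {sorted(unknown)}")

    # Prior weight 1.0 for expected languages, 0.0 for all others.
    priors = {lang: (1.0 if lang in allowed else 0.0) for lang in scores}
    weighted = {lang: p * priors[lang] for lang, p in scores.items()}

    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("no probability mass left on the allowed languages")

    # Keep only languages with a non-zero prior and rescale to sum to 1.
    return {lang: w / total for lang, w in weighted.items() if priors[lang] > 0.0}


# Made-up scores over four languages; only Czech and Polish are expected.
scores = {"cs-CZ": 0.40, "sk-SK": 0.35, "pl-PL": 0.15, "ru-RU": 0.10}
subset = restrict_to_subset(scores, {"cs-CZ", "pl-PL"})
best = max(subset, key=subset.get)
print(best, subset)
```

In this toy example, the probability mass that the full model spread over Slovak and Russian is removed, so the remaining candidates are compared only against each other.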

For detailed information, refer to the Specifying Languages and Groups section. Creating and using subsets in XL5 is straightforward and user-friendly.

A similar outcome can be achieved in L4 using custom language packs, but that process is significantly more time-consuming than the streamlined subset configuration in XL5.

Language groups

XL5 also introduces a novelty in the form of so-called language groups, which offer another way to improve accuracy. When a group is formed from a set of languages, the scores of these languages are combined and reported as a group score. One potential use case for groups is to combine dialects or regional variants of a language into a single, encapsulating group that represents the language as a whole. For more information, refer to the Specifying Languages and Groups section.
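As a rough illustration of group scoring, the sketch below combines member-language scores into one group score. It assumes that "combined" means summing per-language probabilities; the function name `group_scores`, the group name, and the score values are hypothetical and do not reflect the actual XL5 interface.

```python
def group_scores(scores, groups):
    """Combine member-language scores into group scores.

    `scores` maps language codes to probabilities (illustrative values only).
    `groups` maps a group name to the list of its member language codes.
    Combination by summation is an assumption made for this sketch.
    """
    grouped = {}
    seen = set()
    for name, members in groups.items():
        for lang in members:
            if lang not in scores:
                raise ValueError(f"unknown language in group {name!r}: {lang}")
            if lang in seen:
                raise ValueError(f"language {lang} appears in more than one group")
            seen.add(lang)
        grouped[name] = sum(scores[lang] for lang in members)

    # Languages outside any group keep their individual scores.
    for lang, p in scores.items():
        if lang not in seen:
            grouped[lang] = p
    return grouped


# Made-up scores: three Arabic varieties and Hebrew.
scores = {"ar-EG": 0.30, "ar-IQ": 0.25, "ar-XL": 0.20, "he-IL": 0.25}
result = group_scores(scores, {"Arabic": ["ar-EG", "ar-IQ", "ar-XL"]})
top = max(result, key=result.get)
print(top, result)
```

Here no single Arabic variety scores far above Hebrew, but the combined group score makes the language-level decision much clearer.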