Version: 3.3.0

Voiceprint Extraction Speech Length

This feature lets you limit the amount of speech used for voiceprint extraction in the Voiceprint Extraction microservice (see gRPC API). Limiting the speech length can speed up the processing of long recordings.

A high-quality voiceprint typically requires around twenty seconds of speech (the exact amount depends on audio quality and the extraction model). If the recording contains more speech than this, extraction takes longer to finish without any noticeable gain in accuracy. However, if the specified speech length is too short, accuracy decreases significantly.
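The trade-off above can be sketched in a few lines of Python. This is only an illustration, not part of the service API: the function names and the 20-second constant are hypothetical, and the sketch assumes the limit behaves as a simple cap on the speech that is used.

```python
from typing import Optional

# Rule-of-thumb value taken from the paragraph above; not a guaranteed threshold.
RECOMMENDED_MIN_SPEECH_SEC = 20.0


def effective_speech(available_speech_sec: float,
                     speech_length_sec: Optional[float]) -> float:
    """Speech (in seconds) assumed to be used for extraction.

    Assumption: the limit acts as a cap, so the amount used is
    min(available speech, requested limit). None means "unlimited".
    """
    if speech_length_sec is None:
        return available_speech_sec
    return min(available_speech_sec, speech_length_sec)


def is_below_recommended(speech_length_sec: Optional[float]) -> bool:
    """True when a requested limit is shorter than the ~20 s rule of thumb."""
    return (speech_length_sec is not None
            and speech_length_sec < RECOMMENDED_MIN_SPEECH_SEC)
```

For example, under these assumptions a recording with 95 seconds of speech and a 30-second limit would use 30 seconds of speech, while a recording with only 12 seconds of speech would use all 12.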

Note

The Speech Length feature speeds up processing by bypassing certain parts of the algorithm. However, some processing, such as computing speech segments, still has to be done on the entire input recording. For this reason, the full length of the input audio is charged.
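As a minimal sketch of what this note implies for usage accounting, the hypothetical helper below returns the charged audio length. The function name and parameters are assumptions for illustration only; actual billing is determined by the service.

```python
from typing import Optional


def charged_audio_seconds(audio_duration_sec: float,
                          speech_length_sec: Optional[float] = None) -> float:
    """Audio length (in seconds) that is charged for one extraction.

    speech_length_sec is deliberately ignored: per the note above, the
    full input audio is charged regardless of the speech length limit.
    """
    return audio_duration_sec
```

A 120-second recording is therefore charged as 120 seconds whether the limit is 10 seconds or unlimited.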

Measurement

The following table shows accuracy and speedup measurements on a sample dataset for different speech length values. The average length of the input audio files is 120 seconds. On this dataset, a speech length of 30 seconds gives almost a 2x speedup with no loss of accuracy, and a speech length of 10 seconds gives almost a 3x speedup with only a minor increase in Equal Error Rate (EER), from 1.65% to 1.88%.

speech_length	EER%	speedup
unlimited	1.65	1.0x
30 sec	1.65	1.95x
20 sec	1.88	2.48x
10 sec	1.88	2.84x

Note

These speech_length values are not universal for all datasets, so this table is meant for illustration only.
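One way to use measurements like these is to pick the fastest setting whose EER stays within a tolerance of the unlimited baseline. The sketch below does that for the sample values in the table; the function name and the tolerance parameter are hypothetical, and the numbers should be replaced with measurements from your own data.

```python
# Measurements copied from the table above:
# (speech_length in seconds, EER in %, speedup). None stands for "unlimited".
MEASUREMENTS = [
    (None, 1.65, 1.0),
    (30, 1.65, 1.95),
    (20, 1.88, 2.48),
    (10, 1.88, 2.84),
]


def fastest_within_tolerance(max_eer_increase: float):
    """Return the row with the highest speedup whose EER exceeds the
    unlimited baseline by at most max_eer_increase percentage points."""
    baseline_eer = MEASUREMENTS[0][1]
    candidates = [
        row for row in MEASUREMENTS
        # Small epsilon guards against floating-point rounding.
        if row[1] - baseline_eer <= max_eer_increase + 1e-9
    ]
    return max(candidates, key=lambda row: row[2])


print(fastest_within_tolerance(0.0))   # no EER increase allowed
print(fastest_within_tolerance(0.25))  # up to 0.25 points allowed
```

With the sample values, allowing no EER increase selects the 30-second row, and allowing up to 0.25 percentage points selects the 10-second row.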
