Age Estimation
Phonexia Age Estimation technology estimates the age of a speaker from an audio recording or voiceprint.
The technology is trained with a focus on:
- Spontaneous telephone conversation.
- Language-, accent-, text-, and channel-independence.
- Compatibility with a wide range of audio sources (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc.
Typical use cases
- Filtering calls by speaker age.
- Playing advertisements targeted at a specific age group.
- Performing quick demographic analysis of recordings.
Input data
The technology supports both multi-channel audio files and Base64-encoded voiceprints. To learn more about voiceprints, refer to this section of the documentation.
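As a rough illustration of handling Base64-encoded voiceprints on the client side, the sketch below checks that a string is well-formed Base64 before it is submitted. The function name `is_valid_base64` is hypothetical and not part of any Phonexia API; the check says nothing about whether the decoded bytes are a valid voiceprint.

```python
import base64
import binascii


def is_valid_base64(text: str) -> bool:
    """Return True if `text` is non-empty, well-formed Base64.

    Hypothetical helper: it validates only the encoding, not the
    voiceprint content itself.
    """
    if not text:
        return False
    try:
        # validate=True rejects characters outside the Base64 alphabet
        base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```

A check like this can catch truncated or corrupted strings early, before a request is sent.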
When estimating the age of a speaker from audio data, the audio must contain at least 0.01 seconds of speech signal. However, for a reliable estimate, at least 3 seconds of speech is recommended.
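The two thresholds above can be applied as a simple pre-check, for example to flag recordings whose estimate may be unreliable. This is a minimal sketch: the function name and status labels are hypothetical, and it assumes the amount of speech in the recording is already known.

```python
MIN_SPEECH_SECONDS = 0.01         # minimum required speech length
RECOMMENDED_SPEECH_SECONDS = 3.0  # recommended minimum for reliable results


def speech_length_status(speech_seconds: float) -> str:
    """Classify a speech length against the documented thresholds.

    Hypothetical helper; the labels are illustrative only.
    """
    if speech_seconds < MIN_SPEECH_SECONDS:
        return "too_short"    # below the required minimum
    if speech_seconds < RECOMMENDED_SPEECH_SECONDS:
        return "unreliable"   # can be processed, but the estimate may be poor
    return "ok"
```

For example, `speech_length_status(1.5)` returns `"unreliable"`, which a caller could use to mark the result as low-confidence.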
When estimating from audio, a voiceprint must be extracted first. This step, called voiceprint extraction, is the most time-consuming part of the estimation.
If voiceprints have already been extracted (typically when age estimation is combined with other technologies), they can be used directly for age estimation, which is substantially faster.
Output
The output of the age estimation technology is an integer between 0 and 100 representing the estimated age of the speaker.
The accuracy of the estimation depends on the quality of the input audio, but the estimate is generally within +/-10 years of the speaker's actual age. For the most representative results, interpret each result as a range of +/-10 years around the returned value.
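Applying the +/-10 year span to a returned value can be sketched as follows. The helper names are hypothetical, and clamping the range to the 0 to 100 output bounds is an assumption made for this illustration, not documented behavior.

```python
from typing import Tuple

AGE_MARGIN_YEARS = 10       # typical accuracy span described above
MIN_AGE, MAX_AGE = 0, 100   # bounds of the technology's integer output


def age_range(estimated_age: int, margin: int = AGE_MARGIN_YEARS) -> Tuple[int, int]:
    """Return the (low, high) age range around an estimate.

    Assumption: the range is clamped to the 0..100 output bounds.
    """
    low = max(MIN_AGE, estimated_age - margin)
    high = min(MAX_AGE, estimated_age + margin)
    return low, high


def may_be_in_group(estimated_age: int, group_min: int, group_max: int) -> bool:
    """Return True if the estimate's range overlaps the age group."""
    low, high = age_range(estimated_age)
    return low <= group_max and group_min <= high
```

With these helpers, an estimate of 34 maps to the range (24, 44), so a filter for the 18 to 25 age group would still keep that call, while an estimate of 50 would be excluded.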
FAQ
What can I do to improve processing speed?
If you're using media files as input, the system first extracts a voiceprint, which is the most time-consuming part of the process. To speed things up, we recommend running the technology on a GPU if possible.
Additionally, once you have processed your recordings, you can store the extracted voiceprints and reuse them for any future processing instead of reprocessing the original audio. This can significantly speed up the workflow without impacting accuracy.
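One possible way to store and reuse voiceprints is a small on-disk cache keyed by a hash of the audio content. This is only a sketch: `cached_voiceprint` and the `.vp` file suffix are hypothetical, and `extract` stands in for whatever call performs voiceprint extraction in your setup (assumed here to return a Base64 string).

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_voiceprint(
    audio_path: Path,
    cache_dir: Path,
    extract: Callable[[Path], str],
) -> str:
    """Return the voiceprint for `audio_path`, extracting only on a cache miss.

    `extract` is a caller-supplied placeholder for the real extraction call.
    """
    # Key the cache on file content so renamed copies still hit the cache
    digest = hashlib.sha256(audio_path.read_bytes()).hexdigest()
    cache_file = cache_dir / f"{digest}.vp"

    if cache_file.exists():
        return cache_file.read_text(encoding="ascii")

    voiceprint = extract(audio_path)  # the expensive step
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(voiceprint, encoding="ascii")
    return voiceprint
```

The cached string can then be passed to age estimation, or to other technologies that accept voiceprints, without touching the original audio again.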