Age Estimation
Phonexia Age Estimation (AGE) technology estimates the age of a speaker from an audio recording or voiceprint.
Technology
- Trained with a focus on spontaneous telephone conversations.
- Language-, accent-, text-, and channel-independent.
- Compatible with a wide range of audio sources (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc.
Input
- Audio: WAV or RAW (8- or 16-bit linear coding), A-law or µ-law, PCM, with a sampling rate of 8 kHz or higher.
- Voiceprints: The AGE L4 model supports SID4 L4 voiceprints; legacy AGE models support voiceprints created by AGE itself.
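As a rough illustration of the audio requirements above, the following sketch pre-checks a WAV file before it is submitted for processing. It is not part of the Phonexia tooling: the function name `check_wav` is hypothetical, and Python's standard `wave` module reads only linear PCM WAV files, so A-law, µ-law, and RAW inputs are outside the scope of this check.

```python
import wave


def check_wav(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable.

    Assumption: only linear PCM WAV files are checked here, because the
    standard-library wave module cannot open A-law or mu-law WAV files.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width_bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        if rate < 8000:
            problems.append(f"sampling rate {rate} Hz is below 8 kHz")
        if width_bits not in (8, 16):
            problems.append(f"{width_bits}-bit samples; expected 8 or 16 bits")
    return problems
```

Calling `check_wav("example/david_1.wav")` on a suitable file would return an empty list; each entry in a non-empty list names one requirement the file does not meet.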
Output
- A log file containing the processed information and age estimate.
Processing speed
Approximately 20× faster than real time (FTRT) on a single CPU core. For example, a standard 8-core server can process 3,840 hours of audio in one day of computing time (20 × 8 cores × 24 hours).
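The throughput figure can be reproduced with a short calculation. This is a minimal sketch that assumes throughput scales linearly with the number of CPU cores, which the example above implies but which may not hold on every system; the function name is hypothetical.

```python
def audio_hours_per_day(ftrt: float = 20.0, cores: int = 8,
                        wall_clock_hours: float = 24.0) -> float:
    """Hours of audio processed in the given wall-clock time.

    Assumption: each core sustains the same FTRT factor independently.
    """
    return ftrt * cores * wall_clock_hours


print(audio_hours_per_day())  # 3840.0
```

Changing `cores` or `ftrt` gives a first estimate for other server sizes; measured throughput on real hardware should take precedence over this arithmetic.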
Representation of the results
For the CMD version
Each line contains the file name followed by the estimated age (an integer, limited to 99):
example/david_1.wav 41
example/david_2.wav 40
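The CMD output lines above can be read back with a few lines of Python. This is a sketch only: `AgeLine` and `parse_cmd_line` are hypothetical names, and the lower bound of 0 is an assumption (the documentation states only the upper limit of 99).

```python
from typing import NamedTuple


class AgeLine(NamedTuple):
    path: str
    age: int


def parse_cmd_line(line: str) -> AgeLine:
    """Parse one 'file age' line of CMD output."""
    # Split on the last run of whitespace so that file names with spaces survive.
    path, age_text = line.rstrip().rsplit(maxsplit=1)
    age = int(age_text)
    # Upper limit 99 is documented; the lower limit 0 is an assumption.
    if not 0 <= age <= 99:
        raise ValueError(f"age out of range: {age}")
    return AgeLine(path, age)


print(parse_cmd_line("example/david_1.wav 41"))
# AgeLine(path='example/david_1.wav', age=41)
```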
For the SPE version
- name – a candidate age, given as a string
- score – the score for that age, either 1 or 0
Every candidate age is assigned a score. The age whose score equals 1 is the age estimated by the system.
{
  "result": {
    "version": 2,
    "name": "AgeEstimationResult",
    "file": "/kelly_2.wav",
    "model": "L",
    "channel_scores": [
      {
        "channel": 0,
        "scores": [
          {
            "name": "0",
            "score": 0
          },
          {
            "name": "1",
            "score": 0
          },
          ...
          {
            "name": "41",
            "score": 1
          },
          {
            "name": "42",
            "score": 0
          },
          ...
        ]
      }
    ]
  }
}
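To turn an SPE result into a single age per channel, pick the entry whose score equals 1. The sketch below assumes the result has exactly the structure shown above; `estimated_ages` is a hypothetical helper, and `SAMPLE` is a shortened stand-in for a full response (the elided entries are simply left out so that the text is valid JSON).

```python
import json

# Shortened sample with the same structure as the result above.
SAMPLE = """
{"result": {"version": 2, "name": "AgeEstimationResult",
            "file": "/kelly_2.wav", "model": "L",
            "channel_scores": [{"channel": 0, "scores": [
                {"name": "0", "score": 0},
                {"name": "1", "score": 0},
                {"name": "41", "score": 1},
                {"name": "42", "score": 0}]}]}}
"""


def estimated_ages(result_json: str) -> dict[int, int]:
    """Map each channel number to the age whose score equals 1."""
    result = json.loads(result_json)["result"]
    ages = {}
    for channel in result["channel_scores"]:
        winners = [int(s["name"]) for s in channel["scores"] if s["score"] == 1]
        # Assumption: exactly one age per channel carries the score 1.
        if len(winners) != 1:
            raise ValueError(f"channel {channel['channel']}: expected one age with score 1")
        ages[channel["channel"]] = winners[0]
    return ages


print(estimated_ages(SAMPLE))  # {0: 41}
```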
For the most representative results, treat the estimate as the center of a range and apply a span of ±10 years. For example, an estimate of 41 corresponds to a range of 31 to 51 years.
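The recommended span can be applied in code as follows. This is a minimal sketch: `age_range` is a hypothetical helper, and clamping the range to 0 through 99 is an assumption based on the documented upper limit of the estimate, not a stated behavior of the system.

```python
def age_range(estimate: int, span: int = 10,
              lowest: int = 0, highest: int = 99) -> tuple[int, int]:
    """Return the (low, high) age range around an estimate.

    Assumption: the range is clamped to [lowest, highest].
    """
    return max(lowest, estimate - span), min(highest, estimate + span)


print(age_range(41))  # (31, 51)
```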