
Age Estimation

Phonexia Age Estimation (AGE) technology estimates the age of a speaker from an audio recording or voiceprint.

Technology

  • Trained with a focus on spontaneous telephone conversations.
  • Language-, accent-, text-, and channel-independent.
  • Compatible with a wide range of audio sources (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc.

Input

  • Audio: WAV or RAW (8- or 16-bit linear coding), A-law or Mu-law, PCM, with a sampling rate of 8 kHz or higher.
  • Voiceprints: The AGE L4 model supports SID4 L4 voiceprints; legacy AGE models support voiceprints created by AGE itself.

Output

  • A log file containing the processed information and age estimate.

Processing speed

Approximately 20 times faster than real time (FTRT) on a single CPU core. For example, a standard server with 8 CPU cores can process about 3,840 hours of audio in one day of computing time (20 × 8 cores × 24 hours).
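
The throughput figure above can be reproduced with a short calculation. This is a minimal sketch that assumes processing scales linearly with the number of cores, as the example implies; the function name audio_hours_per_day is illustrative and not part of any Phonexia tool.

```python
def audio_hours_per_day(ftrt_per_core: float, cores: int,
                        computing_hours: float = 24.0) -> float:
    """Hours of audio processed in the given computing time.

    Assumes throughput scales linearly with the core count, which is
    an idealization; real deployments may see some overhead.
    """
    return ftrt_per_core * cores * computing_hours


# 20 FTRT per core, 8 cores, one day of computing time
hours = audio_hours_per_day(20, 8)
print(hours)
```

Changing the core count or the FTRT factor gives a rough capacity estimate for other server configurations.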

Representation of the results

For the CMD version

Each line contains the file name followed by the estimated age (an integer, limited to 99):
example/david_1.wav 41
example/david_2.wav 40
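
Output in this form is straightforward to post-process. The following is a minimal sketch, assuming each line holds a file path and an integer age separated by whitespace, as in the sample above; parse_cmd_output is a hypothetical helper, not a Phonexia function.

```python
from typing import Dict


def parse_cmd_output(text: str) -> Dict[str, int]:
    """Map each file path to its estimated age."""
    ages: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so that paths
        # containing spaces are kept intact.
        path, age = line.rsplit(None, 1)
        ages[path] = int(age)
    return ages


sample = """example/david_1.wav 41
example/david_2.wav 40"""

results = parse_cmd_output(sample)
print(results)
```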

For the SPE version

  • name – a candidate age in years
  • score – the score for that age: 1 for the estimated age, 0 otherwise

Every candidate age is assigned a score. The age whose score equals 1 is the age estimated by the system.

{
  "result": {
    "version": 2,
    "name": "AgeEstimationResult",
    "file": "/kelly_2.wav",
    "model": "L",
    "channel_scores": [
      {
        "channel": 0,
        "scores": [
          {
            "name": "0",
            "score": 0
          },
          {
            "name": "1",
            "score": 0
          },
          ...
          {
            "name": "41",
            "score": 1
          },
          {
            "name": "42",
            "score": 0
          },
          ...
        ]
      }
    ]
  }
}
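
Extracting the estimate from an SPE result amounts to finding, for each channel, the entry whose score equals 1. The sketch below assumes the JSON structure shown above; the sample is shortened to three candidate ages for brevity, and estimated_ages is a hypothetical helper name.

```python
import json
from typing import List, Optional


def estimated_ages(response_text: str) -> List[Optional[int]]:
    """Return the estimated age for each channel, or None if no
    entry in that channel has a score of 1."""
    result = json.loads(response_text)["result"]
    ages: List[Optional[int]] = []
    for channel in result["channel_scores"]:
        winners = [int(s["name"]) for s in channel["scores"] if s["score"] == 1]
        ages.append(winners[0] if winners else None)
    return ages


# Shortened sample: a real response lists every candidate age.
sample = """
{
  "result": {
    "version": 2,
    "name": "AgeEstimationResult",
    "file": "/kelly_2.wav",
    "model": "L",
    "channel_scores": [
      {
        "channel": 0,
        "scores": [
          {"name": "40", "score": 0},
          {"name": "41", "score": 1},
          {"name": "42", "score": 0}
        ]
      }
    ]
  }
}
"""

print(estimated_ages(sample))
```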

To achieve the most representative results, treat the estimate as the center of a range and add a span of ±10 years to the reported age.
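
As a small illustration of this recommendation, the sketch below turns a point estimate into a range. The function name age_range is hypothetical, and clamping the lower bound at 0 is an assumption made here so that the range never contains negative ages.

```python
from typing import Tuple


def age_range(estimate: int, span: int = 10) -> Tuple[int, int]:
    """Return (low, high) bounds around the estimated age.

    The lower bound is clamped at 0 (an assumption, not stated
    in the documentation).
    """
    return max(0, estimate - span), estimate + span


low, high = age_range(41)
print(f"Estimated age 41: likely between {low} and {high}")
```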