Age Estimation

Phonexia Age Estimation (AGE) technology estimates the age of a speaker from an audio recording or voiceprint.

Technology

Trained with a focus on spontaneous telephone conversations.
Language-, accent-, text-, and channel-independent.
Applies channel compensation techniques, making it compatible with a wide range of audio sources: GSM/CDMA, 3G, VoIP, landlines, etc.

Input

Audio: WAV or RAW (8- or 16-bit linear coding), A-law or Mu-law, PCM, with a sampling rate of 8 kHz or higher.
Voiceprints: The AGE L4 model supports SID4 L4 voiceprints; legacy AGE models support voiceprints created by AGE itself.

Output

A log file containing the processed information and age estimate.

Processing speed

Approximately 20 FTRT (20 times faster than real time) on a single CPU core. For example, a standard server with 8 CPU cores can process 3,840 hours of audio in one day of computing time (20 × 8 cores × 24 hours).
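The arithmetic behind that figure can be sketched as follows. This is a rough capacity estimate only: the function name is hypothetical, and it assumes throughput scales linearly with the number of cores, which the figure above implies but real deployments may not fully achieve.

```python
def audio_hours_per_day(ftrt: float, cores: int, wall_clock_hours: float = 24.0) -> float:
    """Estimate hours of audio processed in a given wall-clock time.

    Assumption: each core independently sustains `ftrt` times real time,
    so throughput grows linearly with the core count.
    """
    return ftrt * cores * wall_clock_hours


# The example from the text: 20 FTRT per core, 8 cores, one day.
print(audio_hours_per_day(20, 8))  # 3840.0
```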

Representation of the results

For the CMD version

Format: Name_of_the_file.wav Age (an integer, limited to 99)
example/david_1.wav 41
example/david_2.wav 40
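Output in this form is straightforward to post-process. The snippet below is a minimal sketch that assumes each line holds a file path followed by a whitespace-separated integer age, as in the two sample lines above; the helper name is hypothetical and not part of the product.

```python
from typing import Dict, Tuple


def parse_cmd_line(line: str) -> Tuple[str, int]:
    """Split one output line into (file path, estimated age)."""
    # Split on the last run of whitespace so a path containing spaces stays intact.
    path, age = line.rstrip().rsplit(maxsplit=1)
    return path, int(age)


sample = """example/david_1.wav 41
example/david_2.wav 40"""

# Build a {path: age} mapping, skipping blank lines.
results: Dict[str, int] = dict(
    parse_cmd_line(line) for line in sample.splitlines() if line.strip()
)
print(results)
```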

For the SPE version

name – the candidate age
score – the score for that age (1 or 0)

Every candidate age is assigned a score. The age whose score equals 1 is the age estimated by the system; all other ages have a score of 0.

{
    "result": {
        "version": 2,
        "name": "AgeEstimationResult",
        "file": "/kelly_2.wav",
        "model": "L",
        "channel_scores": [
            {
                "channel": 0,
                "scores": [
                    {
                        "name": "0",
                        "score": 0
                    },
                    {
                        "name": "1",
                        "score": 0
                    },
                    ...
                    {
                        "name": "41",
                        "score": 1
                    },
                    {
                        "name": "42",
                        "score": 0
                    },
                    ...
                ]
            }
        ]
    }
}
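Reading the estimate out of such a result amounts to finding, for each channel, the entry whose score is 1. The sketch below assumes the field names shown in the sample above. The input is a shortened stand-in that keeps only the four score entries visible in the sample (the elided entries are simply left out), and the function name is hypothetical.

```python
import json

# Shortened stand-in for an SPE result; only the entries shown above are kept.
sample = """
{
    "result": {
        "version": 2,
        "name": "AgeEstimationResult",
        "file": "/kelly_2.wav",
        "model": "L",
        "channel_scores": [
            {
                "channel": 0,
                "scores": [
                    {"name": "0", "score": 0},
                    {"name": "1", "score": 0},
                    {"name": "41", "score": 1},
                    {"name": "42", "score": 0}
                ]
            }
        ]
    }
}
"""


def estimated_ages(payload: dict) -> dict:
    """Map each channel number to its estimated age (None if no score is 1)."""
    ages = {}
    for channel in payload["result"]["channel_scores"]:
        # The estimated age is the single entry whose score equals 1.
        hit = next((s for s in channel["scores"] if s["score"] == 1), None)
        ages[channel["channel"]] = int(hit["name"]) if hit else None
    return ages


ages = estimated_ages(json.loads(sample))
print(ages)  # {0: 41}
```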

For the most representative results, treat the estimate as the midpoint of a range of +/- 10 years. For example, an estimate of 41 is best read as 31 to 51.
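Turning a point estimate into such a range could look like the sketch below. The function name is hypothetical, and clamping the range to 0 to 99 is an assumption based on the lowest age name in the SPE sample and the upper limit of the CMD output; it is not a documented behavior.

```python
def age_range(estimate: int, margin: int = 10, lowest: int = 0, highest: int = 99):
    """Return (low, high) bounds around an age estimate.

    Assumption: the bounds are clamped to the 0-99 interval.
    """
    return max(lowest, estimate - margin), min(highest, estimate + margin)


print(age_range(41))  # (31, 51)
```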
