Gender Identification
Gender Identification is a language-, domain-, and channel-independent technology that uses the acoustic characteristics of recordings to determine the gender of a speaker as either male or female.
This technology can process both audio files and voiceprints; see the Input data section.
Typical use cases
- Filtering calls by the speakers' gender,
- Playing advertisements targeted to persons of a specific gender,
- Performing quick demographic analysis of recordings.
Input data
The technology supports both multi-channel audio files and Base64-encoded voiceprints. To learn more about voiceprints, refer to this section of the documentation.
- Recommended minimum speech signal for identification: 3+ seconds.
Output
- Gender Identification score: The probability that the speaker in the recording is of the given gender. Probability values can range from 0.0 to 1.0.
Performance
- Speed: Up to ~80x faster than real time (~1100x on GPU).
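To turn these speed factors into rough time estimates, divide the audio duration by the factor. The sketch below only does that arithmetic. The function name is illustrative, the ~80x figure is assumed to be the non-GPU case, and both factors are the approximate upper bounds quoted above, so real throughput will vary with hardware and input.

```python
def estimated_processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Return the approximate processing time for a recording.

    speed_factor is how many times faster than real time the system runs.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


# Approximate upper-bound factors from the Performance section.
# Treating the ~80x figure as the non-GPU case is an assumption.
CPU_FACTOR = 80.0
GPU_FACTOR = 1100.0

one_hour = 3600.0
cpu_estimate = estimated_processing_seconds(one_hour, CPU_FACTOR)  # 45.0 seconds
gpu_estimate = estimated_processing_seconds(one_hour, GPU_FACTOR)  # a little over 3 seconds
```

Under these assumptions, one hour of audio takes on the order of 45 seconds without a GPU and a few seconds with one, at best.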
FAQ
Why is the score in the Web Application always above 50%?
Unlike the API, which returns separate percentage scores for male and female that always add up to 100%, the Web Application displays only the dominant gender, with a score between 50% and 100%. A score below 50% would imply the opposite gender, so it is not shown.
For example, a result of 54% male means the complementary female score is 46%: the model leans only slightly toward male, and the speaker may well be female. The closer the score is to 50%, the less certain the prediction is.
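The relationship between the two views can be sketched as follows. This assumes the API result is available as a pair of probabilities that sum to 1.0; the function and parameter names are hypothetical and not part of the actual API.

```python
from typing import Tuple


def web_display(male: float, female: float) -> Tuple[str, int]:
    """Reduce a pair of complementary scores to the dominant label and its percentage."""
    if abs(male + female - 1.0) > 1e-6:
        raise ValueError("scores are expected to sum to 1.0")
    # On an exact 0.5 / 0.5 tie this sketch arbitrarily reports "male".
    label, score = ("male", male) if male >= female else ("female", female)
    return label, round(score * 100)


print(web_display(0.54, 0.46))  # ('male', 54)
print(web_display(0.12, 0.88))  # ('female', 88)
```

Because only the larger of the two scores is kept, the displayed value can never fall below 50%.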
Why is the score of the more probable gender still very low?
Low scores can have several causes, such as a voice that is difficult to classify as either male or female, or an audio channel that contains more than one speaker of different genders. For optimal results, ensure there is only one speaker per channel.
What can I do to improve processing speed?
If you're using media files as input, the system first extracts a voiceprint, which is the most time-consuming part of the process. To speed things up, we recommend running the technology on a GPU if possible.
Additionally, once you have processed your recordings, you can store the extracted voiceprints and reuse them for any future processing instead of reprocessing the original audio. This can significantly speed up the workflow without impacting accuracy.
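One way to reuse voiceprints is a small on-disk cache keyed by a hash of the audio file. The following is a minimal sketch under stated assumptions: `extract` stands in for whatever call produces a Base64-encoded voiceprint in your setup (it is a placeholder, not a real function of the product), and the cache layout and `.vp` file suffix are invented for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_voiceprint(
    audio_path: Path,
    cache_dir: Path,
    extract: Callable[[Path], str],
) -> str:
    """Return the Base64 voiceprint for audio_path, extracting it at most once.

    `extract` is a hypothetical callable supplied by the caller; it should
    take the audio path and return the voiceprint as a Base64 string.
    """
    # Key the cache on the file contents so renamed copies still hit it.
    # Reading the whole file is fine for a sketch; hash in chunks for large files.
    digest = hashlib.sha256(audio_path.read_bytes()).hexdigest()
    cache_file = cache_dir / f"{digest}.vp"

    if cache_file.exists():
        return cache_file.read_text(encoding="ascii")

    voiceprint = extract(audio_path)  # the expensive step
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(voiceprint, encoding="ascii")
    return voiceprint
```

On the first call for a given recording the voiceprint is extracted and stored; later calls read it back from disk and skip extraction entirely.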