Version: 4.0.0-rc1

Gender Identification

This guide demonstrates how to perform Gender Identification with Phonexia Speech Platform 4. You can find a high-level description in the About Gender Identification article. The technology can identify gender in audio files or in voiceprints. This guide will show you how to do both.

For testing, we'll be using the following 17 recordings in various languages. You can download them all together in the audio_files.zip archive.

filename       gender    filename      gender    filename       gender
Adedewe.wav    male      Lenka.wav     female    Tatiana.wav    female
Dina.wav       female    Lubica.wav    female    Thida.wav      female
Fadimatu.wav   female    Luka.wav      male      Tuan.wav       male
Harry.wav      male      Nirav.wav     male      Xiang.wav      male
Juan.wav       male      Noam.wav      male      Zoltan.wav     male
Julia.wav      female    Obioma.wav    female

At the end of this guide, you'll find a full Python code example that combines all the steps, which are first discussed separately. This guide should give you a comprehensive understanding of how to integrate Gender Identification into your own projects.

Prerequisites

In the guide, we assume that the Virtual Appliance is running on port 8000 of http://localhost and contains a proper model and license for the technology. For more information on how to install and start the Virtual Appliance, please refer to the Virtual Appliance Installation chapter.

Environment Setup

We are using Python 3.9 and Python library requests 2.27 in this example. You can install the requests library with pip as follows:

pip install requests~=2.27

Basic Gender Identification from file

To run Gender Identification for a single audio file, you should start by sending a POST request to the /api/technology/gender-identification endpoint. In Python, you can do this as follows:

import requests

SPEECH_PLATFORM_SERVER = "http://localhost:8000" # Replace with your actual server URL
ENDPOINT_URL = f"{SPEECH_PLATFORM_SERVER}/api/technology/gender-identification"

audio_path = "Adedewe.wav"

with open(audio_path, mode="rb") as f:
    files = {"file": f}
    response = requests.post(
        url=ENDPOINT_URL,
        files=files,
    )
print(f"{response.status_code=}")  # Should print 'response.status_code=202'

If the task has been successfully accepted, the 202 code will be returned together with a unique task ID in the response body. The task isn't processed immediately, but only scheduled for processing. You can check the current task status by polling for the result.

The URL for polling the result is returned in the Location header. Alternatively, you can assemble the polling URL on your own by appending a slash (/) and the task ID to the endpoint URL.

import time

polling_url = response.headers["Location"] # Use the `response` from the previous step

# Alternatively:
# polling_url = ENDPOINT_URL + "/" + response.json()["task"]["task_id"]

while True:
    response = requests.get(polling_url)
    data = response.json()
    task_status = data["task"]["state"]
    if task_status in {"done", "failed", "rejected"}:
        break
    time.sleep(5)
print(f"{data=}")

Once the polling finishes, data will contain the latest response from the server: either the result of Gender Identification or, if processing could not finish properly, an error message with details.
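The loop above polls indefinitely, which may be undesirable if a task never reaches a final state. As a minimal sketch, the same logic can be wrapped in a helper with a deadline. The name poll_until_finished and its interval, timeout, sleep, and clock parameters are illustrative assumptions and not part of the Speech Platform API; only the three terminal states come from this guide.

```python
import time
from typing import Any, Callable, Dict

# Terminal task states used in this guide.
TERMINAL_STATES = {"done", "failed", "rejected"}


def poll_until_finished(
    fetch: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call `fetch` until the task reaches a terminal state or `timeout` seconds pass.

    `fetch` is a hypothetical callable that returns the parsed JSON body
    of one polling request.
    """
    deadline = clock() + timeout
    while True:
        data = fetch()
        if data["task"]["state"] in TERMINAL_STATES:
            return data
        if clock() >= deadline:
            raise TimeoutError(f"Task did not finish within {timeout} seconds")
        sleep(interval)
```

With requests, a call might look like poll_until_finished(lambda: requests.get(polling_url).json()). Passing the fetch function in keeps the helper independent of the HTTP library, so it can be exercised without a running server.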

The result contains information about the individual input audio channels, which can be identified by their channel_number. The speech_length field shows how much speech was used to produce the probabilities, which are listed separately for the male and female gender in the scores object.

The following JSON shows the result of a successful Gender Identification task for the Adedewe.wav file. The gender was correctly identified as male with a probability of 0.99434.

{
  "task": {
    "task_id": "db414429-2b56-46b2-bf53-51e2a813b6da",
    "state": "done"
  },
  "result": {
    "channels": [
      {
        "channel_number": 0,
        "speech_length": 30.88,
        "scores": {
          "male": {
            "probability": 0.99434
          },
          "female": {
            "probability": 0.00566
          }
        }
      }
    ]
  }
}
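To turn such a response into a decision, you can pick the gender with the highest probability for each channel. The following is a minimal sketch under the assumption that the response has exactly the structure shown above; most_probable_gender and summarize_channels are hypothetical helper names, not part of the API.

```python
from typing import Any, Dict, List, Tuple


def most_probable_gender(scores: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """Return the label with the highest probability and that probability."""
    label = max(scores, key=lambda name: scores[name]["probability"])
    return label, scores[label]["probability"]


def summarize_channels(data: Dict[str, Any]) -> List[Tuple[int, str, float]]:
    """Return (channel_number, gender, probability) for every channel."""
    state = data["task"]["state"]
    if state != "done":
        # Only a finished task is assumed to carry a "result" object.
        raise RuntimeError(f"Task ended in state {state!r}")
    summary = []
    for channel in data["result"]["channels"]:
        label, probability = most_probable_gender(channel["scores"])
        summary.append((channel["channel_number"], label, probability))
    return summary


# Response for Adedewe.wav, copied from the JSON above.
example = {
    "task": {"task_id": "db414429-2b56-46b2-bf53-51e2a813b6da", "state": "done"},
    "result": {
        "channels": [
            {
                "channel_number": 0,
                "speech_length": 30.88,
                "scores": {
                    "male": {"probability": 0.99434},
                    "female": {"probability": 0.00566},
                },
            }
        ]
    },
}

print(summarize_channels(example))  # [(0, 'male', 0.99434)]
```

For a multi-channel recording, the returned list would contain one entry per channel.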

Gender Identification from voiceprints

Gender Identification can also be performed on voiceprints extracted from audio files with the Voiceprint Extraction technology.

For testing, we'll be using voiceprints extracted from the 17 test recordings that you can find in the voiceprints.zip archive.

To run Gender Identification for a set of voiceprints, you should start by sending a POST request to the /api/technology/gender-identification-voiceprints endpoint. In Python, you can do this as follows:

SPEECH_PLATFORM_SERVER = "http://localhost:8000"  # Replace with your actual server URL
VOICEPRINTS_ENDPOINT_URL = f"{SPEECH_PLATFORM_SERVER}/api/technology/gender-identification-voiceprints"

voiceprint_paths = [
    "Adedewe.vp",
    "Dina.vp",
    "Fadimatu.vp",
    "Harry.vp",
    "Juan.vp",
    "Julia.vp",
    "Lenka.vp",
    "Lubica.vp",
    "Luka.vp",
    "Nirav.vp",
    "Noam.vp",
    "Obioma.vp",
    "Tatiana.vp",
    "Thida.vp",
    "Tuan.vp",
    "Xiang.vp",
    "Zoltan.vp",
]

# Read each voiceprint file as text; the contents are sent in a JSON body.
voiceprints = []
for path in voiceprint_paths:
    with open(path) as f:
        voiceprints.append(f.read())

response = requests.post(
    url=VOICEPRINTS_ENDPOINT_URL,
    json={"voiceprints": voiceprints},
)
print(f"{response.status_code=}")  # Should print 'response.status_code=202'

After polling for the result as in the previous example, we get the following output (shortened here). The voiceprint_scores come in the same order as the input voiceprints. Notice that the result for Adedewe.vp is exactly the same as the one estimated from the audio file.

{
  "task": {
    "task_id": "26776c99-ac92-4df0-a0f3-3beb1498f4ee",
    "state": "done"
  },
  "result": {
    "voiceprint_scores": [
      {
        "speech_length": 30.88,
        "scores": {
          "male": {
            "probability": 0.99434
          },
          "female": {
            "probability": 0.00566
          }
        }
      },
      {
        "speech_length": 19.52,
        "scores": {
          "male": {
            "probability": 0.00115
          },
          "female": {
            "probability": 0.99885
          }
        }
      },
      {
        "speech_length": 23.84,
        "scores": {
          "male": {
            "probability": 0.1778
          },
          "female": {
            "probability": 0.8222
          }
        }
      },
      ...
    ]
  }
}
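Because the scores are returned in input order, they can be paired with the voiceprint names by position and compared with the known genders from the table at the beginning of this guide. The sketch below uses only the three results shown above; predict_genders is a hypothetical helper name, and the expected labels are taken from that table.

```python
from typing import Any, Dict, List


def predict_genders(
    names: List[str], voiceprint_scores: List[Dict[str, Any]]
) -> Dict[str, str]:
    """Map each input name to the gender with the highest probability."""
    if len(names) != len(voiceprint_scores):
        raise ValueError("Expected one score entry per input voiceprint")
    predictions = {}
    for name, entry in zip(names, voiceprint_scores):
        scores = entry["scores"]
        predictions[name] = max(scores, key=lambda g: scores[g]["probability"])
    return predictions


# The first three inputs and their results, copied from the output above.
names = ["Adedewe.vp", "Dina.vp", "Fadimatu.vp"]
voiceprint_scores = [
    {"speech_length": 30.88,
     "scores": {"male": {"probability": 0.99434}, "female": {"probability": 0.00566}}},
    {"speech_length": 19.52,
     "scores": {"male": {"probability": 0.00115}, "female": {"probability": 0.99885}}},
    {"speech_length": 23.84,
     "scores": {"male": {"probability": 0.1778}, "female": {"probability": 0.8222}}},
]

# Known genders from the table of test recordings.
expected = {"Adedewe.vp": "male", "Dina.vp": "female", "Fadimatu.vp": "female"}

predictions = predict_genders(names, voiceprint_scores)
mismatches = [name for name in names if predictions[name] != expected[name]]
print(predictions)
print(mismatches)  # []
```

The length check guards against silently pairing names with the wrong scores if the response ever contains fewer entries than were submitted.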

Full Python Code

Here is the full example of how to run the Gender Identification technology with both files and voiceprints as input data. The code is slightly adjusted and wrapped into functions.

The scores_from_file.json and scores_from_voiceprints.json files contain the results of the Gender Identification. Notice that the results are identical except for the filename extensions (wav vs vp) and the extra channel_number information in scores_from_file.json.

import json
import requests
import time

SPEECH_PLATFORM_SERVER = "http://localhost:8000" # Replace with your actual server URL
ENDPOINT_URL = f"{SPEECH_PLATFORM_SERVER}/api/technology/gender-identification"
VOICEPRINTS_ENDPOINT_URL = f"{SPEECH_PLATFORM_SERVER}/api/technology/gender-identification-voiceprints"

def poll_result(polling_url: str, sleep: int = 5):
    # Poll until the task reaches one of its final states.
    while True:
        response = requests.get(polling_url)
        response.raise_for_status()
        data = response.json()
        task_status = data["task"]["state"]
        if task_status in {"done", "failed", "rejected"}:
            break
        time.sleep(sleep)
    return response


def run_gender_identification_from_file(audio_path: str):
    with open(audio_path, mode="rb") as f:
        response = requests.post(
            url=ENDPOINT_URL,
            files={"file": f},
        )
    response.raise_for_status()

    polling_url = response.headers["Location"]
    gender_identification_response = poll_result(polling_url)
    return gender_identification_response.json()


def run_gender_identification_from_voiceprints(voiceprint_paths: list[str]):
    voiceprints = []
    for path in voiceprint_paths:
        with open(path) as f:
            voiceprints.append(f.read())

    response = requests.post(
        url=VOICEPRINTS_ENDPOINT_URL,
        json={"voiceprints": voiceprints},
    )
    response.raise_for_status()

    polling_url = response.headers["Location"]
    gender_identification_response = poll_result(polling_url)
    return gender_identification_response.json()


# Run Gender Identification from audio files
filenames = [
    "Adedewe.wav",
    "Dina.wav",
    "Fadimatu.wav",
    "Harry.wav",
    "Juan.wav",
    "Julia.wav",
    "Lenka.wav",
    "Lubica.wav",
    "Luka.wav",
    "Nirav.wav",
    "Noam.wav",
    "Obioma.wav",
    "Tatiana.wav",
    "Thida.wav",
    "Tuan.wav",
    "Xiang.wav",
    "Zoltan.wav",
]

results = {}
for filename in filenames:
    print(f"Running Gender Identification for file {filename}.")
    data = run_gender_identification_from_file(filename)
    # The files are mono recordings, so we access the result in the first channel (index 0).
    result = data["result"]["channels"][0]
    results[filename] = result
    print(f"The result for {filename} is: {result}")

# Save the results to a file.
with open("scores_from_file.json", "w") as output:
    json.dump(results, output, indent=2)


# Run Gender Identification from voiceprints
voiceprint_paths = [
    "Adedewe.vp",
    "Dina.vp",
    "Fadimatu.vp",
    "Harry.vp",
    "Juan.vp",
    "Julia.vp",
    "Lenka.vp",
    "Lubica.vp",
    "Luka.vp",
    "Nirav.vp",
    "Noam.vp",
    "Obioma.vp",
    "Tatiana.vp",
    "Thida.vp",
    "Tuan.vp",
    "Xiang.vp",
    "Zoltan.vp",
]

print(f"Running Gender Identification for {len(voiceprint_paths)} voiceprints.")
data = run_gender_identification_from_voiceprints(voiceprint_paths)
result_from_voiceprints = data["result"]["voiceprint_scores"]
# Map the results to input voiceprint names.
results_per_voiceprint = {
    filename: result
    for filename, result in zip(voiceprint_paths, result_from_voiceprints)
}

print(f"The results are: {result_from_voiceprints}")

# Save the results to a file.
with open("scores_from_voiceprints.json", "w") as output:
    json.dump(results_per_voiceprint, output, indent=2)
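To check that the two output files really agree apart from the filename extensions and the channel_number field, the results can be normalized and compared. This is a minimal sketch that assumes the files have the structure written by the script above; normalize_results, results_match, and load_json are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Dict


def normalize_results(results: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Key results by filename without extension and drop channel_number."""
    normalized = {}
    for filename, result in results.items():
        stem = Path(filename).stem  # "Adedewe.wav" and "Adedewe.vp" -> "Adedewe"
        normalized[stem] = {
            key: value for key, value in result.items() if key != "channel_number"
        }
    return normalized


def results_match(
    file_results: Dict[str, Dict[str, Any]],
    voiceprint_results: Dict[str, Dict[str, Any]],
) -> bool:
    """Return True if both result sets are equal after normalization."""
    return normalize_results(file_results) == normalize_results(voiceprint_results)


def load_json(path: str) -> Dict[str, Dict[str, Any]]:
    with open(path) as f:
        return json.load(f)


# In-memory demonstration using the Adedewe values shown earlier in this guide.
scores = {"male": {"probability": 0.99434}, "female": {"probability": 0.00566}}
from_file = {"Adedewe.wav": {"channel_number": 0, "speech_length": 30.88, "scores": scores}}
from_voiceprints = {"Adedewe.vp": {"speech_length": 30.88, "scores": scores}}
print(results_match(from_file, from_voiceprints))  # True
```

After running the full script, the saved files could be compared with results_match(load_json("scores_from_file.json"), load_json("scores_from_voiceprints.json")). The comparison uses exact equality, which matches the claim that the results are identical.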