Gender Identification
This guide demonstrates how to perform Gender Identification with Phonexia Speech Platform 4. You can find a high-level description in the About Gender Identification article. The technology can identify gender in audio files or in voiceprints. This guide will show you how to do both.
For testing, we'll be using the following 17 recordings in various languages. You can download them all together in the audio_files.zip archive.
filename | gender | filename | gender | filename | gender |
---|---|---|---|---|---|
Adedewe.wav | male | Lenka.wav | female | Tatiana.wav | female |
Dina.wav | female | Lubica.wav | female | Thida.wav | female |
Fadimatu.wav | female | Luka.wav | male | Tuan.wav | male |
Harry.wav | male | Nirav.wav | male | Xiang.wav | male |
Juan.wav | male | Noam.wav | male | Zoltan.wav | male |
Julia.wav | female | Obioma.wav | female | | |
At the end of this guide, you'll find a full Python code example that combines all the steps, which are first discussed separately. The guide should give you a comprehensive understanding of how to integrate Gender Identification into your own projects.
Prerequisites
In this guide, we assume that the Virtual Appliance is running on port 8000 of http://localhost. For more information on how to install and start the Virtual Appliance, please refer to the Virtual Appliance Installation guide.
The technology requires a proper model and license in order to process any files. For more details on models and licenses, see the Licensing section.
Environment Setup
We are using Python 3.9 and the Python library requests 2.27 in this example. You can install the requests library with pip as follows:
pip install requests~=2.27
Basic Gender Identification from file
To run Gender Identification for a single audio file, start by sending a POST request to the /api/technology/gender-identification endpoint. In Python, you can do this as follows:
import os
import requests

SPEECH_PLATFORM_SERVER = "http://localhost:8000"
ENDPOINT_URL = os.path.join(SPEECH_PLATFORM_SERVER, "api/technology/gender-identification")

audio_path = "Adedewe.wav"

# Upload the audio file while it is still open.
with open(audio_path, mode="rb") as file:
    files = {"file": file}
    response = requests.post(
        url=ENDPOINT_URL,
        files=files,
    )

print(f"{response.status_code=}")  # Should print 'response.status_code=202'
If the task has been successfully accepted, the 202 code will be returned together with a unique task ID in the response body. The task isn't processed immediately, but only scheduled for processing. You can check the current task status by polling for the result.
The URL for polling the result is returned in the X-Location header. Alternatively, you can assemble the polling URL on your own by appending a slash (/) and the task ID to the endpoint URL.
import time

polling_url = response.headers["x-location"]  # Use the `response` from the previous step
# Alternatively:
# import os
# polling_url = os.path.join(ENDPOINT_URL, response.json()["task"]["task_id"])

# Poll until the task reaches one of its final states.
while True:
    response = requests.get(polling_url)
    data = response.json()
    task_status = data["task"]["state"]
    if task_status in {"done", "failed", "rejected"}:
        break
    time.sleep(5)

print(f"{data=}")
Once the polling finishes, data will contain the latest response from the server: either the result of Gender Identification, or an error message with details in case processing could not finish properly.
The result contains information about the individual input audio channels, which can be identified by their channel_number. The speech_length field shows how much speech was used to produce the probabilities, which are shown separately for the male and female gender in the scores object.
The following JSON shows the result of a successful Gender Identification task for the Adedewe.wav file. The gender was correctly identified as male with probability = 0.99434.
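The polling loop above runs until the task reaches a final state. As a minimal sketch, the same idea can be wrapped in a helper that also gives up after a time limit and raises an error for the failed and rejected states. The helper name poll_until_finished, its parameters, and the 300-second default are assumptions made for illustration and are not part of the Speech Platform API. The helper takes a zero-argument callable that returns the parsed JSON body, so it can be tried without a running server.

```python
import time


def poll_until_finished(fetch_json, interval=5.0, timeout=300.0):
    """Call fetch_json() repeatedly until the task reaches a final state.

    fetch_json is a hypothetical zero-argument callable that returns the
    parsed JSON body of the polling response.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_json()
        state = data["task"]["state"]
        if state == "done":
            return data
        if state in {"failed", "rejected"}:
            # The response body carries the error details from the server.
            raise RuntimeError(f"Task finished in state {state!r}: {data}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task still in state {state!r} after {timeout} s")
        time.sleep(interval)
```

With the requests library and the polling_url from the previous step, it could be called as poll_until_finished(lambda: requests.get(polling_url).json()).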
{
  "task": {
    "task_id": "db414429-2b56-46b2-bf53-51e2a813b6da",
    "state": "done"
  },
  "result": {
    "channels": [
      {
        "channel_number": 0,
        "speech_length": 30.88,
        "scores": {
          "male": {
            "probability": 0.99434
          },
          "female": {
            "probability": 0.00566
          }
        }
      }
    ]
  }
}
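To turn the scores object into a single label, one option is to pick the gender with the highest probability for each channel. The sketch below does this for the response shown above. The helper name most_likely_gender is an assumption made for illustration, and the data dictionary is copied from the example result rather than fetched from the server.

```python
def most_likely_gender(scores):
    """Return (label, probability) for the highest-scoring gender."""
    label = max(scores, key=lambda name: scores[name]["probability"])
    return label, scores[label]["probability"]


# Response body copied from the example result above.
data = {
    "task": {"task_id": "db414429-2b56-46b2-bf53-51e2a813b6da", "state": "done"},
    "result": {
        "channels": [
            {
                "channel_number": 0,
                "speech_length": 30.88,
                "scores": {
                    "male": {"probability": 0.99434},
                    "female": {"probability": 0.00566},
                },
            }
        ]
    },
}

for channel in data["result"]["channels"]:
    label, probability = most_likely_gender(channel["scores"])
    print(f"channel {channel['channel_number']}: {label} ({probability:.5f})")
# prints: channel 0: male (0.99434)
```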
Gender Identification from voiceprints
Gender identification can be performed on voiceprints extracted from audio files with the Voiceprint Extraction technology.
For testing, we'll be using voiceprints extracted from the 17 testing recordings, which you can find in the voiceprints.zip archive.
import os
import requests

SPEECH_PLATFORM_SERVER = "http://localhost:8000"
ENDPOINT_URL = os.path.join(SPEECH_PLATFORM_SERVER, "api/technology/gender-identification-voiceprints")

voiceprint_paths = [
    "Adedewe.vp",
    "Dina.vp",
    "Fadimatu.vp",
    "Harry.vp",
    "Juan.vp",
    "Julia.vp",
    "Lenka.vp",
    "Lubica.vp",
    "Luka.vp",
    "Nirav.vp",
    "Noam.vp",
    "Obioma.vp",
    "Tatiana.vp",
    "Thida.vp",
    "Tuan.vp",
    "Xiang.vp",
    "Zoltan.vp",
]

# Read the content of every voiceprint file, keeping the input order.
voiceprints = []
for path in voiceprint_paths:
    with open(path) as file:
        voiceprints.append(file.read())

# Send all voiceprints in a single request.
response = requests.post(
    url=ENDPOINT_URL,
    json={"voiceprints": voiceprints},
)

print(f"{response.status_code=}")  # Should print 'response.status_code=202'
After polling for the result as in the previous example, we'll get the following output (shortened here). The voiceprint_scores come in the same order as the input voiceprints. Notice that the result for Adedewe.vp is exactly the same as for Adedewe.wav above.
{
  "task": {
    "task_id": "26776c99-ac92-4df0-a0f3-3beb1498f4ee",
    "state": "done"
  },
  "result": {
    "voiceprint_scores": [
      {
        "speech_length": 30.88,
        "scores": {
          "male": {
            "probability": 0.99434
          },
          "female": {
            "probability": 0.00566
          }
        }
      },
      {
        "speech_length": 19.52,
        "scores": {
          "male": {
            "probability": 0.00115
          },
          "female": {
            "probability": 0.99885
          }
        }
      },
      {
        "speech_length": 23.84,
        "scores": {
          "male": {
            "probability": 0.1778
          },
          "female": {
            "probability": 0.8222
          }
        }
      },
      ...
    ]
  }
}
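Because the scores come back in input order, they can be paired with the voiceprint names and checked against the expected genders from the table at the top of this guide. The following sketch does this for the three results shown in the shortened output above. It assumes that the first three entries correspond to Adedewe.vp, Dina.vp, and Fadimatu.vp, the first three input voiceprints. The helper name predicted_gender is chosen here for illustration.

```python
# Expected genders taken from the table of test recordings.
expected = {"Adedewe.vp": "male", "Dina.vp": "female", "Fadimatu.vp": "female"}
voiceprint_paths = list(expected)  # same order as the input voiceprints

# The first three entries of "voiceprint_scores" from the output above.
voiceprint_scores = [
    {"speech_length": 30.88,
     "scores": {"male": {"probability": 0.99434}, "female": {"probability": 0.00566}}},
    {"speech_length": 19.52,
     "scores": {"male": {"probability": 0.00115}, "female": {"probability": 0.99885}}},
    {"speech_length": 23.84,
     "scores": {"male": {"probability": 0.1778}, "female": {"probability": 0.8222}}},
]


def predicted_gender(scores):
    """Return the label with the highest probability."""
    return max(scores, key=lambda label: scores[label]["probability"])


# zip() pairs each input name with the score at the same position.
for path, item in zip(voiceprint_paths, voiceprint_scores):
    predicted = predicted_gender(item["scores"])
    status = "OK" if predicted == expected[path] else "MISMATCH"
    print(f"{path}: predicted {predicted}, expected {expected[path]} -> {status}")
```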
Full Python Code
Here is the full example of how to run the Gender Identification technology with both files and voiceprints as input data. The code is slightly adjusted and wrapped into functions.
The scores_from_file.json and scores_from_voiceprints.json files contain the results of the Gender Identification. Notice that the results are identical except for the filename extensions (wav vs vp) and the extra channel_number information in scores_from_file.json.
import os
import json
import requests
import time

SPEECH_PLATFORM_SERVER = "http://localhost:8000"  # Replace with your actual server URL


def poll_result(polling_url: str, sleep: int = 5):
    # Poll until the task reaches one of its final states.
    while True:
        response = requests.get(polling_url)
        response.raise_for_status()
        data = response.json()
        task_status = data["task"]["state"]
        if task_status in {"done", "failed", "rejected"}:
            break
        time.sleep(sleep)
    return response
# Gender Identification from files
def run_gender_identification_from_file(audio_path: str):
    print(f"Running Gender Identification for file {audio_path}.")
    # Upload the audio file while it is still open.
    with open(audio_path, mode="rb") as file:
        files = {"file": file}
        response = requests.post(
            url=os.path.join(SPEECH_PLATFORM_SERVER, "api/technology/gender-identification"),
            files=files,
        )
    response.raise_for_status()
    polling_url = response.headers["x-location"]
    gender_identification_response = poll_result(polling_url)
    return gender_identification_response.json()
filenames = [
    "Adedewe.wav",
    "Dina.wav",
    "Fadimatu.wav",
    "Harry.wav",
    "Juan.wav",
    "Julia.wav",
    "Lenka.wav",
    "Lubica.wav",
    "Luka.wav",
    "Nirav.wav",
    "Noam.wav",
    "Obioma.wav",
    "Tatiana.wav",
    "Thida.wav",
    "Tuan.wav",
    "Xiang.wav",
    "Zoltan.wav",
]

results = {}
for filename in filenames:
    data = run_gender_identification_from_file(filename)
    # The files are mono recordings, so we access the result in the first channel (index 0).
    result = data["result"]["channels"][0]
    results[filename] = result
    print(f" The result for {filename} is: {result}")

with open("scores_from_file.json", "w") as output:
    json.dump(results, output, indent=2)
# Gender Identification from voiceprints
def run_gender_identification_from_voiceprints(voiceprints: list[str]):
    print(f"Running Gender Identification for {len(voiceprints)} voiceprints.")
    response = requests.post(
        url=os.path.join(SPEECH_PLATFORM_SERVER, "api/technology/gender-identification-voiceprints"),
        json={"voiceprints": voiceprints},
    )
    response.raise_for_status()
    polling_url = response.headers["x-location"]
    gender_identification_response = poll_result(polling_url)
    return gender_identification_response.json()
voiceprint_paths = [
    "Adedewe.vp",
    "Dina.vp",
    "Fadimatu.vp",
    "Harry.vp",
    "Juan.vp",
    "Julia.vp",
    "Lenka.vp",
    "Lubica.vp",
    "Luka.vp",
    "Nirav.vp",
    "Noam.vp",
    "Obioma.vp",
    "Tatiana.vp",
    "Thida.vp",
    "Tuan.vp",
    "Xiang.vp",
    "Zoltan.vp",
]

voiceprints = []
for path in voiceprint_paths:
    with open(path) as file:
        voiceprints.append(file.read())

data = run_gender_identification_from_voiceprints(voiceprints)
result_from_voiceprints = data["result"]["voiceprint_scores"]

# Map the results to input voiceprint names.
results_per_voiceprint = {
    filename: result
    for filename, result in zip(voiceprint_paths, result_from_voiceprints)
}
print(f" The results are: {result_from_voiceprints}")

with open("scores_from_voiceprints.json", "w") as output:
    json.dump(results_per_voiceprint, output, indent=2)
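To check that the two output files really match, one possible approach is to normalize both before comparing them: strip the filename extensions and drop the channel_number field that only the file-based result contains. The sketch below shows this on a small in-memory sample shaped like the two files. The helper names normalize and load_normalized are assumptions made for illustration.

```python
import json
import os


def normalize(results):
    """Key results by base name and drop the file-only channel_number field."""
    normalized = {}
    for filename, result in results.items():
        name, _extension = os.path.splitext(filename)
        normalized[name] = {
            key: value for key, value in result.items() if key != "channel_number"
        }
    return normalized


def load_normalized(path):
    """Load one of the JSON output files and normalize its content."""
    with open(path) as file:
        return normalize(json.load(file))


# In-memory sample shaped like the two output files (values from the examples above).
scores = {"male": {"probability": 0.99434}, "female": {"probability": 0.00566}}
from_file = {"Adedewe.wav": {"channel_number": 0, "speech_length": 30.88, "scores": scores}}
from_voiceprints = {"Adedewe.vp": {"speech_length": 30.88, "scores": scores}}

print(normalize(from_file) == normalize(from_voiceprints))  # prints: True
```

After running the full example, the real files can be compared with load_normalized("scores_from_file.json") == load_normalized("scores_from_voiceprints.json").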