Voiceprint Merging
This guide demonstrates how to use the Phonexia Speech Platform 4 Virtual Appliance to merge voiceprints for the Speaker Identification, Gender Identification, and Age Estimation technologies.
Voiceprints represent unique speaker characteristics contained in individual media files. Voiceprints from multiple files with the same speaker can be merged to improve the system's accuracy and robustness.
You can learn how to create voiceprints in the Speaker Verification guide.
The voiceprints.zip archive contains the example voiceprints used throughout
this guide: Harry1.vp, Harry2.vp, Harry3.vp, and Harry1+2.vp.
At the end of this guide, you'll find the full Python code example that combines all the steps, which are first discussed separately. This guide should give you a comprehensive understanding of how to perform Voiceprint Merging in your own projects.
Prerequisites
Complete the prerequisites for setting up the Virtual Appliance and the Python environment, as described in the Task lifecycle code examples.
Run Voiceprint Merging
To run Voiceprint Merging for a list of voiceprints, you should start by sending
a POST request to the
/api/technology/speaker-identification-voiceprint-merging
endpoint. The request body must contain a list of at least two voiceprints. In
Python, you can do this as follows:
import requests
VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000" # Replace with your address
VOICEPRINT_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/speaker-identification-voiceprint-merging"
voiceprint_files = [
    "Harry1.vp",
    "Harry2.vp",
]
voiceprints = []
for voiceprint_file in voiceprint_files:
    with open(voiceprint_file) as f:
        voiceprints.append(f.read())
start_task_response = requests.post(
    url=VOICEPRINT_BASED_ENDPOINT_URL,
    json={"voiceprints": voiceprints},
)
print(start_task_response.status_code) # Should print '202'
If the task is successfully accepted, a 202 status code is returned together
with a unique task ID in the response body. The task isn't processed
immediately; it is only scheduled for processing. You can check the current
task status while polling for the result.
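Because the request body must contain at least two voiceprints, it can help to check the list on the client side before sending the request. The following is a minimal sketch; the helper name `build_merging_payload` is hypothetical and not part of the platform API, and the emptiness check is an extra assumption rather than a documented requirement.

```python
def build_merging_payload(voiceprints):
    """Return the JSON body for a merging request, or raise ValueError.

    Hypothetical helper: only the "voiceprints" key comes from the guide.
    """
    voiceprints = list(voiceprints)
    # The endpoint expects a list of at least two voiceprints.
    if len(voiceprints) < 2:
        raise ValueError("Voiceprint Merging needs at least two voiceprints")
    # Assumption: an empty string is never a valid voiceprint.
    if any(not voiceprint for voiceprint in voiceprints):
        raise ValueError("Voiceprints must not be empty")
    return {"voiceprints": voiceprints}
```

The returned dictionary can be passed as the `json` argument of `requests.post`, in place of the inline `{"voiceprints": voiceprints}` literal.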
Polling
To obtain the final result, periodically query the task status until the task
state changes to done, failed or rejected. The general polling procedure
is described in detail in the
Task lifecycle code examples.
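The polling loop described above can be sketched with an upper time limit, so that a client does not wait forever if a task never reaches a final state. This is an illustrative variant, not the procedure from the Task lifecycle code examples: the function name `wait_for_final_state`, the `fetch_task` callable, and the `timeout` parameter are assumptions introduced here. Only the final states `done`, `failed`, and `rejected` and the `task.state` field come from this guide.

```python
import time

# Final task states named in this guide.
FINAL_STATES = {"done", "failed", "rejected"}


def wait_for_final_state(fetch_task, polling_interval=5.0, timeout=300.0):
    """Call fetch_task() until the task reaches a final state.

    fetch_task is any callable returning the task JSON as a dict.
    Raises TimeoutError if no final state is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        task_json = fetch_task()
        if task_json["task"]["state"] in FINAL_STATES:
            return task_json
        if time.monotonic() >= deadline:
            raise TimeoutError("Task did not finish within the timeout")
        time.sleep(polling_interval)
```

With `requests`, the callable could be `lambda: requests.get(polling_url).json()`, where `polling_url` is the URL used for querying the task status.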
Result for Voiceprint Merging
The result of the task contains the following fields:
- voiceprint: A Base64-encoded string of the merged voiceprint.
- speech_length: The sum of all input voiceprints' speech lengths in seconds.
Example task result of a successful Voiceprint Merging:
{
"task": {
"task_id": "f47ed5ca-9cc9-420c-9964-1b5d219e07b5",
"state": "done"
},
"result": {
"voiceprint": "eyNpBWkDY3JjTAAAAACnrXW2aQllbWJlZGRpbmdbJGQjSQIAvGyCAbsfO9...",
"speech_length": 101.76
}
}
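A result shaped like the example above could be unpacked and stored for later use roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and writing the Base64 string to a text file simply mirrors how this guide reads .vp files as text, which may not match how your voiceprint files were produced.

```python
def extract_merged_voiceprint(task_response):
    """Return (voiceprint, speech_length) from a finished merging task."""
    state = task_response["task"]["state"]
    # Only a task in the "done" state is expected to carry a result.
    if state != "done":
        raise RuntimeError(f"Task ended in state '{state}', no result to read")
    result = task_response["result"]
    return result["voiceprint"], result["speech_length"]


def save_voiceprint(voiceprint, path):
    """Write the Base64 voiceprint string to a text file (assumed format)."""
    with open(path, "w") as f:
        f.write(voiceprint)
```

For instance, after `voiceprint, speech_length = extract_merged_voiceprint(response_json)`, calling `save_voiceprint(voiceprint, "Harry1+2.vp")` would store the merged voiceprint under the file name used in this guide.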
Voiceprint Comparison with the merged voiceprint
When you run
Voiceprint Comparison
between the merged voiceprint, Harry1+2.vp, and a third voiceprint of the same
speaker, Harry3.vp, the comparison score is higher than when Harry3.vp is
compared with either of the original voiceprints, Harry1.vp or Harry2.vp,
individually. In this example, merging made the comparison more accurate.
| Voiceprint A | Voiceprint B | Score |
|---|---|---|
| Harry3.vp | Harry1.vp | 5.91 |
| Harry3.vp | Harry2.vp | 4.21 |
| Harry3.vp | Harry1+2.vp | 6.31 |
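To make the difference in the table concrete, the gain of the merged voiceprint over each original one can be computed directly from the listed scores. The snippet below only restates the table's numbers; it does not call the Virtual Appliance.

```python
# Scores of Harry3.vp against each voiceprint, copied from the table above.
scores = {"Harry1.vp": 5.91, "Harry2.vp": 4.21, "Harry1+2.vp": 6.31}

merged_score = scores["Harry1+2.vp"]
# Difference between the merged score and each individual score.
gains = {
    name: round(merged_score - score, 2)
    for name, score in scores.items()
    if name != "Harry1+2.vp"
}
print(gains)  # {'Harry1.vp': 0.4, 'Harry2.vp': 2.1}
```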
Full Python code
Here is the full code for this example, slightly adjusted and wrapped into functions for better readability. Refer to the Task lifecycle code examples for a generic code template, applicable to all technologies.
import requests
import time
VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000" # Replace with your address
VOICEPRINT_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/speaker-identification-voiceprint-merging"
def poll_result(polling_url, polling_interval=5):
    """Poll the task endpoint until processing completes."""
    while True:
        polling_task_response = requests.get(polling_url)
        polling_task_response.raise_for_status()
        polling_task_response_json = polling_task_response.json()
        task_state = polling_task_response_json["task"]["state"]
        if task_state in {"done", "failed", "rejected"}:
            break
        time.sleep(polling_interval)
    return polling_task_response


def run_voiceprint_based_task(json_payload):
    """Create a voiceprint-based task and wait for results."""
    start_task_response = requests.post(
        url=VOICEPRINT_BASED_ENDPOINT_URL,
        json=json_payload,
    )
    start_task_response.raise_for_status()
    polling_url = start_task_response.headers["Location"]
    task_result = poll_result(polling_url)
    return task_result.json()
voiceprint_files = [
    "Harry1.vp",
    "Harry2.vp",
]
voiceprints = []
for voiceprint_file in voiceprint_files:
    with open(voiceprint_file) as f:
        voiceprints.append(f.read())
# Merge voiceprints
voiceprint_merging_response = run_voiceprint_based_task(
    json_payload={
        "voiceprints": voiceprints,
    }
)
print(voiceprint_merging_response)