Version: 2026.03.0-rc1

Voiceprint Merging

This guide demonstrates how to use Phonexia Speech Platform 4 Virtual Appliance to merge voiceprints for the Speaker Identification, Gender Identification, and Age Estimation technologies.

Voiceprints represent unique speaker characteristics contained in individual media files. Voiceprints from multiple files with the same speaker can be merged to improve the system's accuracy and robustness.

You can learn how to create voiceprints in the Speaker Verification guide.

The voiceprints.zip archive contains the example voiceprints used throughout this guide: Harry1.vp, Harry2.vp, Harry3.vp, and Harry1+2.vp.

At the end of this guide, you'll find the full Python code example that combines all the steps, which are first discussed separately. This guide should give you a comprehensive understanding of how to perform Voiceprint Merging in your own projects.

Prerequisites

Follow the prerequisites for setting up the Virtual Appliance and the Python environment as described in the Task lifecycle code examples.

Run Voiceprint Merging

To run Voiceprint Merging for a list of voiceprints, you should start by sending a POST request to the /api/technology/speaker-identification-voiceprint-merging endpoint. The request body must contain a list of at least two voiceprints. In Python, you can do this as follows:

import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000"  # Replace with your address
VOICEPRINT_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/speaker-identification-voiceprint-merging"

voiceprint_files = [
    "Harry1.vp",
    "Harry2.vp",
]

# Read each voiceprint file as text so it can be sent in the JSON body
voiceprints = []
for voiceprint_file in voiceprint_files:
    with open(voiceprint_file) as f:
        voiceprints.append(f.read())

start_task_response = requests.post(
    url=VOICEPRINT_BASED_ENDPOINT_URL,
    json={"voiceprints": voiceprints},
)
print(start_task_response.status_code)  # Should print '202'

If the task is accepted, a 202 status code is returned together with a unique task ID in the response body. The task is not processed immediately; it is only scheduled for processing. You can check the current task status while polling for the result.
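Because the request body must contain at least two voiceprints, it can help to check this before sending the request. The following is a minimal sketch; the helper name build_merging_payload is hypothetical and not part of the Virtual Appliance API. It reads the files as text, as in the example above, and returns the JSON body.

```python
from pathlib import Path


def build_merging_payload(voiceprint_paths):
    """Read voiceprint files and build the JSON body for the merging request.

    Hypothetical helper for illustration; it is not part of the API.
    """
    paths = [Path(p) for p in voiceprint_paths]
    # The merging endpoint requires a list of at least two voiceprints.
    if len(paths) < 2:
        raise ValueError("Voiceprint Merging needs at least two voiceprints")
    # Files are read as text, exactly as in the example above.
    return {"voiceprints": [p.read_text() for p in paths]}
```

The returned dictionary can be passed as the json argument of requests.post, for example json=build_merging_payload(["Harry1.vp", "Harry2.vp"]).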

Polling

To obtain the final result, periodically query the task status until the task state changes to done, failed or rejected. The general polling procedure is described in detail in the Task lifecycle code examples.
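The polling loop described above can be sketched as follows. This is only an illustration under stated assumptions: wait_for_final_state, its fetch_task argument, and the timeout parameter are hypothetical names introduced here, and the timeout is an addition that the Task lifecycle code examples may handle differently. The fetch_task argument is any callable that returns the parsed task JSON, for example lambda: requests.get(polling_url).json().

```python
import time

# Final task states named in this guide.
FINAL_STATES = {"done", "failed", "rejected"}


def wait_for_final_state(fetch_task, polling_interval=5.0, timeout=300.0,
                         sleep=time.sleep):
    """Call fetch_task() repeatedly until the task reaches a final state.

    fetch_task is a hypothetical callable returning the parsed task JSON.
    Raises TimeoutError if no final state is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        task_json = fetch_task()
        if task_json["task"]["state"] in FINAL_STATES:
            return task_json
        if time.monotonic() >= deadline:
            raise TimeoutError("Task did not finish within the timeout")
        # Wait before asking for the task status again.
        sleep(polling_interval)
```

Passing the fetching step in as a callable keeps the loop independent of the HTTP client, so the same loop can be exercised without a running Virtual Appliance.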

Result for Voiceprint Merging

The result of the task contains the following fields:

  • voiceprint: A Base64-encoded string of the merged voiceprint.
  • speech_length: The sum of all input voiceprints' speech lengths in seconds.

Example task result of a successful Voiceprint Merging:

{
  "task": {
    "task_id": "f47ed5ca-9cc9-420c-9964-1b5d219e07b5",
    "state": "done"
  },
  "result": {
    "voiceprint": "eyNpBWkDY3JjTAAAAACnrXW2aQllbWJlZGRpbmdbJGQjSQIAvGyCAbsfO9...",
    "speech_length": 101.76
  }
}
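To reuse the merged voiceprint later, for example in Voiceprint Comparison, you may want to store it in a file. The sketch below assumes the task result has the shape shown above; the helper name save_merged_voiceprint and the choice to write the voiceprint as text are assumptions made for illustration.

```python
from pathlib import Path


def save_merged_voiceprint(task_result, output_path):
    """Write the merged voiceprint of a finished task to a file.

    Hypothetical helper; expects the task result JSON shown above.
    Returns the speech length (in seconds) reported with the voiceprint.
    """
    state = task_result["task"]["state"]
    if state != "done":
        # Failed or rejected tasks carry no merged voiceprint to store.
        raise RuntimeError(f"Task ended in state '{state}', nothing to save")
    result = task_result["result"]
    # The voiceprint is a Base64-encoded string, so it is written as text.
    Path(output_path).write_text(result["voiceprint"])
    return result["speech_length"]
```

With the example result above, save_merged_voiceprint(task_result, "Harry1+2.vp") would write the merged voiceprint to Harry1+2.vp and return 101.76.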

Voiceprint Comparison with the merged voiceprint

When you run Voiceprint Comparison between the merged voiceprint, Harry1+2.vp, and a third voiceprint of the same speaker, Harry3.vp, the comparison score is higher than when Harry3.vp is compared with the original voiceprints Harry1.vp and Harry2.vp individually. This indicates that merging has helped to make the comparison more accurate.

Voiceprint A    Voiceprint B    Score
Harry3.vp       Harry1.vp       5.91
Harry3.vp       Harry2.vp       4.21
Harry3.vp       Harry1+2.vp     6.31

Full Python code

Here is the full code for this example, slightly adjusted and wrapped into functions for better readability. Refer to the Task lifecycle code examples for a generic code template, applicable to all technologies.

import requests
import time

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000" # Replace with your address

VOICEPRINT_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/speaker-identification-voiceprint-merging"


def poll_result(polling_url, polling_interval=5):
    """Poll the task endpoint until processing completes."""
    while True:
        polling_task_response = requests.get(polling_url)
        polling_task_response.raise_for_status()
        polling_task_response_json = polling_task_response.json()
        task_state = polling_task_response_json["task"]["state"]
        if task_state in {"done", "failed", "rejected"}:
            break
        time.sleep(polling_interval)
    return polling_task_response


def run_voiceprint_based_task(json_payload):
    """Create a voiceprint-based task and wait for results."""
    start_task_response = requests.post(
        url=VOICEPRINT_BASED_ENDPOINT_URL,
        json=json_payload,
    )
    start_task_response.raise_for_status()
    polling_url = start_task_response.headers["Location"]
    task_result = poll_result(polling_url)
    return task_result.json()


voiceprint_files = [
    "Harry1.vp",
    "Harry2.vp",
]

voiceprints = []
for voiceprint_file in voiceprint_files:
    with open(voiceprint_file) as f:
        voiceprints.append(f.read())

# Merge voiceprints
voiceprint_merging_response = run_voiceprint_based_task(
    json_payload={
        "voiceprints": voiceprints,
    }
)
print(voiceprint_merging_response)