Vector Voiceprints
This guide demonstrates how to work with vector voiceprints in the Phonexia Speech Platform 4 Virtual Appliance.
Voiceprints represent unique speaker characteristics contained in individual media files and can be used for Speaker Identification, Gender Identification, and Age Estimation.
Vector voiceprints are a form of voiceprint reduced to just its numerical representation. They can be used in vector databases to enable some powerful use cases. These include large-scale speaker search (millions of comparisons within milliseconds), speaker clustering, and real-time speaker verification. You can learn how to use vector voiceprints in a vector database in a dedicated article.
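To give a taste of the large-scale speaker search use case, the comparison itself boils down to cosine similarity between vectors. Below is a minimal sketch using plain NumPy; the function name, vector dimension, and randomly generated stand-in vectors are all illustrative, and a production deployment would use a vector database as described in the dedicated article:

```python
import numpy as np

def cosine_similarity_search(query, database, top_k=3):
    """Return indices of the top_k most similar vector voiceprints."""
    # Normalize rows so that a dot product equals cosine similarity.
    db_norm = database / np.linalg.norm(database, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    scores = db_norm @ q_norm  # one dot product per stored voiceprint
    return np.argsort(scores)[::-1][:top_k]

# Toy data: 5 random stand-in "voiceprints" of dimension 256
rng = np.random.default_rng(0)
database = rng.normal(size=(5, 256))
# A near-duplicate of entry 2, as if from another recording of the same speaker
query = database[2] + rng.normal(scale=0.01, size=256)

top = cosine_similarity_search(query, database)
print(top[0])  # entry 2 ranks first
```

Because the search is a single matrix-vector product, it vectorizes well; dedicated vector databases add indexing on top of this to reach millions of comparisons within milliseconds.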
In this guide, you'll learn, step by step, how to create vector voiceprints using Voiceprint Extraction and how to convert existing standard voiceprints to vector voiceprints. At the end, you'll find the full Python code example that combines all the steps.
Note that all example results were acquired with a specific version of the technology model and may change in future releases:
- Speaker Identification: xl-5.4.0
Prerequisites
Follow the prerequisites for setting up the Virtual Appliance and the Python environment as described in the Task lifecycle code examples.
Create vector voiceprints using Voiceprint Extraction
Vector voiceprints can be created by running Voiceprint Extraction with the
include_vector_voiceprint query parameter set to true. Start by sending a
POST request to the
/api/technology/speaker-identification-voiceprint-extraction
endpoint. You can pass the john_doe.wav
example audio as the mandatory file body parameter. In Python, you can do this
as follows:
import requests
VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000" # Replace with your address
MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/speaker-identification-voiceprint-extraction"
media_file = "john_doe.wav"
with open(media_file, mode="rb") as file:
    files = {"file": file}
    start_task_response = requests.post(
        url=MEDIA_FILE_BASED_ENDPOINT_URL,
        files=files,
        params={"include_vector_voiceprint": True},
    )
print(start_task_response.status_code)  # Should print '202'
If the task was successfully accepted, a 202 code will be returned together with a
unique task ID in the response body. The task isn't processed immediately; it is
only scheduled for processing. You can check the current task status by polling
for the result.
Polling
To obtain the final result, periodically query the task status until the task
state changes to done, failed, or rejected. The general polling procedure
is described in detail in the
Task lifecycle code examples.
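The loop itself can be sketched as follows. To keep the snippet runnable without a live Virtual Appliance, the HTTP call is injected as a callable and simulated with canned responses; in a real client, fetch_task would wrap requests.get on the URL from the Location header of the 202 response. The helper name and the fake responses are illustrative:

```python
import time

def poll_until_finished(fetch_task, polling_interval=5, max_attempts=60):
    """Poll fetch_task() until the task reaches a terminal state.

    fetch_task is any callable returning the task JSON; in a real client
    it would be: lambda: requests.get(polling_url).json()
    """
    for _ in range(max_attempts):
        task = fetch_task()
        if task["task"]["state"] in {"done", "failed", "rejected"}:
            return task
        time.sleep(polling_interval)
    raise TimeoutError("Task did not finish in time")

# Fake fetcher simulating a task that finishes on the third poll
responses = iter([
    {"task": {"state": "pending"}},
    {"task": {"state": "running"}},
    {"task": {"state": "done"}, "result": {}},
])
result = poll_until_finished(lambda: next(responses), polling_interval=0)
print(result["task"]["state"])  # prints 'done'
```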
Result for Voiceprint Extraction
The result field of the task contains the channels list of independent
results for each channel, identified by its channel_number. Each channel
contains:
- voiceprint: A Base64-encoded string of the extracted voiceprint (including metadata).
- vector_voiceprint: A vector of floats representing the speaker's unique voice characteristics.
- speech_length: Length of the speech in seconds used for extraction.
- model: A string representing the model used for extraction.
Example task result of a successful Voiceprint Extraction with
include_vector_voiceprint=true (shortened for readability):
{
  "task": {
    "task_id": "fb9de4e5-a768-4069-aff3-c74c826f3ddf",
    "state": "done"
  },
  "result": {
    "channels": [
      {
        "channel_number": 0,
        "voiceprint": "e2kDY3JjbDAWiyhpCWVtYmVkZGluZ1tkO/QWvmS8JkuGZDyv+F5kvJQzJ...",
        "vector_voiceprint": [
          0.1378096193,
          0.0822454691,
          0.0322346129,
          -0.0448936746,
          -0.0743807331,
          ...
        ],
        "speech_length": 49.08,
        "model": "sid-xl5"
      }
    ]
  }
}
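Once the task state is done, the per-channel fields can be read straight from the parsed JSON. The shortened dict below stands in for a real response so that the snippet is self-contained:

```python
# Shortened stand-in for the JSON returned by the Virtual Appliance
task_result = {
    "task": {"task_id": "fb9de4e5-...", "state": "done"},
    "result": {
        "channels": [
            {
                "channel_number": 0,
                "voiceprint": "e2kDY3JjbDAWiyhp...",
                "vector_voiceprint": [0.1378096193, 0.0822454691, 0.0322346129],
                "speech_length": 49.08,
                "model": "sid-xl5",
            }
        ]
    },
}

# Each entry in "channels" is an independent result for one audio channel
channel = task_result["result"]["channels"][0]
vector = channel["vector_voiceprint"]
print(channel["channel_number"], channel["model"], len(vector))
# prints: 0 sid-xl5 3
```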
Convert standard voiceprints to vector voiceprints
If you have standard voiceprints from previous runs of Voiceprint Extraction but you are missing vector voiceprints for those media files, you can create them by converting standard voiceprints to vector voiceprints. Compared to full voiceprint extraction, this is an extremely fast operation.
To run Voiceprint Conversion for a list of voiceprints, start by sending a
POST request to the
/api/technology/speaker-identification-voiceprint-conversion
endpoint. The list of voiceprints is the only mandatory field in the request
body. In Python, you can do this as follows (assuming media_file_based_task
contains the result of the run_media_based_task function from the
Task lifecycle code examples):
import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000"  # Replace with your address
VOICEPRINT_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/speaker-identification-voiceprint-conversion"

# See the Task Lifecycle Code Examples article on how to get `media_file_based_task`
voiceprint = media_file_based_task["result"]["channels"][0]["voiceprint"]
body = {"voiceprints": [voiceprint]}
start_task_response = requests.post(
    url=VOICEPRINT_BASED_ENDPOINT_URL,
    json=body,
)
print(start_task_response.status_code)  # Should print '202'
Result for Voiceprint Conversion
You can then poll for the conversion result in the same way as for Voiceprint Extraction. On success, the response will look like this (shortened for readability):
{
  "task": {
    "task_id": "eb1490ac-8d78-4021-a728-d8fa51a1e82f",
    "state": "done"
  },
  "result": {
    "vector_voiceprints": [
      [
        0.1378096193,
        0.0822454691,
        0.0322346129,
        -0.0448936746,
        -0.0743807331,
        ...
      ]
    ]
  }
}
Notice that the values in the vector voiceprint are identical to those in the
result of Voiceprint Extraction with include_vector_voiceprint=true.
Full Python code
Here is the full code for this example, slightly adjusted and wrapped into functions for better readability. Refer to the Task lifecycle code examples for a generic code template, applicable to all technologies.
import json
import time
import requests
VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000" # Replace with your address
MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/speaker-identification-voiceprint-extraction"
VOICEPRINT_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/speaker-identification-voiceprint-conversion"
def poll_result(polling_url, polling_interval=5):
    """Poll the task endpoint until processing completes."""
    while True:
        polling_task_response = requests.get(polling_url)
        polling_task_response.raise_for_status()
        polling_task_response_json = polling_task_response.json()
        task_state = polling_task_response_json["task"]["state"]
        if task_state in {"done", "failed", "rejected"}:
            break
        time.sleep(polling_interval)
    return polling_task_response
def run_media_based_task(media_file, params):
    """Create a media-based task and wait for results."""
    print(f"Running voiceprint extraction for file {media_file}.")
    with open(media_file, mode="rb") as file:
        files = {"file": file}
        start_task_response = requests.post(
            url=MEDIA_FILE_BASED_ENDPOINT_URL,
            files=files,
            params=params,
        )
    start_task_response.raise_for_status()
    polling_url = start_task_response.headers["Location"]
    task_result = poll_result(polling_url)
    return task_result.json()
def run_voiceprint_based_task(json_payload):
    """Create a voiceprint-based task and wait for results."""
    print("Running voiceprint conversion.")
    start_task_response = requests.post(
        url=VOICEPRINT_BASED_ENDPOINT_URL,
        json=json_payload,
    )
    start_task_response.raise_for_status()
    polling_url = start_task_response.headers["Location"]
    task_result = poll_result(polling_url)
    return task_result.json()
audio_path = "john_doe.wav"
# Extract both standard and vector voiceprint
media_file_based_task = run_media_based_task(
    audio_path, params={"include_vector_voiceprint": True}
)
media_file_based_task_result = media_file_based_task["result"]
print(json.dumps(media_file_based_task_result, indent=2))
# Convert standard voiceprint to vector voiceprint
voiceprint = media_file_based_task_result["channels"][0]["voiceprint"]
voiceprint_based_task = run_voiceprint_based_task({"voiceprints": [voiceprint]})
voiceprint_based_task_result = voiceprint_based_task["result"]
print(json.dumps(voiceprint_based_task_result, indent=2))
# Check that both vector voiceprints are identical
vector_from_extraction = media_file_based_task_result["channels"][0][
    "vector_voiceprint"
]
vector_from_conversion = voiceprint_based_task_result["vector_voiceprints"][0]
assert vector_from_extraction == vector_from_conversion