Version: 4.0.0

Authenticity Verification

Experimental feature

Please note that Authenticity Verification is an experimental feature.
It is under development and may change in the future.

This guide demonstrates how to perform Authenticity Verification with Phonexia Speech Platform 4.

Authenticity Verification technology enables you to determine if audio data is genuine. The technology is designed to offer various operations for verifying audio authenticity.

We encourage you to read the documentation for each operation to learn more about their features and capabilities.

Model versions

Note that all example results in this guide were obtained with the model versions listed below and may change in future releases.

  • Deepfake Detection: beta:2.2.0
  • Replay Attack Detection: beta:1.0.0
  • Audio Manipulation Detection: beta:1.0.0

In this guide, we'll be using the following audio files. You can download them all together in the audio_files.zip archive. The table lists the expected outcome of each operation for each file.

| filename | deepfake | replay attack | audio manipulation |
|---|---|---|---|
| Bridget.wav | no | no | 2 anomalies |
| Graham.wav | yes | no | 1 anomaly |
| Hans.wav | no | yes | no anomaly |

At the end of this guide, you'll find the full Python code example that combines all the steps that will first be discussed separately. This guide should give you a comprehensive understanding of how to integrate Authenticity Verification into your own projects.

Prerequisites

In this guide, we assume that the Virtual Appliance is running on port 8000 of http://localhost and contains the required models and a valid license for the technology. For more information on how to install and start the Virtual Appliance, please refer to the Virtual Appliance Installation chapter.

Environment Setup

We are using Python 3.9 and Python library requests 2.27 in this example. You can install the requests library with pip as follows:

pip install requests~=2.27

Default scenario

To run Authenticity Verification for a single media file, you should start by sending a POST request to the /api/technology/experimental/authenticity-verification endpoint. By default, all available operations are performed.

In Python, you can do this as follows:

import requests

SPEECH_PLATFORM_SERVER = "http://localhost:8000" # Replace with your actual server URL
ENDPOINT_URL = f"{SPEECH_PLATFORM_SERVER}/api/technology/experimental/authenticity-verification"

audio_path = "Bridget.wav"

with open(audio_path, mode="rb") as file:
    files = {"file": file}
    response = requests.post(
        url=ENDPOINT_URL,
        files=files,
    )

print(response.status_code) # Should print '202'

If the task has been successfully accepted, the 202 code will be returned together with a unique task ID in the response body. The task isn't processed immediately, but only scheduled for processing. You can check the current task status by polling for the result.

The URL for polling the result is returned in the Location header. Alternatively, you can assemble the polling URL on your own by appending a slash (/) and the task ID to the endpoint URL.

import json
import requests
import time

# Use the `response` from the previous step
polling_url = response.headers["Location"]
# Alternatively:
# polling_url = ENDPOINT_URL + "/" + response.json()["task"]["task_id"]

while True:
    response = requests.get(polling_url)
    data = response.json()
    task_status = data["task"]["state"]
    if task_status in {"done", "failed", "rejected"}:
        break
    time.sleep(5)

print(json.dumps(data, indent=2))
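
The loop above polls until the task reaches a final state, however long that takes. If you prefer to give up after a fixed time, a bounded variant could look like the sketch below. The helper name poll_until_finished, its fetch argument, and the default timeout are illustrative assumptions and not part of the Speech Platform API.

```python
import time
from typing import Callable


def poll_until_finished(
    fetch: Callable[[], dict],
    sleep: float = 5.0,
    timeout: float = 600.0,
) -> dict:
    """Call `fetch` until the task reaches a final state or `timeout` elapses.

    `fetch` is a hypothetical callable returning the parsed JSON body of one
    polling request, e.g. `lambda: requests.get(polling_url).json()`.
    """
    final_states = {"done", "failed", "rejected"}  # states used in this guide
    deadline = time.monotonic() + timeout
    while True:
        data = fetch()
        if data["task"]["state"] in final_states:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("Task did not finish within the timeout.")
        time.sleep(sleep)
```

With the requests library, it could be called as poll_until_finished(lambda: requests.get(polling_url).json()).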

Once the polling finishes, data will contain the latest response from the server – either the result of Authenticity Verification, or an error message with details, in case processing was not able to finish properly. The result for our sample audio should look as follows:

{
  "task": {
    "task_id": "330f9d36-04e2-4b78-b4da-79bdd61aa7db",
    "state": "done"
  },
  "result": {
    "deepfake_detection": {
      "channels": [
        {
          "channel_number": 0,
          "score": -2.00360369682312
        }
      ]
    },
    "replay_attack_detection": {
      "channels": [
        {
          "channel_number": 0,
          "score": -2.017822027206421
        }
      ]
    },
    "audio_manipulation_detection": {
      "channels": [
        {
          "channel_number": 0,
          "segments": [
            {
              "score": 2.4159951210021973,
              "start_time": 0,
              "end_time": 0.9675
            },
            {
              "score": 2.4164342880249023,
              "start_time": 5.16,
              "end_time": 6.129875
            }
          ]
        }
      ]
    }
  }
}

The result contains data for all three operations. All operations return a log-likelihood ratio score, a real number ranging from -infinity to +infinity. The decision threshold for all three of them is 0. Suspicious files should have a score higher than 0, while genuine files should have a score lower than 0. The replay_attack_detection and deepfake_detection operations have a single score per channel within the channels list. The audio_manipulation_detection has a score per segment, and by default, it only returns suspicious segments.
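
As an illustration of the threshold rule described above, the following sketch turns the "result" part of a response into per-channel decisions. The function name summarize and the shape of its return value are assumptions made for this example; only the field names and the threshold of 0 come from this guide.

```python
THRESHOLD = 0.0  # decision threshold stated in this guide


def summarize(result: dict, threshold: float = THRESHOLD) -> dict:
    """Return, per channel, which operations in `result` flag the audio."""
    summary = {}
    # These two operations report a single score per channel.
    for operation in ("deepfake_detection", "replay_attack_detection"):
        if operation in result:
            for channel in result[operation]["channels"]:
                entry = summary.setdefault(channel["channel_number"], {})
                entry[operation] = channel["score"] > threshold
    # Audio Manipulation Detection reports a score per segment.
    manipulation = result.get("audio_manipulation_detection")
    if manipulation:
        for channel in manipulation["channels"]:
            entry = summary.setdefault(channel["channel_number"], {})
            # Count only the segments scoring above the threshold.
            entry["audio_manipulation_anomalies"] = sum(
                segment["score"] > threshold for segment in channel["segments"]
            )
    return summary
```

For the Bridget.wav result shown above, summarize(data["result"]) reports no deepfake, no replay attack, and two manipulation anomalies on channel 0.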

Each operation has a typical score range established with an evaluation database. While rare, scores may occasionally fall outside of the typical range.

| Operation | Typical lower range | Typical upper range |
|---|---|---|
| Deepfake Detection | -2.0 | 6.0 |
| Replay Attack Detection | -3.0 | 1.0 |
| Audio Manipulation Detection | -2.0 | 3.0 |
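
If you want to notice scores that fall outside the typical ranges, a simple check such as the one below may help. The dictionary keys reuse the operation names from the response. The helper name is_typical and the choice to treat the bounds as inclusive are assumptions of this sketch.

```python
# Typical score ranges taken from the table above: (lower, upper).
TYPICAL_RANGES = {
    "deepfake_detection": (-2.0, 6.0),
    "replay_attack_detection": (-3.0, 1.0),
    "audio_manipulation_detection": (-2.0, 3.0),
}


def is_typical(operation: str, score: float) -> bool:
    """Return True if `score` lies within the typical range for `operation`."""
    lower, upper = TYPICAL_RANGES[operation]
    return lower <= score <= upper
```

A score outside the typical range is not an error. The decision threshold of 0 still applies to it.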

Advanced usage

The following scenarios can be used to fine-tune the Authenticity Verification process.

Specify required operations

The REST API allows you to specify which operations you want to run. It is possible to specify the list of requested operations (deepfake_detection, audio_manipulation_detection, replay_attack_detection) by passing the requested_operations query parameter in the POST request. As mentioned earlier, the default scenario always includes all available operations.

In the requests library, you can pass the requested_operations to the params argument as a list of strings. For example, if you want to run only Deepfake Detection and Audio Manipulation Detection, you can use the following snippet to send the request:

response = requests.post(
    url=ENDPOINT_URL,
    files=files,
    params={"requested_operations": ["deepfake_detection", "audio_manipulation_detection"]},
)

Although the requests library allows you to pass the requested_operations parameter as a list, the HTTP request looks different. Multiple values for a single query parameter are passed by repeating the parameter key, e.g., ?id=1&id=2&id=3. See the requested_operations query parameters in the relative URL for the POST request above:

/api/technology/experimental/authenticity-verification?requested_operations=deepfake_detection&requested_operations=audio_manipulation_detection
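
If you are not using the requests library, the same repeated-key encoding can be produced with the Python standard library. The sketch below only builds the relative URL shown above and does not send a request.

```python
from urllib.parse import urlencode

operations = ["deepfake_detection", "audio_manipulation_detection"]

# doseq=True repeats the key once for every value in the list.
query = urlencode({"requested_operations": operations}, doseq=True)
relative_url = "/api/technology/experimental/authenticity-verification?" + query
print(relative_url)
```
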

Requested operations and billing

Note that the requested_operations parameter has no effect on the billing. Regardless of which operations you specify, the Authenticity Verification is always billed as one task. The only motivation for specifying the list of operations is to limit the computational resources used for processing the file.

Get raw segments for Audio Manipulation Detection

By default, Audio Manipulation Detection applies additional logic to the output segments to produce a more user-friendly result. If you want to build your own detection logic, you can enable raw segmentation. The result will then include all segments, even if they are considered genuine, and the segments will be more granular (typically a few hundred milliseconds).

import json

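config_dict = {
    "audio_manipulation_detection": {
        "raw_segmentation": True
    },
}
# The configuration is sent as a JSON string in the "config" form field.
payload = {"config": json.dumps(config_dict)}

# Reuse `ENDPOINT_URL` and `files` from the previous steps; the audio file
# must still be open when the request is sent.
response = requests.post(
    url=ENDPOINT_URL,
    files=files,
    data=payload,
)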
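
As one possible starting point for your own detection logic, the sketch below keeps the raw segments that score above a threshold and merges those that are adjacent in time. It assumes that raw segments carry the same score, start_time, and end_time fields as the default output. The function name merge_suspicious and the max_gap parameter are hypothetical, and the merging rule is not the logic the platform applies by default.

```python
def merge_suspicious(segments, threshold=0.0, max_gap=0.0):
    """Merge suspicious segments separated by at most `max_gap` seconds.

    A merged segment keeps the highest score of the segments it covers.
    """
    merged = []
    for segment in sorted(segments, key=lambda s: s["start_time"]):
        if segment["score"] <= threshold:
            continue  # skip segments considered genuine
        if merged and segment["start_time"] - merged[-1]["end_time"] <= max_gap:
            # Extend the previous suspicious segment.
            merged[-1]["end_time"] = max(merged[-1]["end_time"], segment["end_time"])
            merged[-1]["score"] = max(merged[-1]["score"], segment["score"])
        else:
            merged.append(dict(segment))  # copy so the input is not modified
    return merged
```

It would be applied to the segments list of a single channel from the audio_manipulation_detection part of the result.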

Full Python code

Here is the full example of how to run the Authenticity Verification technology in the default configuration. The code is slightly adjusted and wrapped into functions for better readability.

import json
import requests
import time

SPEECH_PLATFORM_SERVER = "http://localhost:8000" # Replace with your actual server URL
ENDPOINT_URL = f"{SPEECH_PLATFORM_SERVER}/api/technology/experimental/authenticity-verification"


def poll_result(polling_url: str, sleep: int = 5):
    while True:
        response = requests.get(polling_url)
        response.raise_for_status()
        data = response.json()
        task_status = data["task"]["state"]
        if task_status in {"done", "failed", "rejected"}:
            break
        time.sleep(sleep)
    return response


def run_authenticity_verification(audio_path: str):
    with open(audio_path, mode="rb") as file:
        files = {"file": file}
        response = requests.post(
            url=ENDPOINT_URL,
            files=files,
        )
    response.raise_for_status()
    polling_url = response.headers["Location"]
    response_result = poll_result(polling_url)
    return response_result.json()


filenames = ["Bridget.wav", "Graham.wav", "Hans.wav"]

for filename in filenames:
    print(f"Running Authenticity Verification for file {filename}.")
    data = run_authenticity_verification(filename)
    result = data["result"]
    print(json.dumps(result, indent=2))