# Authenticity Verification
Please note that Authenticity Verification is an experimental feature.
It is under development and may change in the future.
This guide demonstrates how to perform Authenticity Verification with Phonexia Speech Platform 4.
Authenticity Verification technology enables you to determine if audio data is genuine. The technology is designed to offer various operations for verifying audio authenticity.
We encourage you to read the documentation for each operation to learn more about their features and capabilities.
Note that all example results were acquired with the specific model versions listed below and may change in future releases.

- Deepfake Detection: `beta:2.2.0`
- Replay Attack Detection: `beta:1.0.0`
- Audio Manipulation Detection: `beta:1.0.0`
In this guide, we'll be using the following audio files. You can download them all together in the `audio_files.zip` archive.
| filename | deepfake | replay attack | audio manipulation |
|---|---|---|---|
| Bridget.wav | no | no | 2 anomalies |
| Graham.wav | yes | no | 1 anomaly |
| Hans.wav | no | yes | no anomaly |
At the end of this guide, you'll find the full Python code example combining all the steps, which are first discussed separately. This guide should give you a comprehensive understanding of how to integrate Authenticity Verification into your own projects.
## Prerequisites
In this guide, we assume that the Virtual Appliance is running on port 8000 of `http://localhost` and contains the proper models and license for the technology. For more information on how to install and start the Virtual Appliance, please refer to the Virtual Appliance Installation chapter.
## Environment Setup
We are using Python 3.9 and the Python library `requests` 2.27 in this example. You can install the `requests` library with `pip` as follows:

```shell
pip install requests~=2.27
```
## Default scenario
To run Authenticity Verification for a single media file, start by sending a `POST` request to the `/api/technology/experimental/authenticity-verification` endpoint. By default, all available operations are performed.

In Python, you can do this as follows:
```python
import requests

SPEECH_PLATFORM_SERVER = "http://localhost:8000"  # Replace with your actual server URL
ENDPOINT_URL = f"{SPEECH_PLATFORM_SERVER}/api/technology/experimental/authenticity-verification"

audio_path = "Bridget.wav"

with open(audio_path, mode="rb") as file:
    files = {"file": file}
    response = requests.post(
        url=ENDPOINT_URL,
        files=files,
    )

print(response.status_code)  # Should print '202'
```
If the task has been successfully accepted, the `202` code will be returned together with a unique task ID in the response body. The task isn't processed immediately, but only scheduled for processing. You can check the current task status by polling for the result.

The URL for polling the result is returned in the `Location` header. Alternatively, you can assemble the polling URL on your own by appending a slash (`/`) and the task ID to the endpoint URL.
```python
import json
import time

import requests

# Use the `response` from the previous step
polling_url = response.headers["Location"]
# Alternatively:
# polling_url = ENDPOINT_URL + "/" + response.json()["task"]["task_id"]

while True:
    response = requests.get(polling_url)
    data = response.json()
    task_status = data["task"]["state"]
    if task_status in {"done", "failed", "rejected"}:
        break
    time.sleep(5)

print(json.dumps(data, indent=2))
```
Once the polling finishes, `data` will contain the latest response from the server: either the result of Authenticity Verification, or an error message with details in case processing was not able to finish properly. The result for our sample audio should look as follows:
```json
{
  "task": {
    "task_id": "330f9d36-04e2-4b78-b4da-79bdd61aa7db",
    "state": "done"
  },
  "result": {
    "deepfake_detection": {
      "channels": [
        {
          "channel_number": 0,
          "score": -2.00360369682312
        }
      ]
    },
    "replay_attack_detection": {
      "channels": [
        {
          "channel_number": 0,
          "score": -2.017822027206421
        }
      ]
    },
    "audio_manipulation_detection": {
      "channels": [
        {
          "channel_number": 0,
          "segments": [
            {
              "score": 2.4159951210021973,
              "start_time": 0,
              "end_time": 0.9675
            },
            {
              "score": 2.4164342880249023,
              "start_time": 5.16,
              "end_time": 6.129875
            }
          ]
        }
      ]
    }
  }
}
```
The result contains data for all three operations. All operations return a log-likelihood ratio score, a real number ranging from -infinity to +infinity. The decision threshold for all three of them is 0: suspicious files should have a score higher than 0, while genuine files should have a score lower than 0. The `replay_attack_detection` and `deepfake_detection` operations have a single score per channel within the `channels` list. The `audio_manipulation_detection` operation has a score per segment, and by default, it only returns suspicious segments.
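The per-channel scores can be turned into verdicts by comparing them against the threshold of 0. A minimal sketch, using the sample values from the response above:

```python
# Sample result values, abbreviated from the response shown above
result = {
    "deepfake_detection": {
        "channels": [{"channel_number": 0, "score": -2.00360369682312}]
    },
    "replay_attack_detection": {
        "channels": [{"channel_number": 0, "score": -2.017822027206421}]
    },
}

THRESHOLD = 0.0  # decision threshold shared by all operations

# Map each operation to a per-channel verdict
verdicts = {
    operation: [
        "suspicious" if channel["score"] > THRESHOLD else "genuine"
        for channel in result[operation]["channels"]
    ]
    for operation in result
}
print(verdicts)
# {'deepfake_detection': ['genuine'], 'replay_attack_detection': ['genuine']}
```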
Each operation has a typical score range established with an evaluation database. While rare, scores may occasionally fall outside of the typical range.
| Operation | Typical lower range | Typical upper range |
|---|---|---|
| Deepfake Detection | -2.0 | 6.0 |
| Replay Attack Detection | -3.0 | 1.0 |
| Audio Manipulation Detection | -2.0 | 3.0 |
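If you log or monitor scores, it can be useful to flag values that fall outside these typical ranges. A minimal sketch based on the table above (the helper name is ours, not part of the API):

```python
# Typical score ranges per operation, taken from the table above
TYPICAL_RANGES = {
    "deepfake_detection": (-2.0, 6.0),
    "replay_attack_detection": (-3.0, 1.0),
    "audio_manipulation_detection": (-2.0, 3.0),
}


def in_typical_range(operation: str, score: float) -> bool:
    """Return True if the score falls within the operation's typical range."""
    lower, upper = TYPICAL_RANGES[operation]
    return lower <= score <= upper


# The sample deepfake score is slightly below the typical lower bound
print(in_typical_range("deepfake_detection", -2.00360369682312))  # False
```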
## Advanced usage
The following scenarios can be used to fine-tune the Authenticity Verification process.
### Specify required operations
The REST API allows you to specify which operations you want to run. It is possible to specify the list of requested operations (`deepfake_detection`, `audio_manipulation_detection`, `replay_attack_detection`) by passing the `requested_operations` query parameter in the `POST` request. As mentioned earlier, the default scenario always includes all available operations.
In the `requests` library, you can pass the `requested_operations` to the `params` argument as a list of strings. For example, if you want to run only Deepfake Detection and Audio Manipulation Detection, you can use the following snippet to send the request:
```python
response = requests.post(
    url=ENDPOINT_URL,
    files=files,
    params={
        "requested_operations": [
            "deepfake_detection",
            "audio_manipulation_detection",
        ]
    },
)
```
Although the `requests` library allows you to pass the `requested_operations` parameter as a list, the HTTP request looks different. Multiple values for a single query parameter are passed by repeating the parameter key, e.g., `?id=1&id=2&id=3`. See the `requested_operations` query parameters in the relative URL for the `POST` request above:

```
/api/technology/experimental/authenticity-verification?requested_operations=deepfake_detection&requested_operations=audio_manipulation_detection
```
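You can inspect this encoding locally without contacting any server: `requests` exposes the URL it would send through its `PreparedRequest` class. A small sketch (not part of the guide's workflow):

```python
from requests.models import PreparedRequest

# Build the URL exactly as requests would encode it for the POST request
req = PreparedRequest()
req.prepare_url(
    "http://localhost:8000/api/technology/experimental/authenticity-verification",
    {"requested_operations": ["deepfake_detection", "audio_manipulation_detection"]},
)
print(req.url)
```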
Note that the `requested_operations` parameter has no effect on billing. Regardless of which operations you specify, Authenticity Verification is always billed as one task. The only motivation for specifying the list of operations is to limit the computational resources used for processing the file.
### Get raw segments for Audio Manipulation Detection
By default, Audio Manipulation Detection applies additional logic to the output segments to produce a more user-friendly result. If you want to build your own detection logic, you can enable raw segmentation. The result will then include all segments, even if they are considered genuine, and the segments will be more granular (typically a few hundred milliseconds).
```python
import json

config_dict = {
    "audio_manipulation_detection": {
        "raw_segmentation": True
    },
}
payload = {"config": json.dumps(config_dict)}

# Reuse `files` from the default scenario
response = requests.post(
    url=ENDPOINT_URL,
    files=files,
    data=payload,
)
```
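With `raw_segmentation` enabled, the result also includes segments considered genuine, so you typically apply your own decision logic. A hypothetical sketch over segments in the same shape as shown earlier (the scores here are made up for illustration):

```python
# Hypothetical raw segments, same shape as in the result shown earlier
segments = [
    {"score": -1.5, "start_time": 0.0, "end_time": 0.3},
    {"score": 2.4, "start_time": 0.3, "end_time": 0.6},
    {"score": 0.8, "start_time": 0.6, "end_time": 0.9},
]

# Keep only segments above the decision threshold of 0
suspicious = [segment for segment in segments if segment["score"] > 0]
print([(s["start_time"], s["end_time"]) for s in suspicious])
# [(0.3, 0.6), (0.6, 0.9)]
```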
## Full Python code
Here is the full example of how to run the Authenticity Verification technology in the default configuration. The code is slightly adjusted and wrapped into functions for better readability.
```python
import json
import time

import requests

SPEECH_PLATFORM_SERVER = "http://localhost:8000"  # Replace with your actual server URL
ENDPOINT_URL = f"{SPEECH_PLATFORM_SERVER}/api/technology/experimental/authenticity-verification"


def poll_result(polling_url: str, sleep: int = 5):
    while True:
        response = requests.get(polling_url)
        response.raise_for_status()
        data = response.json()
        task_status = data["task"]["state"]
        if task_status in {"done", "failed", "rejected"}:
            break
        time.sleep(sleep)
    return response


def run_authenticity_verification(audio_path: str):
    with open(audio_path, mode="rb") as file:
        files = {"file": file}
        response = requests.post(
            url=ENDPOINT_URL,
            files=files,
        )
    response.raise_for_status()
    polling_url = response.headers["Location"]
    response_result = poll_result(polling_url)
    return response_result.json()


filenames = ["Bridget.wav", "Graham.wav", "Hans.wav"]
for filename in filenames:
    print(f"Running Authenticity Verification for file {filename}.")
    data = run_authenticity_verification(filename)
    result = data["result"]
    print(json.dumps(result, indent=2))
```