Referential Deepfake Detection
Please note that Referential Deepfake Detection is an experimental feature.
It is under development and may change in the future.
This guide demonstrates how to perform Referential Deepfake Detection with Phonexia Speech Platform 4.
The Referential Deepfake Detection technology enables you to determine whether a voice in audio data is genuine with respect to an authentic reference media file or whether it is likely to be a deepfake. We encourage you to read the high-level documentation for Deepfake Detection to learn more about its features and capabilities.
Note that all example results were acquired with the following model version and may change in future releases.
- Referential Deepfake Detection: xl5 1.0.0
In this guide, we'll be using the following media files. You can download them all together in the audio_files.zip archive.
| reference filename | questioned filename | LLR score |
|---|---|---|
| Harry.wav | Harry2.wav | -0.7862 |
| Harry.wav | Harry_clone.mp3 | 0.5050 |
| Juan.wav | Juan2.wav | -2.2655 |
| Juan.wav | Juan_clone.mp3 | 0.6505 |
| Veronika.wav | Veronika2.wav | -2.3143 |
| Veronika.wav | Veronika_clone.mp3 | -0.0853 |
At the end of this guide, you'll find the full Python code example that combines all of the steps discussed separately below. This guide should give you a comprehensive understanding of how to integrate Referential Deepfake Detection into your own projects.
Prerequisites
Follow the prerequisites for setting up the Virtual Appliance and the Python environment, as described in the Task lifecycle code examples.
Run Referential Deepfake Detection
To run Referential Deepfake Detection, you need a reference media file that
contains the genuine voice of a person and a questioned media file that you
want to test for being a deepfake of the reference voice. The file_reference
and file_questioned parameters are mandatory. Note that only one channel per
file can be used, so make sure to select a channel when working with a
multi-channel file.
You should start by sending a POST request to the
/api/technology/experimental/referential-deepfake-detection
endpoint.
In Python, you can do this as follows:
import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000"  # Replace with your address
MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/experimental/referential-deepfake-detection"

media_file_reference = "Harry.wav"
media_file_questioned = "Harry_clone.mp3"

with (
    open(media_file_reference, mode="rb") as file_reference,
    open(media_file_questioned, mode="rb") as file_questioned,
):
    files = {
        "file_reference": file_reference,
        "file_questioned": file_questioned,
    }
    start_task_response = requests.post(
        url=MEDIA_FILE_BASED_ENDPOINT_URL,
        files=files,
    )

print(start_task_response.status_code)  # Should print '202'
If the task has been successfully accepted, a 202 status code will be returned
together with a unique task ID in the response body. The task isn't processed
immediately, but only scheduled for processing. You can check the current task
status by polling for the result.
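For example, you can inspect the accepted task and capture the polling URL right after the POST request. This is a minimal sketch, assuming the code from the previous step; the polling URL is taken from the Location response header, as in the full example at the end of this guide:

# Minimal sketch: inspect the scheduled task and capture the polling URL.
start_task_response_json = start_task_response.json()
print(start_task_response_json)  # Response body containing the unique task ID

# The URL to poll for the result is provided in the "Location" response header.
polling_url = start_task_response.headers["Location"]
print(polling_url)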
Polling
To obtain the final result, periodically query the task status until the task
state changes to done, failed or rejected. The general polling procedure
is described in detail in the
Task lifecycle code examples.
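For illustration, here is a minimal polling sketch using the polling_url captured above; it mirrors the poll_result function from the full example at the end of this guide:

import time

# Minimal polling sketch: query the task status until it reaches a final state.
while True:
    polling_task_response = requests.get(polling_url)
    polling_task_response.raise_for_status()
    task_state = polling_task_response.json()["task"]["state"]
    if task_state in {"done", "failed", "rejected"}:
        break
    time.sleep(5)  # Wait a few seconds between polls

task_result = polling_task_response.json()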
Result of Referential Deepfake Detection
The result field of the task contains the score of Referential Deepfake Detection.
The task result for our sample file should look as follows:
{
  "task": {
    "task_id": "330f9d36-04e2-4b78-b4da-79bdd61aa7db",
    "state": "done"
  },
  "result": {
    "score": 0.505038857460022
  }
}
The result contains a score which represents a log-likelihood ratio (LLR), a
real number ranging from -infinity to +infinity. The decision threshold
is 0. Suspicious files should have a score greater than 0, while genuine files
should have a score less than 0.
The technology has a typical score range established with an evaluation dataset. While rare, scores may occasionally fall outside the typical range. Typical score ranges may change over time in future model versions. Consult the technology documentation for details.
The optimal decision threshold may differ from 0 depending on your application. To achieve the desired trade-off between false positives and false negatives, you may need to adjust the threshold based on your specific dataset and requirements.
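As a simple illustration, the sketch below turns the score into a decision, assuming the task result has been loaded into task_result as in the polling sketch above; the threshold of 0 is the default discussed earlier and can be replaced with your own calibrated value:

# Sketch: apply a decision threshold to the LLR score.
DECISION_THRESHOLD = 0.0  # Default threshold; adjust it for your desired trade-off

score = task_result["result"]["score"]
if score > DECISION_THRESHOLD:
    print(f"Score {score:.4f}: the questioned recording is suspicious (possible deepfake).")
else:
    print(f"Score {score:.4f}: the questioned recording appears genuine.")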
Full Python code
Here is the full example of how to run the Referential Deepfake Detection technology in the default configuration. The code is slightly adjusted and wrapped into functions for better readability. Refer to the Task lifecycle code examples for a generic code template, applicable to all technologies.
import json
import time

import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000"  # Replace with your address
MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/experimental/referential-deepfake-detection"


def poll_result(polling_url, polling_interval=5):
    """Poll the task endpoint until processing completes."""
    while True:
        polling_task_response = requests.get(polling_url)
        polling_task_response.raise_for_status()
        polling_task_response_json = polling_task_response.json()
        task_state = polling_task_response_json["task"]["state"]
        if task_state in {"done", "failed", "rejected"}:
            break
        time.sleep(polling_interval)
    return polling_task_response


def run_media_based_task(media_file_reference, media_file_questioned, params={}, config={}):
    """Create a media-based task and wait for results."""
    with (
        open(media_file_reference, mode="rb") as file_reference,
        open(media_file_questioned, mode="rb") as file_questioned,
    ):
        files = {
            "file_reference": file_reference,
            "file_questioned": file_questioned,
        }
        start_task_response = requests.post(
            url=MEDIA_FILE_BASED_ENDPOINT_URL,
            files=files,
            params=params,
            data={"config": json.dumps(config)},
        )
        start_task_response.raise_for_status()
        polling_url = start_task_response.headers["Location"]
        task_result = poll_result(polling_url)
        return task_result.json()


# Run Referential Deepfake Detection
# We use the questioned files as keys in the media_files dictionary, because
# in contrast to the reference files they are unique.
media_files = {
    "Harry2.wav": "Harry.wav",
    "Harry_clone.mp3": "Harry.wav",
    "Juan2.wav": "Juan.wav",
    "Juan_clone.mp3": "Juan.wav",
    "Veronika2.wav": "Veronika.wav",
    "Veronika_clone.mp3": "Veronika.wav",
}

for media_file_questioned, media_file_reference in media_files.items():
    print(
        f"Running Referential Deepfake Detection for reference file {media_file_reference} "
        f"and questioned file {media_file_questioned}"
    )
    media_file_based_task = run_media_based_task(media_file_reference, media_file_questioned)
    media_file_based_task_result = media_file_based_task["result"]
    print(json.dumps(media_file_based_task_result, indent=2))