Deepfake Detection
Phonexia has developed a Deepfake Detection technology designed to identify artificial voices within audio recordings, thereby enhancing the security and reliability of speaker verification systems. The approach uses a transformer-based architecture and is trained primarily on datasets that encompass a wide range of synthesized, converted, and replayed speech examples.
The model is trained in a self-supervised manner, and its performance is improved through carefully designed data augmentation techniques. It has been trained on a large corpus drawn from various data sources, including telephone data, which results in fewer false alarms on such recordings. The model requires a minimum of 3 seconds of speech for inference; however, the best performance is achieved with 5 seconds of speech or more.
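The speech-length guidance can be applied as a simple pre-check before a recording is sent for scoring. The following is a minimal sketch, not part of the product API: the function and constant names are hypothetical, and the net speech duration is assumed to be measured separately (for example, by a voice activity detector).

```python
# Hypothetical constants taken from the guidance above.
MIN_SPEECH_SECONDS = 3.0          # minimum speech needed for inference
RECOMMENDED_SPEECH_SECONDS = 5.0  # best performance at or above this length


def speech_length_status(speech_seconds: float) -> str:
    """Classify a recording by its net speech duration in seconds."""
    if speech_seconds < 0:
        raise ValueError("speech_seconds must be non-negative")
    if speech_seconds < MIN_SPEECH_SECONDS:
        return "too_short"      # below the minimum required for inference
    if speech_seconds < RECOMMENDED_SPEECH_SECONDS:
        return "usable"         # inference is possible, accuracy may be lower
    return "recommended"        # length at which performance is best
```

A caller might skip or defer recordings marked "too_short" and treat scores from "usable" recordings with more caution.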
Possible use cases
- Banks and Call Centers: Enhances the security of customer interactions by ensuring that communications are with legitimate individuals, thereby preventing fraudulent activities and unauthorized access.
- Forensic Analysis: Assists law enforcement agencies in authenticating audio evidence, ensuring its credibility in investigations and legal proceedings.
Scoring
The score is the LLR (log-likelihood ratio), which represents how much more likely the input is to belong to the deepfake class versus the genuine class.
- A positive score (> 0) indicates the audio is more likely a deepfake.
- A negative score (< 0) suggests the audio is more likely genuine.
The system is calibrated so that a score of 0 corresponds to the point of equal likelihood between the two classes on our evaluation datasets. This means the model is maximally uncertain at this point: it considers both outcomes equally probable.
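To make the score easier to interpret, an LLR can be combined with a prior belief about how common deepfakes are and converted into a probability. The sketch below is illustrative only. It assumes the LLR uses the natural logarithm and that the score is well calibrated on your data, neither of which is stated above, and the function name and the prior are hypothetical choices made by the caller.

```python
import math


def deepfake_posterior(llr: float, prior_deepfake: float = 0.5) -> float:
    """Convert an LLR into P(deepfake | audio) under an assumed prior.

    Assumes a natural-log LLR: posterior log-odds = LLR + prior log-odds.
    """
    if not 0.0 < prior_deepfake < 1.0:
        raise ValueError("prior_deepfake must be strictly between 0 and 1")
    prior_log_odds = math.log(prior_deepfake / (1.0 - prior_deepfake))
    log_odds = llr + prior_log_odds
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

With equal priors, a score of 0 maps to a probability of 0.5, which matches the "equally probable" interpretation. If deepfakes are expected to be rare in your traffic, a lower prior reduces the probability assigned to the same score.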
The optimal decision threshold may differ from 0 depending on your application. To achieve the desired trade-off between false positives and false negatives, you may need to adjust the threshold based on your specific dataset and requirements.
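One way to choose an application-specific threshold is to score a labeled set of your own recordings and pick the threshold that keeps false alarms on genuine audio below a target rate. The following is a minimal sketch under that assumption. The function names are hypothetical, the example scores are made up for illustration, and the decision rule "flag as deepfake when score > threshold" is an assumed convention.

```python
import math


def threshold_for_false_alarm_rate(genuine_scores, max_false_alarm_rate):
    """Smallest observed threshold whose false alarm rate on the given
    genuine scores does not exceed max_false_alarm_rate."""
    if not genuine_scores:
        raise ValueError("genuine_scores must not be empty")
    if not 0.0 <= max_false_alarm_rate <= 1.0:
        raise ValueError("max_false_alarm_rate must be between 0 and 1")
    ordered = sorted(genuine_scores, reverse=True)
    # Number of genuine recordings allowed to score above the threshold.
    allowed = math.floor(max_false_alarm_rate * len(ordered))
    if allowed >= len(ordered):
        return float("-inf")  # every genuine recording may be flagged
    return ordered[allowed]


def error_rates(genuine_scores, deepfake_scores, threshold):
    """Return (false_alarm_rate, miss_rate) for the rule score > threshold."""
    false_alarms = sum(s > threshold for s in genuine_scores)
    misses = sum(s <= threshold for s in deepfake_scores)
    return false_alarms / len(genuine_scores), misses / len(deepfake_scores)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    genuine = [-1.8, -1.2, -0.9, -0.5, -0.3, 0.1, 0.4, -1.0, -0.7, 0.9]
    deepfake = [0.2, 1.5, 2.8, 3.3, 4.1, 5.0, -0.1, 0.8]
    thr = threshold_for_false_alarm_rate(genuine, 0.1)
    print(thr, error_rates(genuine, deepfake, thr))
```

Raising the threshold lowers false alarms at the cost of more missed deepfakes, so both rates should be checked on a held-out set before the threshold is deployed.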
Output Range
The score is returned as an unbounded LLR, theoretically ranging from minus infinity to plus infinity. However, in practice, values typically fall within the range of -2 to +6.
FAQ
How can I improve processing speed?
To speed up processing, ensure that Authenticity Verification is running on a GPU.