Version: 4.0.2

Audio Manipulation Detection

Phonexia audio-manipulation-detection is a tool for detecting possible cut-and-merge manipulation of audio files, by using a pre-trained neural network model. To learn more, visit the technology's home page.

Installation

Docker image
Docker compose
Helm chart

Getting the image

You can obtain the audio manipulation detection image from Docker Hub. There are two variants of the image: one for CPU and one for GPU.

You can get the CPU image by specifying an exact version in the tag (e.g. 1.0.0), or latest for the most recent image:

docker pull phonexia/audio-manipulation-detection:latest

The GPU images have a -gpu suffix in the image tag (e.g. 1.0.0-gpu). Alternatively, you can use the gpu tag to get the latest version. In these images, the most computationally demanding tasks are handled by the GPU. The prerequisites are an NVIDIA GPU with drivers installed and the nvidia-container-toolkit (see Installing the NVIDIA Container Toolkit for more info).

docker pull phonexia/audio-manipulation-detection:gpu

Running the image

You can start the microservice and list all the supported options by running:

docker run --rm -it phonexia/audio-manipulation-detection:latest --help

The output should look like this:

Usage: audio-manipulation-detection [OPTIONS]

  You can use environment variables in format PHX_<OPTION_NAME> instead of
  command line arguments.

Options:
  -m, --model PATH                Path to a model file.  [required]
  -k, --license_key TEXT          License key.  [required]
  -l, --log_level [fatal|error|warning|info|debug]
                                  Logging level.
  --log_format [human|json]       Logging format.
  -a, --listening_address TEXT    Address where the server will listen. The
                                  address '[::]' also accepts IPv4
                                  connections.
  -p, --port INTEGER RANGE        Port on which the server will be listening.
                                  [1<=x<=65535]
  --device [cpu|cuda]             Compute device used for inference.
  --num_threads_per_instance INTEGER RANGE
                                  Number of threads per instance (applies to
                                  CPU processing only). Use N CPU threads in
                                  the microservice for each request. Number of
                                  threads is automatically detected if set to
                                  0.  [x>=0]
  --num_instances_per_device INTEGER RANGE
                                  Number of instances per device (both CPU and
                                  GPU processing). Microservice can process
                                  requests concurrently if value is >1.
                                  [x>=1]
  --device_index INTEGER RANGE    Device identifier.  [x>=0]
  --help                          Show this message and exit.

note

The model and license_key options are required. To obtain the model and license, contact Phonexia.

You can specify the options either via command line arguments or via environment variables.
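As a minimal sketch of the environment-variable route, the helper below builds a docker run argument list that passes the two required options as PHX_* variables instead of command line arguments. The variable names PHX_MODEL_PATH and PHX_LICENSE_KEY are the ones used in the Docker compose example on this page. The function name docker_run_args and its parameters are hypothetical and exist only for illustration.

```python
import shlex


def docker_run_args(image, models_dir, model_file, license_key, port=8080):
    """Build a `docker run` argument list configured through PHX_* variables.

    Hypothetical helper: the PHX_* names are copied from the compose example,
    everything else is an illustrative assumption.
    """
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:8080",               # host port -> container port 8080
        "-v", f"{models_dir}:/models",      # mount the host model directory
        "-e", f"PHX_MODEL_PATH=/models/{model_file}",
        "-e", f"PHX_LICENSE_KEY={license_key}",
        image,
    ]


if __name__ == "__main__":
    args = docker_run_args(
        "phonexia/audio-manipulation-detection:latest",
        "/opt/phx/models",
        "audio_manipulation_detection-beta-1.0.0.model",
        "<license-key>",
    )
    # Print a copy-pasteable command line; pass `args` to subprocess.run to execute it.
    print(shlex.join(args))
```

Building the command as a list avoids shell quoting problems when the license key or paths contain special characters.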

Run the container with the mandatory parameters:

docker run --rm -it -p 8080:8080 -v /opt/phx/models:/models phonexia/audio-manipulation-detection:latest --model /models/audio_manipulation_detection-beta-1.0.0.model --license_key ${license-key}

To run the GPU images, you need to make the GPU available inside the Docker container. This is done with the --gpus parameter (typically --gpus all); see the Access an NVIDIA GPU chapter for more info. For example, run the container with the mandatory parameters:

docker run --rm -it --gpus all -v /opt/phx/models:/models -p 8080:8080 phonexia/audio-manipulation-detection:gpu --model /models/audio_manipulation_detection-beta-1.0.0.model --license_key ${license-key}

Replace /opt/phx/models, audio_manipulation_detection-beta-1.0.0.model and ${license-key} with the corresponding values.

With this command, the container will start, and the microservice will be listening on port 8080 on localhost.
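To confirm that something is listening on the published port, a plain TCP connection check can help. The sketch below uses only the Python standard library. The function name is_listening is hypothetical, and a successful connection only shows that the port is open, not that the microservice itself is healthy.

```python
import socket


def is_listening(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    print("port reachable:", is_listening())
```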

Docker compose

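There are two variants of the Docker image: one for CPU and one for GPU. Create a docker-compose.yml file for the variant you need. The first file below uses the CPU image (latest tag), the second uses the GPU image (gpu tag) and reserves one NVIDIA GPU: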

version: '3'
services:
  audio-manipulation-detection:
    image: phonexia/audio-manipulation-detection:latest
    environment:
      - PHX_MODEL_PATH=/models/audio_manipulation_detection-beta-1.0.0.model
      - PHX_LICENSE_KEY=<license-key>
    ports:
      - 8080:8080
    volumes:
      - ./models:/models/

version: '3'
services:
  audio-manipulation-detection:
    image: phonexia/audio-manipulation-detection:gpu
    environment:
      - PHX_MODEL_PATH=/models/audio_manipulation_detection-beta-1.0.0.model
      - PHX_LICENSE_KEY=<license-key>
    ports:
      - 8080:8080
    volumes:
      - ./models:/models/
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Create a models folder in the same directory as the docker-compose.yml file and place a model file in it. Replace <license-key> with your license key and audio_manipulation_detection-beta-1.0.0.model with the actual name of a model.

note

The model and license_key options are required. To obtain the model and license, contact Phonexia.

You can then start the microservice by running:

docker compose up

Performance optimization

The audio-manipulation-detection microservice supports GPU acceleration.

In the docker images with GPU support, the GPU acceleration is enabled by default. While GPU acceleration will be used primarily, certain processing tasks will still rely on CPU resources.

For better performance, multiple microservices can share a GPU unit. The number of microservice instances per GPU depends on the hardware used.

Microservice communication

gRPC API

For communication, our microservices use gRPC, which is a high-performance, open-source Remote Procedure Call (RPC) framework that enables efficient communication between distributed systems using a variety of programming languages. We use an interface definition language to specify a common interface and contracts between components. This is primarily achieved by specifying methods with parameters and return types.

Take a look at our gRPC API documentation. The audio-manipulation-detection microservice defines an AudioManipulationDetection service with a remote procedure called Detect. This procedure accepts an argument (also referred to as a "message") called DetectRequest, which contains the audio as an array of bytes, together with an optional config argument.

The DetectRequest argument is streamed, meaning that it may be received in multiple requests, each containing a part of the audio. If specified, the optional config argument must be sent only with the first request. Once all the requests have been received and processed, the Detect procedure returns a message called DetectResponse, which consists of the processed audio length and an array of segments detected in the audio. Each segment consists of a detection score, a start time and an end time.
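The streaming rule described above (audio split across several requests, config only on the first one) can be sketched as a small generator. This is an illustration under stated assumptions, not the official client code: iter_detect_requests and the make_request factory are hypothetical names, and the factory stands in for whatever builds a real DetectRequest message in your client.

```python
from typing import Any, Callable, Iterator, Optional


def iter_detect_requests(
    audio: bytes,
    make_request: Callable[[bytes, Optional[Any]], Any],
    config: Optional[Any] = None,
    chunk_size: int = 1024 * 1024,
) -> Iterator[Any]:
    """Yield one request per audio chunk; only the first request carries config."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(audio), chunk_size):
        chunk = audio[offset:offset + chunk_size]
        # The config accompanies the first chunk only, as the API requires.
        yield make_request(chunk, config if offset == 0 else None)


if __name__ == "__main__":
    # Stand-in factory for demonstration: records chunk length and config.
    demo = iter_detect_requests(
        b"x" * 10, lambda chunk, cfg: (len(chunk), cfg), config="cfg", chunk_size=4
    )
    print(list(demo))
```

Keeping the message factory separate from the chunking logic lets the same generator be exercised without the gRPC packages installed.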

Connecting to microservice

There are multiple ways to communicate with our microservices.

Generated library
Python client
grpcurl client
GUI clients

Using generated library

The most common way to communicate with the microservices is from a programming language, using a generated library.

Python library

If you use Python as your programming language, you can use our official gRPC Python library.

To install the package using pip, run:

pip install phonexia-grpc

You can then import:

Specific libraries for each microservice, which provide the message wrappers.
Stubs for the gRPC clients.

from phonexia.grpc.common.core_pb2 import Audio, RawAudioConfig, TimeRange
from phonexia.grpc.technologies.audio_manipulation_detection.experimental.audio_manipulation_detection_pb2 import (
    DetectConfig,
    DetectRequest,
    DetectResponse,
)
from phonexia.grpc.technologies.audio_manipulation_detection.experimental.audio_manipulation_detection_pb2_grpc import (
    AudioManipulationDetectionStub,
)
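With those imports, a call to Detect might look like the sketch below. It is not taken from the official documentation and rests on several assumptions: that DetectRequest has an audio field of type Audio with a content bytes field (inferred from the JSON body used in the grpcurl example on this page), that the service listens on localhost:8080 without TLS, and that the generated stub exposes Detect as a client-streaming call. The function name detect_file is hypothetical. Check the gRPC API documentation for the actual message definitions.

```python
def detect_file(path, target="localhost:8080", chunk_size=1024 * 1024):
    """Stream an audio file to the Detect procedure and return the response.

    Sketch only: field names and the plaintext channel are assumptions.
    """
    # Imported here so this sketch can be loaded even where the packages
    # (grpcio, phonexia-grpc) are not installed.
    import grpc
    from phonexia.grpc.common.core_pb2 import Audio
    from phonexia.grpc.technologies.audio_manipulation_detection.experimental.audio_manipulation_detection_pb2 import (
        DetectRequest,
    )
    from phonexia.grpc.technologies.audio_manipulation_detection.experimental.audio_manipulation_detection_pb2_grpc import (
        AudioManipulationDetectionStub,
    )

    def requests():
        # Read the file piece by piece so large recordings are never held
        # in memory as a whole and each message stays small.
        with open(path, "rb") as audio_file:
            while chunk := audio_file.read(chunk_size):
                yield DetectRequest(audio=Audio(content=chunk))

    with grpc.insecure_channel(target) as channel:
        stub = AudioManipulationDetectionStub(channel)
        return stub.Detect(requests())


if __name__ == "__main__":
    import sys

    print(detect_file(sys.argv[1]))
```

The default chunk size of 1 MiB is an arbitrary choice that keeps each message well below the 4 MiB request limit mentioned in the grpcurl section.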

Generate library for programming language of your choice

To define the microservice interfaces, we use protocol buffers. The services, together with the procedures and messages that they expose, are defined in so-called proto files.

The proto files can be used to generate client libraries in many programming languages. Take a look at protobuf tutorials for how to get started with generating the library in the languages of your choice using the protoc tool.

You can find the proto files developed by Phonexia in this repository.

Using existing clients

Phonexia Python client

The easiest way to get started with testing is to use our simple Python client. To get it, run:

pip install phonexia-audio-manipulation-detection-client

After the successful installation, run the following command to see the client options:

audio_manipulation_detection_client --help

grpcurl client

If you need a simple tool for testing the microservice on the command line, you can use grpcurl. This tool can serialize and send a request for you, if you provide the request body in JSON format and specify the endpoint.

The audio content in the body must be encoded in Base64. The request also cannot exceed 4 MiB, so bigger files must be split into multiple chunks. You can use the jq tool to generate the JSON input for grpcurl.

Now you can make the request. The microservice supports reflection, which means that you don't need to know the API in advance to make a request. Replace ${path_to_audio_file} with the corresponding value.

base64 -w 4000000 ${path_to_audio_file} | jq -cnR '{"audio":{"content":inputs}}' | grpcurl -plaintext -use-reflection -d @ localhost:8080 phonexia.grpc.technologies.audio_manipulation_detection.experimental.AudioManipulationDetection/Detect

grpcurl automatically serializes the response to this request into JSON, including the detected segments.
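Where base64 -w or jq is not available, the same JSON input can be produced with a few lines of Python. This is a sketch that mirrors the shell pipeline above: each output line is one JSON object with a Base64-encoded audio chunk. The function name grpcurl_json_lines is hypothetical. The default of 3,000,000 raw bytes per chunk corresponds to the 4,000,000 Base64 characters per line used in the command above.

```python
import base64
import json
import sys

RAW_CHUNK_SIZE = 3_000_000  # encodes to 4,000,000 Base64 characters


def grpcurl_json_lines(audio: bytes, raw_chunk_size: int = RAW_CHUNK_SIZE):
    """Yield one compact JSON request body per chunk of the audio."""
    if raw_chunk_size <= 0:
        raise ValueError("raw_chunk_size must be positive")
    for offset in range(0, len(audio), raw_chunk_size):
        chunk = audio[offset:offset + raw_chunk_size]
        content = base64.b64encode(chunk).decode("ascii")
        # Same shape as the jq filter: {"audio":{"content":<chunk>}}
        yield json.dumps({"audio": {"content": content}}, separators=(",", ":"))


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as audio_file:
        for line in grpcurl_json_lines(audio_file.read()):
            print(line)
```

The script's output can be piped into the same grpcurl invocation in place of the base64 and jq stages.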

Further links

Maintained by Phonexia
Contact us via e-mail, or open a ticket at the Phonexia Service Desk
File an issue
See list of licenses
See the terms of use

Versioning

We use Semantic Versioning.
