Version: 3.4.0

Language identification

Phonexia Language Identification is a tool for calculating the probabilities of individual languages appearing in an audio recording. To learn more about the technology, visit this link.

Versioning

We use SemVer for versioning.


How to use this image

Getting the image

You can obtain the Docker image from Docker Hub. There are two variants of the image: one for CPU and one for GPU, the latter with tags ending in gpu.

To get the latest CPU image, run:

docker pull phonexia/language-identification:latest

To get the latest GPU image, run:

docker pull phonexia/language-identification:gpu

Running the image

info

The preferred way to deploy the microservice to a production environment is to use a Helm chart. See the Helm chart deployment for more information.

Docker

You can start the microservice and list all the supported options by running:

docker run --rm -it phonexia/language-identification:latest --help

The output should look like this:

Usage: language-identification [OPTIONS]

Options:
-h,--help Print this help message and exit
-m,--model file REQUIRED (Env:PHX_MODEL_PATH)
Path to a model file.
-k,--license_key string REQUIRED (Env:PHX_LICENSE_KEY)
License key.
-a,--listening_address address [[::]] (Env:PHX_LISTENING_ADDRESS)
Address on which the server will be listening. Address '[::]' also accepts IPv4 connections.
-p,--port number [8080] (Env:PHX_PORT)
Port on which the server will be listening.
-l,--log_level level:{error,warning,info,debug,trace} [info] (Env:PHX_LOG_LEVEL)
Logging level. Possible values: error, warning, info, debug, trace.
--keepalive_time_s number:[0, max_int] [60] (Env:PHX_KEEPALIVE_TIME_S)
Time between 2 consecutive keep-alive messages, that are sent if there is no activity from the client. If set to 0, the default gRPC configuration (2hr) will be set (note, that this may get the microservice into unresponsive state).
--keepalive_timeout_s number:[1, max int] [20] (Env:PHX_KEEPALIVE_TIMEOUT_S)
Time to wait for keep alive acknowledgement until the connection is dropped by the server.
--device TEXT:{cpu,cuda} [cpu] (Env:PHX_DEVICE)
Compute device used for inference.

Note that the model and license_key options are required. To obtain the model and license, contact Phonexia.

You can specify the options either via command-line arguments or via environment variables.

Run the container with the mandatory parameters:

docker run --rm -it -v /opt/phx/models:/models -p 8080:8080 phonexia/language-identification:latest --model /models/language_identification-xl-5.2.2.model --license_key ${license-key}

Replace /opt/phx/models, language_identification-xl-5.2.2.model, and license-key with the corresponding values.

With this command, the container will start, and the microservice will be listening on port 8080 on localhost.
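
Equivalently, you can pass the options through environment variables instead of command-line arguments. Below is a sketch using Docker's -e flag and the variable names from the help output above; the model file name and license key are placeholders, as before:

docker run --rm -it -v /opt/phx/models:/models -p 8080:8080 \
  -e PHX_MODEL_PATH=/models/language_identification-xl-5.2.2.model \
  -e PHX_LICENSE_KEY=${license-key} \
  phonexia/language-identification:latest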

Docker compose

Create a docker-compose.yml file:

version: '3'
services:
  language-identification:
    image: phonexia/language-identification:latest
    environment:
      - PHX_MODEL_PATH=/models/language_identification-xl-5.2.2.model
      - PHX_LICENSE_KEY=<license-key>
    ports:
      - 8080:8080
    volumes:
      - ./models:/models/

Create a models folder in the same directory as the docker-compose.yml file and place a model file in it. Replace <license-key> with your license key and language_identification-xl-5.2.2.model with the actual name of a model.

Run the microservice:

docker compose up
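
To quickly verify that the microservice is reachable, you can wait for the gRPC channel to become ready. A minimal sketch in Python, assuming only that the grpcio package is installed:

import grpc

# Open a plain-text channel to the locally running microservice.
channel = grpc.insecure_channel("localhost:8080")

# Block until the channel is ready; raises grpc.FutureTimeoutError if it is not.
grpc.channel_ready_future(channel).result(timeout=10)
print("Microservice is listening on localhost:8080")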

GPU

GPU images have the suffix -gpu in the image tag (e.g. 1.2.0-gpu), or you can use the gpu tag to get the latest version. In these images, the most computationally demanding tasks are handled by the GPU. The prerequisites are an NVIDIA GPU with drivers and the nvidia-container-toolkit installed (see Installing the NVIDIA Container Toolkit for more information).
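
To check that the toolkit is set up correctly, you can run nvidia-smi in a throwaway container; the ubuntu image here is only an example, as the toolkit injects the driver utilities into any container started with --gpus:

docker run --rm --gpus all ubuntu nvidia-smi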

To run GPU images, you need to make a GPU available inside the Docker container. This is done with the --gpus all parameter (see the Access an NVIDIA GPU chapter for more information), for example:

docker run --rm -it -v /opt/phx/models:/models --gpus all phonexia/language-identification:gpu --model /models/language_identification-xl-5.2.2.model --license_key ${license-key}

Or use a docker compose file:

version: '3'
services:
  language-identification:
    image: phonexia/language-identification:gpu
    environment:
      - PHX_MODEL_PATH=/models/language_identification-xl-5.2.2.model
      - PHX_LICENSE_KEY=<license-key>
    ports:
      - 8080:8080
    volumes:
      - ./models:/models/
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Microservice communication

gRPC API

For communication, our microservices use gRPC, which is a high-performance, open-source Remote Procedure Call (RPC) framework that enables efficient communication between distributed systems using a variety of programming languages. We use an interface definition language to specify a common interface and contracts between components. This is primarily achieved by specifying methods with parameters and return types.

Take a look at our gRPC API documentation. The language-identification microservice defines a LanguageIdentification service with remote procedures called Identify and ListSupportedLanguages. The Identify procedure accepts an argument (also referred to as "message") called IdentifyRequest, which contains the audio as an array of bytes, together with an optional config argument.

The config argument is used for controlling the identification parameters. In this argument, we can specify the amount of speech that will be used for the identification, as well as which of the supported languages will be included in the identification. This way, we can restrict the identification to a subset of all the available languages. Finally, we can define language groups, each of which groups several languages and assigns them a common identifier.

The IdentifyRequest argument is streamed, meaning that it may be received in multiple requests, each containing a part of the audio. If specified, the optional config argument must be sent only with the first request. Once all requests have been received and processed, the Identify procedure returns a message called IdentifyResponse, which contains the resulting language probability scores.
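
The following is a minimal sketch of this streaming pattern, written with the Python library introduced in the next section. The audio, content, and config field names follow this documentation; the chunk size and file handling are illustrative, and the construction of IdentifyConfig is omitted, as its fields are described in the gRPC API documentation.

import phonexia.grpc.common.core_pb2 as phx_core
import phonexia.grpc.technologies.language_identification.v1.language_identification_pb2 as lid

CHUNK_SIZE = 1024 * 1024  # keep each message well below the 4 MiB limit

def identify_requests(audio_path, config=None):
    """Yield IdentifyRequest messages, one per chunk of the audio file."""
    with open(audio_path, "rb") as file:
        first = True
        while chunk := file.read(CHUNK_SIZE):
            request = lid.IdentifyRequest(audio=phx_core.Audio(content=chunk))
            # The optional config must be sent only with the first request.
            if first and config is not None:
                request.config.CopyFrom(config)
            first = False
            yield request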

Connecting to microservice

There are multiple ways to communicate with our microservices.

Using generated library

The most common way to communicate with the microservices is from a programming language, using a generated library.

Python library

If you use Python as your programming language, you can use our gRPC Python library.

To get this library, simply run:

pip install phonexia-grpc

You can then import:

  • specific libraries for each microservice that provide the message wrappers
  • stubs for the gRPC clients.
# phx_core contains classes common for multiple microservices like `Audio`.
import phonexia.grpc.common.core_pb2 as phx_core
# language_identification_pb2 contains `IdentifyRequest`, `IdentifyResponse`, and `IdentifyConfig`.
import phonexia.grpc.technologies.language_identification.v1.language_identification_pb2 as lid
# language_identification_pb2_grpc contains `LanguageIdentificationStub` needed to make the requests.
import phonexia.grpc.technologies.language_identification.v1.language_identification_pb2_grpc as lid_grpc
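
With these imports in place, a usage sketch looks like this. It assumes the microservice from the sections above is running on localhost:8080; the name of the ListSupportedLanguages request message follows the usual protobuf convention and should be checked against the gRPC API documentation.

import grpc

channel = grpc.insecure_channel("localhost:8080")
stub = lid_grpc.LanguageIdentificationStub(channel)

# List all languages supported by the loaded model.
supported = stub.ListSupportedLanguages(lid.ListSupportedLanguagesRequest())
print(supported)

# Identify the languages in an audio file, streaming it in chunks
# (see the identify_requests generator sketched in the gRPC API section above).
response = stub.Identify(identify_requests("audio.wav"))
print(response)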

Generate a library for the programming language of your choice

For the definition of microservice interfaces, we use the standard protocol buffers mechanism. The services, together with the procedures and messages that they expose, are defined in so-called proto files.

The .proto files can be used to generate client libraries in many programming languages. Take a look at protobuf tutorials for how to get started with generating the library in the languages of your choice using the protoc tool.

You can find the proto files developed by Phonexia in this repository.
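
As an illustration, the following command generates the Python messages and stubs using the grpcio-tools package; the proto file path is an assumption based on the Python package layout above and may differ in the repository:

python -m grpc_tools.protoc --proto_path=. --python_out=. --grpc_python_out=. \
  phonexia/grpc/technologies/language_identification/v1/language_identification.proto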

Using existing clients

Phonexia python client

The easiest way to get started with testing is to use our simple Python client. To get it, run:

pip install phonexia-language-identification-client

After the successful installation, run the following command to see the client options:

language_identification_client --help

grpcurl client

If you need a simple tool for testing the microservice on the command line, you can use grpcurl. This tool can serialize and send a request for you, if you provide the request body in JSON format and specify the endpoint.

You need to make sure that the audio content in the body is encoded in Base64. Unfortunately, you need to do this manually, as grpcurl can't do it for you. A request cannot exceed 4 MiB, so we split the file into chunks and use the jq tool to generate the JSON input for grpcurl.

Now you can make the request. The microservice supports server reflection, which means that you don't need to know the API in advance to make a request. Replace ${path_to_audio_file} with the corresponding value.

base64 -w 4000000 ${path_to_audio_file} | jq -cnR '{"audio":{"content":inputs}}' | grpcurl -plaintext -use-reflection -d @ localhost:8080 phonexia.grpc.technologies.language_identification.v1.LanguageIdentification/Identify

grpcurl automatically deserializes the response to this request into JSON, including the resulting probability score for each language.
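
Because the microservice supports server reflection, you can also use grpcurl to explore the API directly, for example:

# List all services exposed by the microservice.
grpcurl -plaintext localhost:8080 list

# Describe the LanguageIdentification service and its procedures.
grpcurl -plaintext localhost:8080 describe phonexia.grpc.technologies.language_identification.v1.LanguageIdentification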

GUI clients

If you'd prefer to use a GUI client like Postman or Warthog to test the microservice, take a look at the GUI Client page in our documentation. Note that you will still need to convert the audio into Base64 manually, as those tools do not support it by default either.