Version: 3.7.0

Components

This page briefly describes the components that make up the virtual appliance. Some of them are covered in more detail below the table.

| Component | Overview |
| --- | --- |
| Operating system | The virtual appliance runs on Rocky Linux 9.5. |
| GPU support | The virtual appliance ships with all prerequisites needed to run GPU-powered workloads (especially enhanced-speech-to-text-built-on-whisper). NVIDIA drivers and the NVIDIA container toolkit are already installed. Time-based GPU sharing is enabled by default, allowing multiple technologies to run on a single GPU simultaneously. GPU-powered images of all technologies are included. |
| Kubernetes | The k3s Kubernetes distribution is deployed inside the appliance. |
| Container registry | A local registry stores all necessary images, so there is no need to pull images from the internet. |
| Ingress controller | The ingress-nginx ingress controller serves as a reverse proxy and load balancer. |
| Speech platform | The application for solving various voice-related problems such as speaker identification, speech-to-text transcription, and many more. The speech platform is accessible via a web browser or an API. |
| Admin console | A simple web page with links to admin-related tools: File Browser, Prometheus, and Grafana. It is located at http://<IP_of_virtual_appliance>/admin. |
| File Browser | A web-based file browser and editor for working with data on the data disk. It is accessible at <IP_address_of_VA>/filebrowser. |
| Prometheus | A tool that provides monitoring information about Kubernetes components. It is accessible at <IP_address_of_VA>/prometheus. |
| Grafana | A tool for visualizing Prometheus metrics. It is accessible at <IP_address_of_VA>/grafana. |

Each component is pre-configured and ready to use out of the box.
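
As a convenience, the admin tool addresses listed in the table can be derived from the appliance IP address. The following is only a sketch: the function name admin_tool_urls and the example address 192.0.2.10 are hypothetical, and the plain-HTTP default is an assumption based on the admin console URL shown above.

```python
from urllib.parse import urlunsplit

# Paths taken from the component table; nothing else is assumed about routing.
ADMIN_TOOL_PATHS = {
    "admin": "/admin",
    "filebrowser": "/filebrowser",
    "prometheus": "/prometheus",
    "grafana": "/grafana",
}


def admin_tool_urls(appliance_ip: str, scheme: str = "http") -> dict[str, str]:
    """Return a mapping of tool name to URL for one virtual appliance.

    The "http" default is an assumption; adjust it if your deployment
    terminates TLS in front of the appliance.
    """
    return {
        name: urlunsplit((scheme, appliance_ip, path, "", ""))
        for name, path in ADMIN_TOOL_PATHS.items()
    }


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only example address.
    for name, url in admin_tool_urls("192.0.2.10").items():
        print(f"{name}: {url}")
```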

Further Details on Selected Components

Grafana

Grafana is a tool for visualizing application and Kubernetes metrics. The most useful dashboards available in Grafana are:

  • Envoy Clusters - See envoy cluster statistics
  • Kubernetes / Compute Resources / Pod - See resource consumption of individual pods
  • NGINX Ingress controller - See ingress controller stats
  • NVIDIA DCGM Exporter Dashboard - See GPU device stats
  • Node Exporter / Nodes - See stats about virtual appliance
  • Speech Platform API capacity - See metrics about speech platform itself

Speech Platform

The Speech Platform consists of the following components:

  • frontend - simple web server serving static HTML, CSS, JavaScript and image files
  • docs - simple web server serving documentation
  • assets - simple web server hosting examples
  • api - Python component providing the REST API
  • envoy - router and load balancer for gRPC messages
  • media-conversion - Python component that converts audio files from various formats to plain WAV format and splits multi-channel audio into multiple single-channel files
  • technology microservices
      • enhanced-speech-to-text-built-on-whisper - transcribes speech to text
      • speech-to-text-phonexia - transcribes speech to text
      • voiceprint-extraction - extracts a voiceprint from an audio file
      • voiceprint-comparison - compares multiple voiceprints
      • language-identification - identifies the language spoken in audio
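
To illustrate what splitting multi-channel audio into single-channel files means, here is a minimal sketch using only the Python standard library. It is not the media-conversion implementation: the function name split_channels is hypothetical, and the sketch assumes the input is already an uncompressed PCM WAV file, whereas the real component also handles other input formats.

```python
import io
import wave


def split_channels(wav_bytes: bytes) -> list[bytes]:
    """Split an interleaved PCM WAV file into one mono WAV file per channel."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # One frame holds one sample for every channel, stored back to back.
    frame_size = n_channels * sample_width
    outputs = []
    for channel in range(n_channels):
        offset = channel * sample_width
        # Pick this channel's sample out of every interleaved frame.
        mono = b"".join(
            frames[i + offset : i + offset + sample_width]
            for i in range(0, len(frames), frame_size)
        )
        buf = io.BytesIO()
        with wave.open(buf, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(sample_width)
            dst.setframerate(frame_rate)
            dst.writeframes(mono)
        outputs.append(buf.getvalue())
    return outputs
```

A stereo recording passed to this function yields two WAV files, each with the same sample rate and sample width as the input.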

Request flow

Graphical representation

Step-by-Step representation

  1. The user sends a POST request (for example, to transcribe speech to text) to the API.
  2. The API creates a processing task and returns the task ID to the user.
  3. From this point on, the user can poll the task to get the result.
  4. The API calls media-conversion via envoy.
  5. Media-conversion converts the audio file to WAV format and, if needed, splits it into multiple single-channel files.
  6. The API receives the converted audio file from media-conversion.
  7. The API calls enhanced-speech-to-text-built-on-whisper via envoy.
  8. Enhanced-speech-to-text-built-on-whisper transcribes the audio file.
  9. The API receives the transcription.
  10. The user can retrieve the task result.
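
From the client's point of view, the flow above comes down to submitting a request, remembering the task ID, and polling until the task finishes. The sketch below shows only the polling part and leaves out HTTP on purpose, because this page does not describe the API's endpoints or response fields. Everything named here is hypothetical: the wait_for_task function, the fetch_task callback (which would wrap the real GET request for a task), and the "pending" and "running" state names.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
) -> dict:
    """Poll a task until it leaves the (assumed) in-progress states."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        # "pending" and "running" are placeholder names, not the real API's.
        if task.get("state") not in ("pending", "running"):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for the API: the task finishes on the third poll.
    responses = iter(
        [
            {"state": "pending"},
            {"state": "running"},
            {"state": "done", "result": "hello world"},
        ]
    )
    print(wait_for_task(lambda _task_id: next(responses), "task-1", interval_s=0.1))
```

Passing the fetch function in as a parameter keeps the polling logic independent of the HTTP client, so it can be exercised without a running appliance.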