Components
The table below briefly describes the components that make up the virtual appliance. Some of them are covered in more detail after the table.
Component | Overview |
---|---|
Operating system | The appliance runs on Rocky Linux 9.5. |
GPU support | The virtual appliance ships with all prerequisites for running GPU-powered workloads (especially enhanced-speech-to-text-built-on-whisper). NVIDIA drivers and the NVIDIA container toolkit are already installed. Time-based GPU sharing is enabled by default, allowing multiple technologies to run on a single GPU simultaneously. GPU-powered images of all technologies are included. |
Kubernetes | The k3s Kubernetes distribution is deployed inside the appliance. |
Container registry | A built-in registry stores all necessary images, so there is no need to pull images from the internet. |
Ingress controller | The ingress-nginx ingress controller serves as a reverse proxy and load balancer. |
Speech platform | The application for solving voice-related problems such as speaker identification and speech-to-text transcription. It is accessible via web browser or API. |
Admin console | The admin console is a simple web page with links to admin-related tools: File Browser, Prometheus, and Grafana. It is located at http://<IP_of_virtual_appliance>/admin . |
File Browser | File Browser is a web-based file browser/editor used to work with data on a data disk. It is accessible at <IP_address_of_VA>/filebrowser . |
Prometheus | Prometheus is a tool for providing monitoring information about Kubernetes components. It is accessible at <IP_address_of_VA>/prometheus . |
Grafana | Grafana is a tool for visualization of Prometheus metrics. It is accessible at <IP_address_of_VA>/grafana . |
Each component is pre-configured and ready to use out of the box.
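As a quick orientation, the admin endpoints listed in the table can be assembled from the appliance address. The sketch below is illustrative only: the `admin_urls` helper and the example address `192.0.2.10` are hypothetical, and it assumes all four tools are served over plain `http` (the table states the scheme only for the admin console).

```python
from urllib.parse import urlunsplit

# Paths taken from the component table above.
ADMIN_PATHS = {
    "admin": "/admin",
    "filebrowser": "/filebrowser",
    "prometheus": "/prometheus",
    "grafana": "/grafana",
}


def admin_urls(host: str, scheme: str = "http") -> dict[str, str]:
    """Return the URL of each admin tool for the given appliance address.

    The default scheme is an assumption; adjust it to match your deployment.
    """
    return {
        name: urlunsplit((scheme, host, path, "", ""))
        for name, path in ADMIN_PATHS.items()
    }


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address, not a real appliance.
    for name, url in admin_urls("192.0.2.10").items():
        print(f"{name}: {url}")
```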
Further Details on Selected Components
Grafana
Grafana is a tool for visualizing application and Kubernetes metrics. The most useful dashboards available in Grafana are:
- Envoy Clusters - See envoy cluster statistics
- Kubernetes / Compute Resources / Pod - See resource consumption of individual pods
- NGINX Ingress controller - See ingress controller stats
- NVIDIA DCGM Exporter Dashboard - See GPU device stats
- Node Exporter / Nodes - See stats about virtual appliance
- Speech Platform API capacity - See metrics about speech platform itself
Speech Platform
Speech Platform consists of the following components:
- frontend - simple web server serving static HTML, CSS, JavaScript, and image files
- docs - simple web server serving documentation
- assets - simple web server hosting examples
- api - Python component providing the REST API
- envoy - router and load balancer for gRPC messages
- media-conversion - Python component that converts audio files from various formats to plain WAV format and splits multi-channel audio into multiple single-channel files
- technology microservices
  - enhanced-speech-to-text-built-on-whisper - transcribes speech to text
  - speech-to-text-phonexia - transcribes speech to text
  - voiceprint-extraction - extracts a voiceprint from an audio file
  - voiceprint-comparison - compares multiple voiceprints
  - language-identification - identifies the language in audio
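To illustrate what splitting multi-channel audio into single-channel files means for media-conversion, here is a minimal sketch using Python's standard `wave` module. It is not the appliance's actual implementation: the `split_channels` function and the output naming are hypothetical, and it assumes the input is already an uncompressed PCM WAV file, whereas the real component also converts from other formats.

```python
import wave


def split_channels(src_path: str, dst_prefix: str) -> list[str]:
    """Write one mono WAV file per channel of src_path and return the new paths."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()  # bytes per sample
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # In PCM WAV data, each frame holds one sample per channel, interleaved.
    frame_size = n_channels * sample_width
    paths = []
    for ch in range(n_channels):
        # Pick this channel's sample out of every interleaved frame.
        mono = b"".join(
            frames[i + ch * sample_width : i + (ch + 1) * sample_width]
            for i in range(0, len(frames), frame_size)
        )
        path = f"{dst_prefix}_ch{ch}.wav"
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(sample_width)
            dst.setframerate(frame_rate)
            dst.writeframes(mono)
        paths.append(path)
    return paths
```

For a stereo recording of a phone call, for example, this yields two files, one per speaker side, which can then be processed independently.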
Request flow
Graphical representation
Step-by-Step representation
- The user sends a POST request (for example, to transcribe speech to text) to the API.
- The API creates a processing task and returns the task ID to the user.
- From this point on, the user can poll the task to get the result.
- The API calls media-conversion via envoy.
- Media-conversion converts the audio file to WAV format and, if needed, splits it into multiple single-channel files.
- The API receives the converted audio file from media-conversion.
- The API calls enhanced-speech-to-text-built-on-whisper via envoy.
- Enhanced-speech-to-text-built-on-whisper transcribes the audio file.
- The API receives the transcription.
- The user can retrieve the task result.
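From the client's point of view, the flow above is a submit-then-poll pattern. The sketch below shows only the polling half, under stated assumptions: the `wait_for_task` helper, the `state` field, and the state names `pending`, `running`, and `done` are all hypothetical and not taken from the Speech Platform API. The `get_task` callable stands in for whatever HTTP call retrieves the task by ID; consult the API documentation served by the docs component for the real endpoints and response shape.

```python
import time
from typing import Callable


def wait_for_task(
    get_task: Callable[[str], dict],
    task_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> dict:
    """Poll get_task(task_id) until the task is no longer pending or running.

    The state names are assumptions made for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = get_task(task_id)
        if task["state"] not in ("pending", "running"):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for the real API: reports 'running' twice, then a finished task.
    responses = iter(
        [
            {"state": "running"},
            {"state": "running"},
            {"state": "done", "result": "hello world"},
        ]
    )
    print(wait_for_task(lambda _task_id: next(responses), "task-1", interval=0.01))
```

Passing the fetch function in as a parameter keeps the polling logic independent of any particular HTTP client and makes it easy to exercise without a running appliance.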