Installation & Configuration
The Speech Platform Virtual Appliance is a distribution of the Phonexia Speech Platform in the form of a virtual image. Currently, only the OVF format is supported.
Installation
This section describes how to install the virtual appliance on your virtualization platform.
Prerequisites
Currently, we support only VirtualBox and VMware. The appliance will probably work on other virtualization platforms as well, but we have not tested them yet.
Minimal HW requirements
- 60GB of disk space
- 4 CPU cores
- 32GB of memory
The minimal requirements allow you to process a single technology (speaker identification, enhanced speech-to-text built on Whisper, or speech-to-text by Phonexia) for evaluation purposes. We recommend disabling all technologies you are not evaluating to save resources.
Resource usage per technology
- Speaker identification - 1 CPU core and 2GB memory
- Speech-to-text by Phonexia - 1 CPU core and 4GB memory per language
- Enhanced speech-to-text built on Whisper - 8 CPU cores and 8GB memory, or 1 CPU core, 8GB memory and a GPU card
Note: Running enhanced speech-to-text built on Whisper on a CPU is slow. We recommend using at least 8 CPU cores to run our built-in examples in a reasonable time.
GPU
A GPU is not required for the virtual appliance to work, but without one you will suffer serious performance degradation for the enhanced speech-to-text built on Whisper functionality.
If you decide to use a GPU, then make sure that:
- Server HW (especially the BIOS) has support for IOMMU.
- Host OS can pass the GPU device to the virtualization platform (i.e., the host OS can be configured to NOT use the GPU device).
- Virtualization platform can pass the GPU device to the guest OS.
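On a Linux host, you can quickly sanity-check the first two points before importing the appliance (these are standard kernel interfaces, not Phonexia-specific tooling):
# look for IOMMU/DMAR initialization messages in the kernel log
$ dmesg | grep -i -e DMAR -e IOMMU
# a non-empty listing means IOMMU groups are active and passthrough is possible
$ ls /sys/kernel/iommu_groups/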
Installation guide
- Download the virtual appliance.
- Import the virtual appliance into your virtualization platform (for Hyper-V deployment, please refer to the 'Hyper-V configuration' section).
- Run the virtual appliance.
Post-installation steps
The virtual appliance is configured to obtain an IP address from a DHCP server. If you are not using a DHCP server for IP allocation, or you prefer a static IP, you have to reconfigure the OS.
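For example, a static IP can be configured with NetworkManager's nmcli (the same tool used in the Hyper-V section below); the address, gateway, and DNS values here are placeholders for your network:
$ nmcli con add type ethernet con-name eth0 ifname eth0 ipv4.addresses 192.168.1.50/24 ipv4.gateway 192.168.1.1 ipv4.dns "8.8.8.8 8.8.4.4" ipv4.method manual
$ nmcli con up eth0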
SSH server
An SSH server is deployed and enabled in the virtual appliance. Use the following credentials:
login: root
password: InVoiceWeTrust
We recommend changing the root password and disabling password authentication via SSH for the root user in favor of key-based authentication. Instead of the root user, we recommend using the phonexia user, as we plan to disable root login in the future. Use the sudo command to switch to the root user after login.
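For example, a minimal hardening sketch (run inside the appliance; assumes the stock OpenSSH configuration in /etc/ssh/sshd_config):
# change the root password
$ passwd
# allow root login with SSH keys only, not passwords
$ sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
$ systemctl restart sshd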
Open ports
List of open ports:
- SSH (22) - for convenient access to OS
- HTTP (80) - Speech platform is accessible via HTTP protocol
- HTTPS (443) - Speech platform is also accessible via HTTPS protocol
- HTTPS (6443) - Kubernetes API
- HTTPS (10250) - Metrics server
K3s check
K3s (a Kubernetes distribution) is started automatically by systemd when the virtual appliance starts. You can verify whether k3s is running with this command:
systemctl status k3s
Kubernetes check
Once the k3s service is started, it takes some time (usually around 2 minutes) until the application (i.e., the Kubernetes pods) is up. To check whether the application is up and running, execute the following command:
kubectl -n speech-platform get pods
When all pods are running, the output looks like this:
[root@speech-platform ~]# kubectl -n speech-platform get pods
NAME READY STATUS RESTARTS AGE
enhanced-speech-to-text-built-on-whisper-9c97c9ffd-lj8tf 0/1 CreateContainerConfigError 0 111s
language-identification-6c79cdfbfb-6lk52 0/1 CreateContainerConfigError 0 110s
media-conversion-58b5d544f4-9jt6x 1/1 Running 0 111s
speaker-diarization-5548bbd6d8-kfvwh 0/1 CreateContainerConfigError 0 110s
speech-platform-api-5ddcb955c9-49jmh 1/1 Running 0 106s
speech-platform-assets-798475fd5-8d6sz 1/1 Running 0 106s
speech-platform-billing-76d4c4b498-6pmpf 0/1 CrashLoopBackOff 3 (35s ago) 107s
speech-platform-configurator-59857b7b56-lmccs 0/1 CreateContainerConfigError 0 110s
speech-platform-docs-7966875976-w6mxr 1/1 Running 0 108s
speech-platform-envoy-86c6dd6897-6k47q 1/1 Running 0 106s
speech-platform-frontend-79d4fb9dd-b6m4l 1/1 Running 0 109s
speech-platform-postgresql-0 2/2 Running 0 86s
speech-platform-restapigateway-7c5f9477d9-tpt6r 1/1 Running 0 110s
voice-activity-detection-77c7cd884d-9xngh 0/1 CreateContainerConfigError 0 111s
voiceprint-comparison-f9d95d859-ftzzb 0/1 CreateContainerConfigError 0 105s
voiceprint-extraction-6d499f9dd-zqcq8 0/1 CreateContainerConfigError 0 104s
The enhanced-speech-to-text-built-on-whisper, language-identification, speaker-diarization, speech-platform-configurator, voice-activity-detection, voiceprint-comparison and voiceprint-extraction microservices (pods) fail initially. This is expected and is caused by a missing license. You can either add a license to the microservices or disable them if you don't plan to use them.
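To watch the pods converge without re-running the command, you can use kubectl's watch flag (press Ctrl+C to stop):
$ kubectl -n speech-platform get pods -w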
Optionally, you can check if all other system and auxiliary applications are running:
kubectl get pods -A
All pods should be running or completed, like this:
[root@speech-platform ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-8d98546c4-9pq8p 1/1 Running 0 6m44s
kube-system coredns-94bcd45cb-rp6zx 1/1 Running 0 6m44s
kube-system metrics-server-754ff994c9-pczpx 1/1 Running 0 6m44s
kube-system svclb-ingress-nginx-controller-baed713a-nzwcc 2/2 Running 0 5m24s
kube-system helm-install-ingress-nginx-wpwk4 0/1 Completed 0 6m45s
kube-system helm-install-filebrowser-fd569 0/1 Completed 0 6m45s
kube-system helm-install-nginx-28rll 0/1 Completed 0 6m45s
kube-system helm-install-speech-platform-7k6qf 0/1 Completed 0 6m45s
ingress-nginx ingress-nginx-controller-679f97c77d-rdssr 1/1 Running 0 5m24s
nginx nginx-6ddd78f789-f9lq2 1/1 Running 0 5m39s
filebrowser filebrowser-7476f7c65c-rk9d5 1/1 Running 0 5m39s
gpu nfd-58s4x 2/2 Running 0 5m44s
speech-platform speech-platform-docs-57dcd49f9f-q97w4 1/1 Running 0 5m38s
speech-platform speech-platform-envoy-759c9b49d9-99vp7 1/1 Running 0 5m38s
speech-platform speech-platform-frontend-7f4566dbc6-jhprh 1/1 Running 0 5m38s
speech-platform speech-platform-assets-5697b4c86-8sh9k 1/1 Running 0 5m37s
speech-platform media-conversion-7d8f884f9-zh75g 1/1 Running 0 5m37s
speech-platform speech-platform-api-69bc7d4d5b-6kv7x 1/1 Running 0 5m37s
speech-platform voiceprint-extraction-68d646d449-9br8m 0/1 CrashLoopBackOff 5 (2m33s ago) 5m38s
speech-platform enhanced-speech-to-text-built-on-whisper-74548494c866mrz 0/1 CrashLoopBackOff 5 (2m32s ago) 5m38s
speech-platform voiceprint-comparison-76948b4947-xjw92 0/1 CrashLoopBackOff 5 (2m20s ago) 5m38s
Application check
From your local computer, access the virtual appliance welcome page at the appliance's IP address or hostname. If you are able to access the welcome page, the applications should work.
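If you prefer a command-line check from your local computer, a plain HTTP request should reach the welcome page (curl assumed to be available); a 2xx or 3xx status code indicates the web stack is serving requests:
$ curl -sS -o /dev/null -w "%{http_code}\n" http://<IP_address_of_virtual_appliance>/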
Components
This is the list of components the virtual appliance is composed of.
Operating system
There is Rocky Linux 9.3 under the hood.
GPU support
The virtual appliance has all necessary prerequisites pre-baked to run GPU-powered workloads (especially enhanced-speech-to-text-built-on-whisper). This means that the NVIDIA drivers and container toolkit are already installed. GPU time-based sharing is also enabled by default, which means you can run multiple technologies on a single GPU simultaneously.
Kubernetes
The k3s Kubernetes distribution is deployed inside.
Ingress controller
We use the ingress-nginx ingress controller. This component serves as a reverse proxy and load balancer.
Speech platform
This is the application for solving various voice-related problems like speaker identification, speech-to-text transcription and many more. The Speech Platform is accessible via a web browser or the API.
File Browser
File Browser is a web-based file browser/editor used to work with data on the data disk.
Prometheus
Prometheus is a tool that provides monitoring information about the Kubernetes components.
Grafana
Grafana is a tool for visualizing Prometheus metrics.
Disks
The virtual appliance comes with a system disk and a data disk.
System disk
The operating system is installed on the system disk. You should not modify the system disk unless you know what you are doing.
List of components stored on the system disk:
- NVIDIA drivers
- Container images for microservices
- Packaged helm charts
Data disk
The data disk is used as persistent storage. Unlike the system disk, the data disk is intended to contain files which can be viewed and modified by the user. The data disk is created with the PHXDATADISK label, and the system is instructed to mount the filesystem with this label to the /data directory.
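You can verify the label-based mount from inside the appliance with standard util-linux tools:
$ findmnt /data
$ lsblk -o NAME,LABEL,MOUNTPOINT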
List of components stored on data disk:
- Logs (/data/logs) of the system, k3s and individual containers
- Configuration for the ingress controller (/data/ingress-nginx/ingress-nginx-values.yaml)
- Configuration for the speech platform (/data/speech-platform/speech-platform-values.yaml)
- Models for individual microservices (/data/models/)
- Custom images (/data/images/)
- Prometheus persistent storage (/data/storage/prometheus)
Configuration
The following section describes various configuration use cases.
VirtualBox configuration
Linux deployment
If you use VirtualBox to run the Virtual Appliance on a Linux distribution, you can use our installation script to import and configure the Virtual Appliance; contact Phonexia support to obtain it. To use the script, you must have already downloaded the bundle with the files of the Virtual Appliance itself and the bundle with models and licenses. Then, the following steps need to be done:
- Open terminal and locate the script.
- Make the script executable:
$ chmod +x SpeechPlatformInstaller.sh
- Run the script using the following command:
$ ./SpeechPlatformInstaller.sh -m /path/to/models_bundle -v /path/to/VA_bundle -n virtual_machine_name
- Wait until the script finishes. When it does, it displays a link to the Speech Platform application.
Windows deployment
If you use VirtualBox on Windows, you can use our installer, which will import and configure the Virtual Appliance for you. To use this application, you will need an archive with the Virtual Appliance and an archive with licensed models. When you have these archives, run the app, fill out the name of the virtual machine, select VirtualBox as the hypervisor, select the paths to the archives, and click Install. A new window will pop up and show you the installation progress. After the installation is complete, you can access the Speech Platform at http://localhost:1080/app/home
Hyper-V configuration
Supported Hyper-V versions
If you use Hyper-V as your hypervisor, we provide configuration files for importing the Virtual Appliance. There are, however, a few prerequisites. First, you need to check which Hyper-V configuration versions your system supports. You can do this by opening PowerShell as administrator and running the following command:
Get-VMHostSupportedVersion
Now you can see which Hyper-V configuration versions your system supports. We ship the configuration file for version 8.0.
Automatic configuration
If you are using Hyper-V as your hypervisor and use Windows 11, you can use our installer to import and configure the Virtual Appliance for you. To use the installer, you need to have Microsoft Virtual Machine Converter installed on your machine (you can download it from here); it is used to convert the virtual hard disks from the vmdk format to Microsoft's VHDX format. You will also need the archives with the virtual appliance and with licensed models. After you obtain these, launch the installer, enter the VM name, select Hyper-V as the hypervisor, and fill out the paths to the archives. Then click Install, and a new window will pop up showing you the progress of the installation. After it finishes, you can access the Speech Platform at http://localhost:1080/app/home.
Manual configuration
If you want to configure the VA manually, please follow the steps described below.
Disk conversion
First, you need to convert the provided virtual appliance disks from the .vmdk to the .vhdx format. You can use a tool such as StarWind V2V or Microsoft Virtual Machine Converter. With Microsoft Virtual Machine Converter, after installing it, open PowerShell as administrator and type in the following commands:
- Import the PowerShell module
Import-Module 'C:\Program Files\Microsoft Virtual Machine Converter\MvmcCmdlet.psd1'
- Convert the disks
ConvertTo-MvmcVirtualHardDisk -SourceLiteralPath <path/to/vmdk> -VhdType DynamicHardDisk -VhdFormat vhdx -DestinationLiteralPath <path/to/target/folder>
Once the disks are converted, you need to create the proper folder structure for Hyper-V. Do this by moving the converted virtual hard disks to their folder as shown below.
speech-platform-virtual-appliance
├── Virtual Hard Disks
| ├── speech-platform-disk0001.vhdx
| └── speech-platform-disk0002.vhdx
└── Virtual Machines
├── <MachineID>.vmcx
└── <MachineID>.vmrs
Networking configuration
The next step is configuring the networking for the Virtual Appliance. To set this up, you need to create a Network Address Translation (NAT), create a Hyper-V virtual switch, and set up port forwarding.
- Create Virtual Switch
New-VMSwitch -Name "SpeechPlatformSwitch" -SwitchType Internal
- Add Switch Address
New-NetIPAddress -IPAddress 192.168.100.1 -PrefixLength 24 -InterfaceAlias "vEthernet (SpeechPlatformSwitch)"
- Create NAT
New-NetNAT -Name "SpeechPlatformNAT" -InternalIPInterfaceAddressPrefix 192.168.100.0/24
- Set up NAT port forwarding
Add-NetNatStaticMapping -NatName "SpeechPlatformNAT" -Protocol TCP -ExternalIPAddress 0.0.0.0 -ExternalPort 1080 -InternalIPAddress 192.168.100.2 -InternalPort 80
Add-NetNatStaticMapping -NatName "SpeechPlatformNAT" -Protocol TCP -ExternalIPAddress 0.0.0.0 -ExternalPort 2222 -InternalIPAddress 192.168.100.2 -InternalPort 22
Importing Virtual Appliance
When you are done configuring the networking, you can use the Hyper-V Manager UI to import the Virtual Appliance. The virtual switch will be automatically detected and attached to the VA. As a last step, you will need to set a static IP address in the Virtual Appliance. There are two ways to configure this.
- The first and simplest way is using a cloud-init configuration. In the directory that contains the configuration files for Hyper-V, there is also the file seed.iso. Open the VM settings in Hyper-V, select IDE Controller 0 and add a DVD Drive. Select the provided image file and select Apply. Once the ISO image is attached, cloud-init automatically detects it and sets the same static IP address as the manual configuration below.
- The second way is to start the Virtual Appliance and connect to it using the Hyper-V Virtual Machine Connection. After logging in, run the following commands:
$ nmcli con add type ethernet con-name eth0 ifname eth0 ipv4.addresses 192.168.100.2/24 ipv4.gateway 192.168.100.1 ipv4.dns "8.8.8.8 8.8.4.4" ipv4.method manual
$ nmcli con up eth0
After executing these commands, reboot the Virtual Appliance.
Upload microservices models and licenses
The virtual appliance is distributed without licenses and only with default models. To get other models and licenses, contact Phonexia support. They will provide a bundle (.zip file) with models and licenses. The bundle then needs to be uploaded and unzipped inside the virtual appliance. To upload the bundle with models and licenses, the following steps need to be done:
- Upload the provided licensed-models.zip archive to the virtual appliance via File Browser or via scp:
$ scp -P <virtual-appliance-port> licensed-models.zip root@<virtual-appliance-ip>:/data/
- Connect to the virtual appliance and change to the /data folder:
$ ssh root@<virtual-appliance-ip> -p <virtual-appliance-port>
$ cd /data
- Unzip the archive. Models are extracted into a directory per technology:
$ unzip licensed-models.zip
- Check that the configuration is valid and successfully applied.
The bundle content has a specific structure that ensures all models and licenses are placed in the correct locations after unzipping.
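One way to check is to list the pods again and confirm that the microservices which previously failed with CreateContainerConfigError reach the Running state once their licenses are in place:
$ kubectl -n speech-platform get pods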
Changing microservice models
In case you use models other than the default ones, you need to change the path values in the /data/speech-platform/speech-platform-values.yaml file:
- the <microservice>.config.model.file value leading to the model, and
- the <microservice>.config.license.key value leading to the license for the used model.
Example (change model large_v2-1.0.1 to small-1.0.1 for the enhanced-speech-to-text-built-on-whisper microservice):
enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      volume:
        hostPath:
          path: /data/models/enhanced_speech_to_text_built_on_whisper
      file: "large_v2-1.0.1.model"
    license:
      useSecret: true
      secret: enhanced-speech-to-text-built-on-whisper-license
      key: "large_v2-1.0.1"
needs to be changed to:
enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      volume:
        hostPath:
          path: /data/models/enhanced_speech_to_text_built_on_whisper
      file: "small-1.0.1.model"
    license:
      useSecret: true
      secret: enhanced-speech-to-text-built-on-whisper-license
      key: "small-1.0.1"
These changes are required for all microservices with licensed models except speech-to-text-phonexia, time-analysis and audio-quality-estimation.
Inspect microservices models
Models are stored inside the /data/models folder, where the path to each model is constructed as:
/data/models/<technology_name>/<model_name>-<model_version>.model
Where:
- technology_name - the name of the technology, e.g. speaker_identification
- model_name - the name of the model, e.g. xl
- model_version - the version of the model, e.g. 5.0.0
Imported models can be inspected after uploading with the following command:
- Content of the /data/models folder:
$ find /data/models
/data/models/
/data/models/speaker_identification
/data/models/speaker_identification/xl-5.0.0.model
/data/models/speaker_identification/xl-5.0.0-license.txt
/data/models/enhanced_speech_to_text_built_on_whisper
/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1.model
/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt
/data/models/speech_to_text_phonexia
/data/models/speech_to_text_phonexia/en_us_6-3.62.0-license.txt
/data/models/speech_to_text_phonexia/en_us_6-3.62.0.model
/data/models/time_analysis
/data/models/time_analysis/generic-3.62.0-license.txt
/data/models/time_analysis/generic-3.62.0.model
Inspect microservices licenses
Licenses are stored at the path /data/speech-platform/speech-platform-licenses.yaml. The file contains the Kubernetes secret definitions of the licenses, which ensures simple loading of the licenses into the application.
Imported licenses can be inspected after uploading with the following command:
- Content of the /data/speech-platform folder:
$ find /data/speech-platform/
/data/speech-platform/
/data/speech-platform/speech-platform-licenses.yaml
/data/speech-platform/speech-platform-values.yaml
Kubernetes secret definitions in the file are separated by ---. Each secret contains, under the .stringData.license path, the contents of the license file corresponding to the technology for which the license is meant. For example:
- For a model of the speaker_identification technology with name xl and version 5.0.0, the secret will look like this:
---
apiVersion: v1
kind: Secret
metadata:
  name: speaker-identification-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
type: Opaque
Content of the license file (/data/speech-platform/speech-platform-licenses.yaml) can be shown with the following command:
- Content of the license file:
$ cat /data/speech-platform/speech-platform-licenses.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: speaker-identification-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
type: Opaque
---
apiVersion: v1
kind: Secret
metadata:
  name: enhanced-speech-to-text-built-on-whisper-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt" file>
type: Opaque
.
.
.
Set DNS name for speech platform virtual appliance
The Speech Platform is accessible on http://<IP_address_of_virtual_appliance>. We recommend creating a DNS record to make access more comfortable for users. Consult your DNS provider for more information on how to add the corresponding DNS record.
Use HTTPS certificate
The Speech Platform is also accessible via the HTTPS protocol on https://<IP_address_of_virtual_appliance>. If you prefer secure communication, you might want to use your own TLS certificate to secure the communication. To do so, follow this guide:
- Prepare the TLS certificate beforehand.
- Put the certificate private key into a file named cert.key.
- Put the certificate into a file named cert.crt.
- Create a Kubernetes secret manifest storing the certificate and private key:
kubectl create -n ingress-nginx secret tls default-ssl-certificate --key cert.key --cert cert.crt -o yaml --dry-run=client > /tmp/certificate-secret.yaml
- Copy the manifest (the resulting file) to /data/ingress-nginx/certificate-server.yaml.
- Open the text file /data/ingress-nginx/ingress-nginx-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.controller.extraArgs.default-ssl-certificate
- Uncomment the line.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      <Not significant lines omitted>
      extraArgs:
        <Not significant lines omitted>
        default-ssl-certificate: "ingress-nginx/default-ssl-certificate"
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
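To verify that your certificate is being served, you can inspect the TLS handshake from your local computer (openssl assumed to be available):
$ openssl s_client -connect <IP_address_of_virtual_appliance>:443 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer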
Extend disks
Disks are extended automatically on VM startup by the growfs systemd service when you extend the backing volume/disk in the hypervisor. You can trigger the extension manually by running the script /root/grow-partition-and-filesystems.sh. It grows the partition and filesystem for both the system and data disks.
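After extending a disk, you can confirm the new filesystem sizes from inside the appliance:
$ df -h / /data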
Disable unneeded microservices
The virtual appliance comes with all microservices enabled by default. You may decide to disable a microservice if you do not plan to use it. A disabled microservice does not consume any compute resources.
- Find out which microservices you want to disable - enhanced-speech-to-text-built-on-whisper, language-identification, speaker-diarization, voice-activity-detection, voiceprint-comparison or voiceprint-extraction.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.<microservice>.enabled
- Change the value from true to false.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    <microservice>:
      <Not significant lines omitted>
      enabled: false
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
Phonexia speech to text microservice
This section describes configuration specific to the Phonexia speech-to-text microservice.
Permanent vs onDemand instances
A permanent instance is started and running (and consuming resources) all the time. An onDemand instance is started only when a corresponding task is queued, and is stopped when all tasks have been processed.
All instances are onDemand by default. Any instance can be reconfigured to be permanent. Use the following guide to reconfigure an instance from onDemand to permanent:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
            onDemand:
              enabled: true
          . . .
- Delete the onDemand key and its subkeys.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
          . . .
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
Configure languages in speech-to-text-phonexia microservice
This microservice consists of multiple instances. Each instance corresponds to a single language. All instances are listed in the configuration file.
Note: Docker images for the languages are not included in the virtual appliance. This means that the virtual appliance needs internet access to download the Docker image when the speech-to-text-phonexia microservice is used! As a workaround, you can put a custom image into the virtual appliance.
By default all languages/instances are enabled. List of languages:
- ar_kw_6
- ar_xl_6
- bn_6
- cs_cz_6
- de_de_6
- en_us_6
- es_6
- fa_6
- fr_fr_6
- hr_hr_6
- hu_hu_6
- it_it_6
- ka_ge_6
- kk_kz_6
- nl_6
- pl_pl_6
- ps_6
- ru_ru_6
- sk_sk_6
- sr_rs_6
- sv_se_6
- tr_tr_6
- uk_ua_6
- vi_vn_6
- zh_cn_6
How to disable all language instances except cs_cz_6 and en_us_6:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          - name: ark
            imageTag: 3.62.0-stt-ar_kw_6
            onDemand:
              enabled: true
          - name: arx
            imageTag: 3.62.0-stt-ar_xl_6
            onDemand:
              enabled: true
          - name: bn
            imageTag: 3.62.0-stt-bn_6
            onDemand:
              enabled: true
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
            onDemand:
              enabled: true
          - name: de
            imageTag: 3.62.0-stt-de_de_6
            onDemand:
              enabled: true
          - name: en
            imageTag: 3.62.0-stt-en_us_6
            onDemand:
              enabled: true
          . . .
          - name: vi
            imageTag: 3.62.0-stt-vi_vn_6
            onDemand:
              enabled: true
          - name: zh
            imageTag: 3.62.0-stt-zh_cn_6
            onDemand:
              enabled: true
- Comment out all the instances except cs_cz_6 and en_us_6.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          #- name: ark
          #  imageTag: 3.62.0-stt-ar_kw_6
          #  onDemand:
          #    enabled: true
          #- name: arx
          #  imageTag: 3.62.0-stt-ar_xl_6
          #  onDemand:
          #    enabled: true
          #- name: bn
          #  imageTag: 3.62.0-stt-bn_6
          #  onDemand:
          #    enabled: true
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
            onDemand:
              enabled: true
          #- name: de
          #  imageTag: 3.62.0-stt-de_de_6
          #  onDemand:
          #    enabled: true
          - name: en
            imageTag: 3.62.0-stt-en_us_6
            onDemand:
              enabled: true
          . . .
          #- name: vi
          #  imageTag: 3.62.0-stt-vi_vn_6
          #  onDemand:
          #    enabled: true
          #- name: zh
          #  imageTag: 3.62.0-stt-zh_cn_6
          #  onDemand:
          #    enabled: true
- Alternatively, you can delete the instances you are not interested in.
- Then the updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
            onDemand:
              enabled: true
          - name: en
            imageTag: 3.62.0-stt-en_us_6
            onDemand:
              enabled: true
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
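After the redeploy, you can confirm that only the remaining language instances are scheduled; the exact pod names below are illustrative:
$ kubectl -n speech-platform get pods | grep speech-to-text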
Modify replicas for permanent language instances
Each language instance has only one replica by default. This means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the replicas for the corresponding language instance.
Note: We do not recommend increasing replicas for any microservice when the virtual appliance is running with the default resources (4 CPU cores, 32GB memory)!
Note: An onDemand instance always has exactly one replica.
- Find out which language instance you want to configure replicas for.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.<language instance>.replicaCount.
- Change the value to the desired number of replicas.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
            replicaCount: 2
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
Modify parallelism for instances
Each instance is able to process only one request at a time, unless the parallelism is overridden. The value of parallelism is the maximum number of requests processed by one instance. Parallelism is set globally for all instances of a technology; however, each instance can override the value. To override parallelism for speech-to-text-phonexia, time-analysis, or audio-quality-estimation, follow these steps:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Find the key, depending on the technology (speech-to-text-phonexia, time-analysis, audio-quality-estimation) for which parallelism should be overridden: .spec.valuesContent.<technology>.parallelism
- Change the value to the desired number of requests processed in parallel.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      # Global value of parallelism for all instances
      parallelism: 2
      config:
        <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.62.0
          - name: en
            imageTag: 3.62.0
            # Override of parallelism for en instance
            parallelism: 4
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
Add custom images
This section describes how to add custom images to the virtual appliance. A typical use case is adding Speech to Text images to the Speech Engine for the languages you want to use, or adding a GPU-powered image for Voiceprint Extraction.
Add language images for speech to text phonexia
This subsection focuses on adding Phonexia Speech to Text images to the Speech Engine for the languages you want to use. These images need to be added to the data disk in order for Phonexia Speech to Text to work offline. In the example, we will add two images: Phonexia Speech to Text for the English and Czech languages.
- [Virtual appliance] Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
- [Virtual appliance] Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- [Virtual appliance] Choose which images you want to add. Use the imageTag key to find out which image tag(s) to use:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6   <- This is the image tag
            onDemand:
              enabled: true
          - name: en
            imageTag: 3.62.0-stt-en_us_6   <- This is the image tag
            onDemand:
              enabled: true
- [PC] Pull all images:
docker pull phonexia/spe:3.62.0-stt-en_us_6
docker pull phonexia/spe:3.62.0-stt-cs_cz_6
- [PC] Save all images to a single tar archive:
docker save -o images.tar phonexia/spe:3.62.0-stt-cs_cz_6 phonexia/spe:3.62.0-stt-en_us_6
- [PC] Copy the images.tar file into the virtual appliance via ssh or File Browser to /data/images:
scp images.tar root@<IP of virtual appliance>:/data/images
- [Virtual appliance] Restart the virtual appliance to load the images, or load them manually with:
ctr image import /data/images/images.tar
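You can verify the import by listing the images known to containerd (this assumes the appliance's ctr command uses the same containerd namespace as the import above):
$ ctr image ls | grep phonexia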
Add gpu-powered image for voiceprint-extraction
This section describes how to add and use a GPU-powered image for Voiceprint Extraction.
- Identify which Voiceprint Extraction image from Docker Hub you want to use. If you are not sure, use the latest gpu image tag. In this example, we will use the 1.2.0-gpu image tag.
- [Virtual appliance] Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
- [Virtual appliance] Locate the key .spec.valuesContent.voiceprint-extraction.image.
- [Virtual appliance] Configure Voiceprint Extraction to use the image:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    voiceprint-extraction:
      <Not significant lines omitted>
      image:
        repository: phonexia/voiceprint-extraction
        tag: 1.2.0-gpu
        registry: docker.io
- If you don't mind downloading the image from the internet (Docker Hub), you are good to go. Otherwise, you need to upload the image to the virtual appliance:
- [PC] Pull the Voiceprint Extraction image:
docker pull phonexia/voiceprint-extraction:1.2.0-gpu
- [PC] Save the image to a tar archive:
docker save -o images.tar phonexia/voiceprint-extraction:1.2.0-gpu
- [PC] Copy the images.tar file into the virtual appliance via SSH or file browser to /data/images:
scp images.tar root@<IP of virtual appliance>:/data/images
- [Virtual appliance] Restart the virtual appliance to load the images, or load them manually with:
ctr image import /data/images/images.tar
Add gpu-powered image for language-identification
This section describes how to add and use a GPU-powered image for Language Identification.
- Identify which Language Identification image from Docker Hub you want to use. If you are not sure, use the latest gpu image tag. In this example, we will use the 1.2.0-gpu image tag.
- [Virtual appliance] Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
- [Virtual appliance] Locate the key .spec.valuesContent.language-identification.image.
- [Virtual appliance] Configure Language Identification to use the image:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    language-identification:
      <Not significant lines omitted>
      image:
        repository: phonexia/language-identification
        tag: 1.2.0-gpu
        registry: docker.io
- If you don't mind downloading the image from the internet (Docker Hub), you are good to go. Otherwise, you need to upload the image to the virtual appliance:
- [PC] Pull the Language Identification image:
docker pull phonexia/language-identification:1.2.0-gpu
- [PC] Save the image to a tar archive:
docker save -o images.tar phonexia/language-identification:1.2.0-gpu
- [PC] Copy the images.tar file into the virtual appliance via SSH or file browser to /data/images:
scp images.tar root@<IP of virtual appliance>:/data/images
- [Virtual appliance] Restart the virtual appliance to load the images, or load them manually with:
ctr image import /data/images/images.tar
Add gpu-powered image for speaker-diarization
This section describes how to add and use a GPU-powered image for Speaker Diarization.
- Identify which Speaker Diarization image from Docker Hub you want to use. If you are not sure, use the latest gpu image tag. In this example, we will use the 1.2.0-gpu image tag.
- [Virtual appliance] Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
- [Virtual appliance] Locate the key .spec.valuesContent.speaker-diarization.image.
- [Virtual appliance] Configure Speaker Diarization to use the image:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speaker-diarization:
      <Not significant lines omitted>
      image:
        repository: phonexia/speaker-diarization
        tag: 1.2.0-gpu
        registry: docker.io
- If you don't mind downloading the image from the internet (Docker Hub), you are good to go. Otherwise, you need to upload the image to the virtual appliance:
- [PC] Pull the Speaker Diarization image:
docker pull phonexia/speaker-diarization:1.2.0-gpu
- [PC] Save the image to a tar archive:
docker save -o images.tar phonexia/speaker-diarization:1.2.0-gpu
- [PC] Copy the images.tar file into the virtual appliance via SSH or file browser to /data/images:
scp images.tar root@<IP of virtual appliance>:/data/images
- [Virtual appliance] Restart the virtual appliance to load the images, or load them manually with:
ctr image import /data/images/images.tar
Modify microservice replicas
Each microservice has only one replica by default. This means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the replicas for the corresponding microservices.
Note: We do not recommend increasing replicas for any microservice when the virtual appliance is running with the default resources (4 CPU cores, 32GB memory)!
- Find out which microservices you want to modify replicas for - enhanced-speech-to-text-built-on-whisper, language-identification, speaker-diarization, voice-activity-detection, voiceprint-comparison or voiceprint-extraction.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.<microservice>.replicaCount
- Change the value to the desired number of replicas.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    <microservice>:
      <Not significant lines omitted>
      replicaCount: 2
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
Run enhanced-speech-to-text-built-on-whisper microservice on GPU
First, make sure the virtual appliance can see the GPU device(s). Use nvidia-smi to list all the devices. If a device is present and visible to the system, then the output should look like:
[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)
If the GPU is visible, then you can reconfigure enhanced-speech-to-text-built-on-whisper to use the GPU for processing.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the enhanced-speech-to-text-built-on-whisper section .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.
- Locate the key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.config.device.
- Uncomment the line so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        # Uncomment this to force whisper to run on GPU
        device: cuda
- Locate the key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.resources.
- Request GPU resources for the processing so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
      <Not significant lines omitted>
      # Uncomment this to grant access to GPU on whisper pod
      resources:
        limits:
          nvidia.com/gpu: "1"
- Locate the key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.runtimeClassName.
- Set runtimeClassName so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
      <Not significant lines omitted>
      # Uncomment this to run whisper on GPU
      runtimeClassName: "nvidia"
- Locate the key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.updateStrategy.
- Set type to Recreate to allow seamless updates so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
      <Not significant lines omitted>
      # Uncomment this to allow seamless updates on single GPU machine
      updateStrategy:
        type: Recreate
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        device: cuda
      <Not significant lines omitted>
      resources:
        limits:
          nvidia.com/gpu: "1"
      <Not significant lines omitted>
      runtimeClassName: "nvidia"
      <Not significant lines omitted>
      updateStrategy:
        type: Recreate
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
Run Voiceprint Extraction microservice on GPU
Note: GPU-powered image for Voiceprint Extraction is not included in the virtual appliance. Follow this guide to add and use the image.
First, make sure the virtual appliance can detect the GPU device(s). Use nvidia-smi to list all the devices. If the device is present and visible to the system, then the output should look like:
[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)
If the GPU is visible, then you can reconfigure Voiceprint Extraction to use the GPU for processing.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
- Locate the Voiceprint Extraction section .spec.valuesContent.voiceprint-extraction.
- Locate the key .spec.valuesContent.voiceprint-extraction.config.device.
- Uncomment the line so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    voiceprint-extraction:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        # Uncomment this to force voiceprint-extraction to run on GPU
        device: cuda
- Locate the key .spec.valuesContent.voiceprint-extraction.resources.
- Request GPU resources for the processing so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    voiceprint-extraction:
      <Not significant lines omitted>
      # Uncomment this to grant access to GPU on the voiceprint-extraction pod
      resources:
        limits:
          nvidia.com/gpu: "1"
- Locate the key .spec.valuesContent.voiceprint-extraction.runtimeClassName.
- Set runtimeClassName so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    voiceprint-extraction:
      <Not significant lines omitted>
      # Uncomment this to run voiceprint-extraction on GPU
      runtimeClassName: "nvidia"
- Locate the key .spec.valuesContent.voiceprint-extraction.updateStrategy.
- Set type to Recreate to allow seamless updates so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    voiceprint-extraction:
      <Not significant lines omitted>
      # Uncomment this to allow seamless updates on single GPU machine
      updateStrategy:
        type: Recreate
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    voiceprint-extraction:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        device: cuda
      <Not significant lines omitted>
      resources:
        limits:
          nvidia.com/gpu: "1"
      <Not significant lines omitted>
      runtimeClassName: "nvidia"
      <Not significant lines omitted>
      updateStrategy:
        type: Recreate
- Save the file.
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Run Language Identification on GPU
Note: GPU-powered image for Language Identification is not included in the virtual appliance. Follow this guide to add and use the image.
First, make sure the virtual appliance can detect the GPU device(s). Use nvidia-smi to list all the devices. If the device is present and visible to the system, then the output should look like:
[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)
If the GPU is visible, then you can reconfigure Language Identification to use the GPU for processing.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
- Locate the Language Identification section .spec.valuesContent.language-identification.
- Locate the key .spec.valuesContent.language-identification.config.device.
- Uncomment the line so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    language-identification:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        # Uncomment this to force language-identification to run on GPU
        device: cuda
- Locate the key .spec.valuesContent.language-identification.resources.
- Request GPU resources for the processing so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    language-identification:
      <Not significant lines omitted>
      # Uncomment this to grant access to GPU on the language-identification pod
      resources:
        limits:
          nvidia.com/gpu: "1"
- Locate the key .spec.valuesContent.language-identification.runtimeClassName.
- Set runtimeClassName so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    language-identification:
      <Not significant lines omitted>
      # Uncomment this to run language-identification on GPU
      runtimeClassName: "nvidia"
- Locate the key .spec.valuesContent.language-identification.updateStrategy.
- Set type to Recreate to allow seamless updates so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    language-identification:
      <Not significant lines omitted>
      # Uncomment this to allow seamless updates on single GPU machine
      updateStrategy:
        type: Recreate
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    language-identification:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        device: cuda
      <Not significant lines omitted>
      resources:
        limits:
          nvidia.com/gpu: "1"
      <Not significant lines omitted>
      runtimeClassName: "nvidia"
      <Not significant lines omitted>
      updateStrategy:
        type: Recreate
- Save the file.
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Run Speaker Diarization on GPU
Note: GPU-powered image for Speaker Diarization is not included in the virtual appliance. Follow this guide to add and use the image.
First, make sure the virtual appliance can detect the GPU device(s). Use nvidia-smi to list all the devices. If the device is present and visible to the system, then the output should look like:
[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)
If the GPU is visible, then you can reconfigure Speaker Diarization to use the GPU for processing.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
- Locate the Speaker Diarization section .spec.valuesContent.speaker-diarization.
- Locate the key .spec.valuesContent.speaker-diarization.config.device.
- Uncomment the line so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speaker-diarization:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        # Uncomment this to force speaker-diarization to run on GPU
        device: cuda
- Locate the key .spec.valuesContent.speaker-diarization.resources.
- Request GPU resources for the processing so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speaker-diarization:
      <Not significant lines omitted>
      # Uncomment this to grant access to GPU on the speaker-diarization pod
      resources:
        limits:
          nvidia.com/gpu: "1"
- Locate the key .spec.valuesContent.speaker-diarization.runtimeClassName.
- Set runtimeClassName so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speaker-diarization:
      <Not significant lines omitted>
      # Uncomment this to run speaker-diarization on GPU
      runtimeClassName: "nvidia"
- Locate the key .spec.valuesContent.speaker-diarization.updateStrategy.
- Set type to Recreate to allow seamless updates so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speaker-diarization:
      <Not significant lines omitted>
      # Uncomment this to allow seamless updates on single GPU machine
      updateStrategy:
        type: Recreate
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speaker-diarization:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        device: cuda
      <Not significant lines omitted>
      resources:
        limits:
          nvidia.com/gpu: "1"
      <Not significant lines omitted>
      runtimeClassName: "nvidia"
      <Not significant lines omitted>
      updateStrategy:
        type: Recreate
- Save the file.
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
GPU parallelism settings
This section describes how to control processing parallelism when a microservice is running on a GPU. The following configuration applies only to voiceprint-extraction and enhanced-speech-to-text-built-on-whisper:
<microservice>:
  config:
    # -- Parallel tasks per device. GPU only.
    instancesPerDevice: 1
    # -- Index of device to use. GPU only.
    #deviceIndex: 0
There are two configuration options:
- instancesPerDevice - Controls how many tasks a microservice can process in parallel on a single GPU. A higher value means higher GPU utilization (both processor- and memory-wise).
- deviceIndex - Controls which GPU card to use in case there are multiple GPU cards. We discourage using this in most cases.
Change model used in a microservice
Each microservice needs a model to do its job properly. We provide multiple models for some microservices, for example enhanced-speech-to-text-built-on-whisper. Usually, we pre-configure microservices with the most accurate (and slowest) model. Typically, users switch to a different model to speed up processing at the cost of less accurate results.
The license you have received with the virtual appliance is valid only for the default model. If you change the model, you have to change the license as well.
Change model in enhanced-speech-to-text-built-on-whisper microservice
We offer the following models for the enhanced-speech-to-text-built-on-whisper microservice:
- large-v3 - next-gen most accurate multilingual model.
- large-v2 - most accurate multilingual model. This is the default model.
- medium - less accurate but faster than large-v2.
- small - less accurate but faster than medium.
- base - less accurate but faster than small.
- Ask Phonexia to provide the desired model and license. You will receive link(s) that download as a zip archive.
- Upload the archive to the virtual appliance:
$ scp licensed-models.zip root@<virtual-appliance-ip>:/data/
- Unzip the archive. Models are extracted into a directory per microservice:
$ unzip licensed-models.zip
- The content of /data/models should look like:
$ find /data/models
/data/models/
/data/models/enhanced_speech_to_text_built_on_whisper
/data/models/enhanced_speech_to_text_built_on_whisper/small-1.0.0.model
/data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-base-1.0.0-license.key.txt
/data/models/enhanced_speech_to_text_built_on_whisper/base-1.0.0.model
/data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-small-1.0.0-license.key.txt
/data/models/speaker_identification
/data/models/speaker_identification/xl-5.0.0.model
/data/models/speaker_identification/speaker_identification-xl-5.0.0-license.key.txt
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.config.model
- Change the content of the file key from "large_v2-1.0.0.model" to the file you've just uploaded ("small-1.0.0.model").
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
      <Not significant lines omitted>
      config:
        model:
          <Not significant lines omitted>
          file: "small-1.0.0.model"
- Change the license because you have changed the model. See above how to do it.
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
Load Speech to Text Phonexia, Time Analysis and Audio Quality Estimation model from data disk
To keep up with the latest version of the application, it is possible to load models from the virtual appliance volume. To use an image without a bundled model and load existing models from the data volume, the instances in the config file need to be set up as follows:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: en
            imageTag: 3.62.0
          . . .
    <Not significant lines omitted>
    time-analysis:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: tae
            imageTag: 3.62.0
          . . .
    <Not significant lines omitted>
    audio-quality-estimation:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: aqe
            imageTag: 3.62.0
          . . .
By default, we expect the model to be located at the path /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model. This folder structure is ensured by unzipping the provided licensed-models.zip archive in the /data/ path. Additionally, if the path to the model is different, or the version of the model does not match the image, it can be specified in the instances config as follows:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: cs
imageTag: 3.62.0
model:
hostPath: /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model
. . .
<Not significant lines omitted>
time-analysis:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: tae
imageTag: 3.62.0
model:
hostPath: /data/models/time_analysis/generic-3.62.0.model
. . .
<Not significant lines omitted>
audio-quality-estimation:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: aqe
imageTag: 3.62.0
model:
hostPath: /data/models/audio_quality_estimation/generic-3.62.0.model
. . .
Currently, model loading from the data disk is supported only by the Speech to Text Phonexia, Time Analysis and Audio Quality Estimation technologies.
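Before applying such a configuration, it is worth verifying that the model files actually exist at the paths referenced by hostPath. A minimal sanity check from the virtual appliance shell (paths taken from the examples above):

# Each directory should contain the .model file referenced in the instances config
ls -l /data/models/speech_to_text_phonexia/
ls -l /data/models/time_analysis/
ls -l /data/models/audio_quality_estimation/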
Process patented audio codecs with media-conversion
By default, media-conversion can work only with patent-free audio codecs.
We cannot include and distribute patented codecs with the virtual appliance. If you need to process audio files encoded with patented codecs, you have to use a different version of media-conversion. The media-conversion service image is located on Docker Hub.
Pull Media Conversion image on the fly
This is handy if you don't mind pulling images from the internet. The image is pulled only if it is not already present.
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.media-conversion.image
- Change the content of the registry, repository, tag and tagSuffix keys to:
  media-conversion:
    image:
      registry: docker.io
      repository: phonexia/media-conversion
      tag: 1.0.0
      tagSuffix: ""
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
media-conversion:
<Not significant lines omitted>
image:
registry: docker.io
repository: phonexia/media-conversion
tag: 1.0.0
tagSuffix: "" - Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
Put Media Conversion image into virtual appliance
This approach is needed if your deployment is completely offline and access to the internet from the virtual appliance is forbidden.
- [PC] Pull media-conversion image locally:
$ docker pull phonexia/media-conversion:1.0.0
- [PC] Save Media Conversion image to tar archive:
$ docker save --output images.tar phonexia/media-conversion:1.0.0
- [PC] Copy the images.tar file into the virtual appliance via ssh or filebrowser to /data/images:
  scp images.tar root@<IP of virtual appliance>:/data/images
- [Virtual appliance] Restart the virtual appliance to load the images, or load them manually with:
ctr image import /data/images/images.tar
- Reconfigure the speech-platform to use the locally imported image as described above.
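To confirm that the image was actually imported into the containerd runtime used by k3s, you can list the images. A quick check, assuming the bundled ctr binary is on the PATH (as in the import command above):

# [Virtual appliance] List imported images and filter for media-conversion
ctr image ls -q | grep media-conversion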
Disable DNS resolving for specific domains
The Kubernetes resolver tries to resolve non-FQDN names with all domains from /etc/resolv.conf. This might cause issues if access to the upstream DNS server (taken from /etc/resolv.conf as well) is denied. To avoid this issue, configure the Kubernetes resolver to skip lookup for specific domain(s).
- [Virtual appliance] Create file /data/speech-platform/coredns-custom.yaml manually with the following content. Replace <domain1.com> and <domain2.com> with the domains you want to disable lookup for:
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns-custom
namespace: kube-system
data:
custom.server: |
<domain1.com>:53 {
log
}
<domain2.com>:53 {
log
}
- [Virtual appliance] The file should look like this:
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns-custom
namespace: kube-system
data:
custom.server: |
localdomain:53 {
log
}
example.com:53 {
log
}
- [Virtual appliance] Restart coreDNS to apply the change:
kubectl -n kube-system rollout restart deploy/coredns
- [Virtual appliance] Check that coreDNS pod is running:
kubectl -n kube-system get pods -l k8s-app=kube-dns
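You can also verify that the custom server block was picked up by the cluster. A small sanity check, assuming the ConfigMap was created as shown above:

# [Virtual appliance] The ConfigMap should exist and contain your domains
kubectl -n kube-system get configmap coredns-custom -o yaml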
Custom configuration with cloud-init
Cloud-init is a widely used tool for configuring cloud instances at boot time, and the Virtual Appliance supports it as well.
It can be used to customize the Virtual Appliance - to create a user with a specific SSH key, install extra packages, and so on.
How to Pass Cloud-Init User Configuration to Virtual Appliance
This guide will walk you through the steps required to pass a cloud-init user configuration to a Virtual Appliance.
-
The first step is to create a user-data file that contains the configuration information you want to pass to the VM. This file is typically written in YAML and may include various configurations, such as creating users, setting up SSH keys, or running commands. Here is an example of a basic user-data file:
#cloud-config
users:
  - name: phonexia
    ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr... your_public_key_here
packages:
  - htop
Save this file as user-data.yaml.
-
Since non-cloud hypervisors like VirtualBox and VMWare do not have a native method to pass cloud-init data, you need to create a "seed" ISO image that contains your user-data.yaml file. Cloud-init will read this data during the virtual machine boot process. You can create an ISO image using the cloud-localds command (an alternative using genisoimage is sketched after this list):
cloud-localds seed.iso user-data.yaml
This command generates an ISO file named seed.iso containing your user-data.yaml and a generated meta-data file.
-
Attach the ISO Image to the Virtual Appliance VM
Next, attach the seed.iso file to the VM as a CD-ROM/DVD-ROM. You can do this via the VirtualBox GUI, VMWare vSphere, or the ESXi Host Client.
-
Boot the VM
Cloud-init will automatically detect the attached ISO image and apply the configurations specified in your user-data.yaml file.
-
Verify Cloud-Init Execution
Once the VM has booted, you can verify that cloud-init has applied the configuration correctly. Connect to your VM via SSH or the console and check the following:
-
Check Cloud-Init Status:
cloud-init status
-
Check that the htop package is installed:
htop
This should open the htop application.
-
Check that you can log in as the phonexia user with the ssh key:
ssh -i <path_to_ssh_private_key> phonexia@<ip of virtual appliance>
-
Check Cloud-Init Logs: Cloud-init logs its activities in /var/log/cloud-init.log and /var/log/cloud-init-output.log. You can inspect these logs to troubleshoot any issues:
less /var/log/cloud-init.log
-
(Optional) Detach the ISO Image
Usually you no longer need the seed.iso file attached to your VM; you can detach it in a similar way as you attached it.
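As mentioned in step 2, if the cloud-localds utility is not available on your workstation, a NoCloud seed ISO can also be built with genisoimage. This is a hedged alternative, not the documented method; note that the files inside the ISO must be named exactly user-data and meta-data, and the volume label must be cidata for cloud-init to find them:

# Rename the user-data file and create an (empty) meta-data file
cp user-data.yaml user-data
touch meta-data
# Build the seed ISO with the volume label cloud-init expects (cidata)
genisoimage -output seed.iso -volid cidata -joliet -rock user-data meta-data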
Uninstall NVIDIA Drivers
The Virtual Appliance contains NVIDIA drivers needed for GPU processing. In some cases it might be handy to use a different version of the drivers, or a different kind of drivers (vGPU) instead. As a first step, the current drivers must be uninstalled.
Run the following command to uninstall the bundled drivers:
dnf module remove nvidia-driver:550
Note that GPU processing won't work until new drivers are installed. Installation of the new drivers is out of the scope of this document.
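To verify that the bundled drivers are gone, you can check the module state; nvidia-smi should also fail once the driver stack is removed. A quick sketch:

# The nvidia-driver module should no longer be listed as installed
dnf module list --installed nvidia-driver
# This should now fail with an error because the driver is uninstalled
nvidia-smi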
Limits
This section describes the virtual appliance limits and how to modify them.
API limits
The following limits apply to the API itself.
Name | Unit | Default | Description |
---|---|---|---|
taskExpirationTime | seconds | 300 | Time after which finished tasks expire. The API holds information about finished tasks (both successfully finished and failed). This information is discarded after taskExpirationTime. A client usually polls on the task id and must retrieve the task status before it expires. Maximum value is 3600. |
taskGrpcTimeout | seconds | 120 | Maximum time the API waits for any task to complete. If you process big audio files, you probably need to increase this limit. |
inputStorageSize | variable | 1GiB | Size of the input storage. When an audio file is POSTed to the API, the whole file must be stored on the disk. If you process big files or multiple files in parallel, then this limit probably needs to be increased. |
internalStorageSize | variable | 1GiB | Size of the internal storage. Each audio file is converted into wav format before processing. The converted audio is stored on the disk. If you process big files or multiple files in parallel, then this limit probably needs to be increased. Also note that internalStorageSize must be greater than or equal to inputStorageSize. |
singleFileUploadTimeout | seconds | 120 | Maximum allowed time for uploading a single file to the API. If you process big files or have a poor network connection, then this limit must be increased. |
singleFileUploadSize | bytes | 104857600 (== 100MB) | Maximum allowed size of an audio file to upload. If you process big files, then this limit must be increased. Note that this is an API/ingress limit, not the UI limit! |
How to change the API limits
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.api.config
- Change the value of the corresponding limit to a new value:
  api:
    config:
      taskExpirationTime: 1200
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
api:
<Not significant lines omitted>
config:
taskExpirationTime: 1200
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
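If you want to confirm that k3s has seen the edited values, you can read the HelmChartConfig object back from the cluster. A minimal sketch (HelmChartConfig is a k3s custom resource):

# The valuesContent should contain the new limit value
kubectl -n kube-system get helmchartconfig speech-platform -o yaml | grep taskExpirationTime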
Media Conversion limits
The following limits apply to the Media Conversion technology.
Name | Unit | Default | Description |
---|---|---|---|
maxAudioLength | seconds | 7200 | Audio length limit. Processing of media files longer than this limit is rejected. |
How to change the Media Conversion limits
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.media-conversion.config
- Change the value of the corresponding limit to a new value:
  media-conversion:
    config:
      # 10 hours
      maxAudioLength: 36000
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
media-conversion:
<Not significant lines omitted>
config:
maxAudioLength: 36000
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
UI limits
The following limits apply to the UI itself.
Name | Unit | Default | Description |
---|---|---|---|
taskParallelism | count | 4 | The UI posts tasks to the API and polls for each task until it is finished. This controls how many tasks can be processed in parallel. |
taskPollingInterval | seconds | 1 | Duration between poll attempts. |
taskPollingTimeout | seconds | 3600 | How long the UI polls for the task, i.e. how long the UI is willing to wait until the task is finished. |
Speaker Identification UI limits
Limits in config section
.spec.valuesContent.frontend.config.limits.speakerIdentification
are
applicable only for speaker identification.
Name | Unit | Default | Description |
---|---|---|---|
maxFileSize | bytes | 5000000 (== 5MB) | Maximum allowed size of an audio file to upload. Note that this must be lower than or equal to the singleFileUploadSize API limit. |
maxFilesCount | count | 100 | Maximum number of files to be uploaded. |
maxVoiceRecorderDuration | seconds | 300 | Maximum duration of the record captured by voice recorder. |
Speech to text UI limits
Limits in config section
.spec.valuesContent.frontend.config.limits.speechToText
are applicable only
for Speech to Text. Limits are applicable for both Enhanced Speech to Text built
on Whisper and Speech to Text by Phonexia 6th Gen.
Name | Unit | Default | Description |
---|---|---|---|
maxFileSize | bytes | 5000000 (== 5MB) | Maximum allowed size of an audio file to upload. Note that this must be lower than or equal to the singleFileUploadSize API limit. |
maxFilesCount | count | 100 | Maximum number of files to be uploaded. |
maxVoiceRecorderDuration | seconds | 300 | Maximum duration of the record captured by voice recorder. |
Language Identification UI limits
Limits in config section
.spec.valuesContent.frontend.config.limits.languageIdentification
are
applicable only for language identification.
Name | Unit | Default | Description |
---|---|---|---|
maxFileSize | bytes | 5000000 (== 5MB) | Maximum allowed size of an audio file to upload. Note that this must be lower than or equal to the singleFileUploadSize API limit. |
maxFilesCount | count | 100 | Maximum number of files to be uploaded. |
maxVoiceRecorderDuration | seconds | 300 | Maximum duration of the record captured by voice recorder. |
How to change the UI limits
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.frontend.config.limits
- Change the value of the corresponding limit to a new value:
  frontend:
    config:
      limits:
        taskParallelism: 2
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
frontend:
<Not significant lines omitted>
config:
limits:
taskParallelism: 2
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
Pod count limits
Currently, the platform is limited by the number of pods that can be created inside the Kubernetes cluster. The maximum number of pods is set to 300.
How to change the pod count limits
Pod count limits can be overridden by editing the /etc/rancher/k3s/config.yaml file. To override the maximum number of pods, the max-pods parameter needs to be added/edited. Example:
debug: true
system-default-registry: airgapped.phonexia.com
disable:
- traefik
- cloud-controller
kubelet-arg:
- "kube-reserved=cpu=500m,memory=1Gi,ephemeral-storage=2Gi"
- "system-reserved=cpu=500m, memory=1Gi,ephemeral-storage=2Gi"
- "eviction-hard=memory.available<500Mi,nodefs.available<10%"
- "max-pods=350"
After editing the configuration, the virtual machine needs to be restarted (stop and start) to apply the changes.
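After the restart, you can confirm the new pod capacity from the node status. For example, with max-pods=350 as in the example above:

# Prints the allocatable pod count reported by the kubelet
kubectl get node -o jsonpath='{.items[0].status.allocatable.pods}'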
Admin backends limits
The following limits apply to the admin backends (filebrowser, grafana, prometheus).
Name | Unit | Default | Description |
---|---|---|---|
singleFileUploadTimeout | seconds | 120 | Maximum allowed time for uploading. |
singleFileUploadSize | bytes | 5368709120 (== 5GB) | Maximum allowed size of a file to upload. |
How to change admin backends limits
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.ingressAdmin
- Change the value of the corresponding limit to a new value:
  ingressAdmin:
    singleFileUploadTimeout: 300
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
ingressAdmin:
<Not significant lines omitted>
singleFileUploadTimeout: 300
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied.
GPU sharing limits
This limits the number of pods which can share a single GPU.
Name | Unit | Default | Description |
---|---|---|---|
replicas | count | 3 | Number of pods sharing a single GPU |
How to change GPU sharing limits
-
Open text file /data/speech-platform/nvidia-device-plugin-configs.yaml either directly from inside virtual appliance or via file browser.
-
Locate key .data.default.sharing.timeSlicing.resources.replicas
-
Change the value of the replicas key to a new value:
replicas: 6
-
Updated file should look like:
apiVersion: v1
kind: ConfigMap
metadata:
name: nvidia-device-plugin-configs
namespace: nvidia-device-plugin
data:
default: |-
<Not significant lines omitted>
resources:
- name: nvidia.com/gpu
replicas: 6
-
Save the file
-
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
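Once the device plugin picks up the change, the node should advertise the new number of shareable GPU slots. A quick check (with replicas: 6 and a single physical GPU, the node should report nvidia.com/gpu: 6 in its allocatable resources):

kubectl describe node | grep nvidia.com/gpu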
How to disable GPU sharing
In some cases it might be handy to disable GPU sharing:
-
Open text file /data/speech-platform/nvidia-device-plugin-configs.yaml either directly from inside virtual appliance or via file browser.
-
Locate key .data.default.sharing.
-
Delete all content under the .data.default.sharing key.
-
Updated file should look like:
apiVersion: v1
kind: ConfigMap
metadata:
name: nvidia-device-plugin-configs
namespace: nvidia-device-plugin
data:
default: |-
version: v1
sharing:
Admin console
Admin console is a simple web page containing links to various admin-related tools. The console is located at http://<IP_of_virtual_appliance>/admin. It contains links to:
- filebrowser
- prometheus
- grafana
Grafana
Grafana is a tool for visualizing application and kubernetes metrics. A list of the most useful dashboards available in Grafana:
- Envoy Clusters - See envoy cluster statistics
- Kubernetes / Compute Resources / Pod - See resource consumption of individual pods
- NGINX Ingress controller - See ingress controller stats
- NVIDIA DCGM Exporter Dashboard - See GPU device stats
- Node Exporter / Nodes - See stats about virtual appliance
- Speech Platform API capacity - See metrics about speech platform itself
Troubleshooting
This section contains information about the individual components of the speech platform and the request flow.
Speech platform components
List of the components:
- frontend - simple webserver serving static html, css, javascript and image files
- docs - simple webserver serving documentation
- assets - simple webserver hosting examples
- api - python component providing REST API interface
- envoy - router and loadbalancer for GRPC messages
- media-conversion - python component used for:
  - converting audio files from various formats to simple wav format
  - splitting multi-channel audio into multiple single-channel files
- technology microservices:
  - enhanced-speech-to-text-built-on-whisper - transcribes speech to text
  - speech-to-text-phonexia - transcribes speech to text
  - voiceprint-extraction - extracts voiceprint from audio file
  - voiceprint-comparison - compares multiple voiceprints
  - language-identification - identifies language in audio
Request flow
- The user POSTs a request (for example, transcribe speech to text) to the API.
- The API creates a task for processing and returns the task id to the user.
- From this point, the user can poll the task to get the result.
- API calls media-conversion via envoy.
- Media conversion converts the audiofile to wav format and possibly splits it into multiple mono-channel files.
- API gets converted audiofile from media-conversion.
- API calls enhanced-speech-to-text-built-on-whisper via envoy.
- Enhanced-speech-to-text-built-on-whisper transcribes the audiofile.
- API gets the transcription.
- User can retrieve the task result.
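The same flow can be exercised from the command line. This is only an illustrative sketch; the endpoint paths below are placeholders, not the documented API routes (see the API documentation served by the appliance for the real ones):

# POST an audio file and note the task id in the response (placeholder path)
curl -X POST "http://<IP_of_virtual_appliance>/api/<technology-endpoint>" -F "file=@audio.wav"
# Poll the task until it is finished (placeholder path)
curl "http://<IP_of_virtual_appliance>/api/task/<task-id>"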
Check node status
Check node status with:
[root@speech-platform ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
speech-platform.localdomain Ready control-plane,master 9s v1.27.6+k3s1
If the node is not in Ready state, there is usually something wrong.
Note: The node list can be empty (No resources found) or the node can be in NotReady state while the virtual appliance is starting up. This is normal and should resolve itself in a few moments.
The node also has to have enough free disk and memory capacity. When this is not the case, pressure events are emitted. Run the following command to see the node conditions:
[root@speech-platform disks]# kubectl describe node | grep -A 6 Conditions:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 08:06:45 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletReady kubelet is posting ready status
Disk pressure
The disk pressure node event is emitted when kubernetes is running out of disk capacity in the /var filesystem. Node conditions then look like this:
[root@speech-platform disks]# kubectl describe node | grep -A 6 Conditions:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 08:06:45 +0000 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletReady kubelet is posting ready status
Follow the procedure for extending the disks.
Memory pressure
The memory pressure node event is emitted when kubernetes is running out of free memory. Node conditions then look like this:
[root@speech-platform disks]# kubectl describe node | grep -A 6 Conditions:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure True Mon, 29 Apr 2024 08:50:50 +0000 Mon, 29 Apr 2024 08:50:50 +0000 KubeletHasInsufficientMemory kubelet has insufficient memory available
DiskPressure False Mon, 29 Apr 2024 08:50:50 +0000 Mon, 29 Apr 2024 08:33:08 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 29 Apr 2024 08:50:50 +0000 Mon, 29 Apr 2024 08:33:08 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 29 Apr 2024 08:50:50 +0000 Mon, 29 Apr 2024 08:33:08 +0000 KubeletReady kubelet is posting ready status
You need to grant more memory to the virtual appliance or disable unneeded microservices.
View pod logs
Logs are stored in /data/log/pods/ or in /data/logs/containers. You can view them via filebrowser if needed.
Alternatively, you can display logs with the kubectl command:
[root@speech-platform ~]# kubectl -n speech-platform logs -f voiceprint-extraction-7867578b97-w7bzd
[2024-04-29 08:59:10.250] [Configuration] [info] model: /models/xl-5.0.0.model
[2024-04-29 08:59:10.250] [Configuration] [info] port: 8080
[2024-04-29 08:59:10.250] [Configuration] [info] device: cpu
[2024-04-29 08:59:10.250] [critical] base64_decode: invalid character ''<''
Changes in configuration are not applied
Changes in the main configuration file
/data/speech-platform/speech-platform-values.yaml
are automatically picked up
and applied by the helm controller. If configuration is not valid (or to be more
precise - if the configuration file is not valid YAML file), the helm controller
fails to apply the configuration. The helm controller creates a one-time job to
update the helm chart with the new configuration. If the configuration is
incorrect, the job will not complete successfully, and the underlying pod will
either restart or be in an error state. The pod status will reflect this issue:
[root@speech-platform disks]# kubectl get pods -n kube-system | grep -i helm-install
helm-install-filebrowser-2b7pn 0/1 Completed 0 51m
helm-install-ingress-nginx-m87d4 0/1 Completed 0 51m
helm-install-nginx-nrcvk 0/1 Completed 0 51m
helm-install-dcgm-exporter-fjqzz 0/1 Completed 0 51m
helm-install-kube-prometheus-stack-jn5bz 0/1 Completed 0 51m
helm-install-keda-vsn95 0/1 Completed 0 51m
helm-install-speech-platform-9l9vj 0/1 Error 4 (46s ago) 6m15s
View logs of failed helm-install pod:
[root@speech-platform disks]# kubectl logs -f helm-install-speech-platform-9l9vj -n kube-system
...
...
...
Upgrading speech-platform
+ helm_v3 upgrade --namespace speech-platform speech-platform https://10.43.0.1:443/static/phonexia-charts/speech-platform-0.0.0-36638f5-helm.tgz --values /config/values-10_HelmChartConfig.yaml
Error: failed to parse /config/values-10_HelmChartConfig.yaml: error converting YAML to JSON: yaml: line 494: could not find expected ':'
Check configuration file validity
This section describes how to check if your configuration is valid and how to identify which line in the configuration is incorrect.
Use the following command to check if the configuration file is valid:
yq .spec.valuesContent /data/speech-platform/speech-platform-values.yaml | yq .
If the configuration file is valid, the content of the file will be printed. Otherwise, the line number with an error will be printed out as follows:
[root@speech-platform ~]# yq .spec.valuesContent /data/speech-platform/speech-platform-values.yaml | yq .
Error: bad file '-': yaml: line 253: could not find expected ':'
Content of the file 10 lines before and 10 lines after line 253:
[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml | grep 253 -B 10 -A 10
243 # -- List of devices to use. GPU only.
244 # deviceIndices: [0,1]
245
246 # Uncomment this to force whisper to run on GPU
247 device: cuda
248
249 logLevel: debug
250
251 model:
252 volume:
253 hostPath:
254 path: /data/models/enhanced_speech_to_text_built_on_whisper
255
256 # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
257 file: "large_v2-1.0.1.model"
258 license:
259 value:
260 "eyJ2ZX...=="
261
262 # Uncomment this to grant access to GPU on whisper pod
263 resources:
There is nothing suspicious on line 253. In fact, the line number reported by yq might be slightly off because the configuration of the speech-platform helm chart itself is stored as the value of the spec.valuesContent key in the speech-platform-values.yaml file. Therefore, you need to add 7 (since spec.valuesContent is on the 7th line of the configuration file) to the reported error line number to get the correct line number (== 260):
[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml | grep 260 -B 10 -A 10
250
251 model:
252 volume:
253 hostPath:
254 path: /data/models/enhanced_speech_to_text_built_on_whisper
255
256 # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
257 file: "large_v2-1.0.1.model"
258 license:
259 value:
260 "eyJ2ZX...=="
261
262 # Uncomment this to grant access to GPU on whisper pod
263 resources:
264 limits:
265 nvidia.com/gpu: "1"
266
267 # Uncomment this to run whisper on GPU
268 runtimeClassName: "nvidia"
269
270 service:
There is only a license key on line 260. The error message could not find expected ':' is correct because there is no : on this line. One line above (259) there is a key named value which should contain the license. However, the license itself is on line 260, making this file invalid (i.e., it is not valid YAML). To fix it, simply merge lines 259 and 260. The resulting file should look like this:
[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml | grep 260 -B 10 -A 10
250
251 model:
252 volume:
253 hostPath:
254 path: /data/models/enhanced_speech_to_text_built_on_whisper
255
256 # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
257 file: "large_v2-1.0.1.model"
258 license:
259 value: "eyJ2ZX...=="
260
261 # Uncomment this to grant access to GPU on whisper pod
262 resources:
263 limits:
264 nvidia.com/gpu: "1"
265
266 # Uncomment this to run whisper on GPU
267 runtimeClassName: "nvidia"
268
269 service:
270 clusterIP: "None"
Disable DNS resolving for specific domains
First, check the coreDNS logs:
kubectl -n kube-system logs -l k8s-app=kube-dns
The following lines in the logs indicate this issue:
2024-06-05T11:00:49.55751974Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:60352->192.168.137.1:53: i/o timeout
2024-06-05T11:00:51.546562499Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:40254->192.168.137.1:53: i/o timeout
2024-06-05T11:00:51.548101103Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:47838->192.168.137.1:53: i/o timeout
2024-06-05T11:00:51.558720939Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:39526->192.168.137.1:53: i/o timeout
2024-06-05T11:00:53.547326187Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:58487->192.168.137.1:53: i/o timeout
2024-06-05T11:00:53.548836432Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:46303->192.168.137.1:53: i/o timeout
This happens when DHCP is used for IP address assignment for the virtual appliance, which usually configures a nameserver and search domains in /etc/resolv.conf:
nameserver 192.168.137.1
search localdomain
Communication within virtual appliance does not use FQDN, which means that each
DNS name is resolved with all domains. Internal kubernetes domains
(<namespace>.svc.cluster.local
, svc.cluster.local
and cluster.local
) are
resolved immediately with coreDNS; non-kubernetes domains are resolved with the nameserver provided by DHCP. If access to the nameserver is blocked (for example, by a firewall), then resolving a single name can take up to 10 seconds, which can significantly increase task processing duration.
To avoid this issue, you can either allow communication from virtual appliance to DHCP-configured DNS server or configure kubernetes resolver to skip lookup for DHCP-provided domain(s):
- [Virtual appliance] Create file /data/speech-platform/coredns-custom.yaml manually with the following content. Replace <domain1.com> and <domain2.com> with the domains you want to disable lookup for:
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns-custom
namespace: kube-system
data:
custom.server: |
<domain1.com>:53 {
log
}
<domain2.com>:53 {
log
}
- [Virtual appliance] The file looks like:
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns-custom
namespace: kube-system
data:
custom.server: |
localdomain:53 {
log
}
example.com:53 {
log
}
- [Virtual appliance] Restart coreDNS to apply the change:
kubectl -n kube-system rollout restart deploy/coredns
- [Virtual appliance] Check that coreDNS pod is running:
kubectl -n kube-system get pods -l k8s-app=kube-dns
Diagnostics report tool
The diagnostics script is part of the virtual appliance. It is designed to gather system and application information for troubleshooting.
The script collects the following information:
- CPU, RAM and disk usage.
- System logs, application logs and event history.
- Information about kubernetes objects.
Create the diagnostics report
- Connect to the virtual appliance:
$ ssh root@<virtual-appliance-ip>
- Run diagnostics script:
$ /root/run-diag-report.sh
- The script gathers all the information and stores it in a zip archive. The archive is stored in the /data/reports directory.
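To hand the report over to Phonexia support, copy the archive from the appliance to your workstation, for example with scp:

$ scp root@<virtual-appliance-ip>:/data/reports/<report-archive>.zip .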
Upgrade guide
This section describes the manual steps which need to be done prior to upgrading. There are various changes in the configuration which must be reflected before the upgrade. We suggest always using the configuration file bundled with the new version of the virtual appliance and updating it to suit your needs (insert licenses, enable/disable services, set replicas, ...). If you are not willing to do this, then you must modify your current configuration file to work with the new version of the virtual appliance.
The following subsections describe how to perform the upgrade of the virtual appliance.
Upgrade and retain data disk
This upgrade approach retains all the data and configuration stored on the data disk.
Pros:
- No need to configure virtual appliance from scratch
- Prometheus metrics are kept
Cons:
- You have to do version-specific upgrade steps
- Import new version of virtual appliance (version X+1) into your virtualization platform
- Stop current version of virtual appliance (version X)
- Detach data disk from current version of virtual appliance (version X)
- Attach data disk to new version of virtual appliance (version X+1)
- Start new version of virtual appliance (version X+1)
- Delete old version of virtual appliance (version X)
- Follow version-specific upgrade steps
Upgrade and discard data disk
This upgrade approach discards the current data disk and uses a new one.
Pros:
- Easier upgrade procedure
- No version-specific upgrade steps
- No accumulated disarray on the data disk
Cons:
- You have to configure virtual appliance from scratch:
- Disable unneeded services
- Insert license keys
- Insert models
- Import new version of virtual appliance (version X+1) into your virtualization platform
- Stop current version of virtual appliance (version X)
- Start new version of virtual appliance (version X+1)
- Delete old version of virtual appliance (version X)
- Configure virtual appliance from scratch
Upgrade to 3.4.0
This section describes the manual steps which need to be done prior to upgrading to 3.4.0.
Add configuration for Audio Quality Estimation
Audio Quality Estimation is being added in this release. Therefore it must be configured properly.
Add configuration for Voice Activity Detection
Voice Activity Detection is being added in this release. Therefore it must be configured properly.
Step by step upgrade guide to 3.4.0
This section describes how to upgrade virtual appliance from 3.3.0 to 3.4.0 with retaining data disk content.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
- Put the following content at the end of the file:
# Audio Quality Estimation sub-chart
audio-quality-estimation:
enabled: false
parallelism: 1
grpcAdapter:
image:
registry: airgapped.phonexia.com
config:
license:
useSecret: true
secret: audio-quality-estimation-license
key: grpc-adapter-license
image:
registry: airgapped.phonexia.com
# Set defaults for onDemand instances
onDemand:
trigger:
activationThreshold: "0.9"
query: |
'
service_running_tasks{
namespace="{{ $.Release.Namespace }}",
exported_service="time_analysis"
}
+
service_waiting_tasks{
namespace="{{ $.Release.Namespace }}",
exported_service="time_analysis"
}
'
config:
license:
useSecret: true
secret: audio-quality-estimation-license
key: license
instances:
- name: sqe
imageTag: 3.62.0
onDemand:
enabled: true
annotations:
secret.reloader.stakater.com/reload: "audio-quality-estimation-license"
service:
clusterIP: "None"
# Voice Activity Detection subchart config
voice-activity-detection:
enabled: true
replicaCount: 1
image:
repository: phonexia/dev/technologies/microservices/voice-activity-detection/main
registry: airgapped.phonexia.com
config:
# Set logging level
logLevel: debug
# Uncomment this to force voice-activity-detection to run on GPU
#device: cuda
model:
volume:
hostPath:
path: /data/models/voice_activity_detection
# Name of a model file inside the volume, for example "generic-3.0.0.model"
file: "generic-3.0.0.model"
license:
useSecret: true
secret: voice-activity-detection-license
key: "generic-3.0.0"
annotations:
secret.reloader.stakater.com/reload: "voice-activity-detection-license"
# Uncomment this to grant access to GPU for voice-activity-detection pod
#resources:
# limits:
# nvidia.com/gpu: "1"
# Uncomment this to run voice-activity-detection on GPU
#runtimeClassName: "nvidia"
service:
clusterIP: "None"
#updateStrategy:
# type: Recreate
- Update the configurator annotation .spec.valuesContent.configurator.annotations."secret.reloader.stakater.com/reload":
# Configurator component
configurator:
annotations:
secret.reloader.stakater.com/reload: >-
audio-quality-estimation-license,
audio-quality-estimation-license-extensions,
enhanced-speech-to-text-built-on-whisper-license,
enhanced-speech-to-text-built-on-whisper-license-extensions,
language-identification-license,
language-identification-license-extensions, speaker-diarization-license,
speaker-diarization-license-extensions,
speaker-identification-license, speaker-identification-license-extensions,
speech-to-text-phonexia-license,
speech-to-text-phonexia-license-extensions, time-analysis-license,
time-analysis-license-extensions, voice-activity-detection-license,
voice-activity-detection-license-extensions
Upgrade to 3.3.0
This section describes the manual steps which need to be done prior to upgrading to 3.3.0.
Upgrade Speech to Text Phonexia and Time Analysis
Both the Speech to Text Phonexia and Time Analysis microservices were updated to 3.62. You need to reflect this change in the values file.
Add configuration to Configurator service
Configurator service is being used in this release. Therefore it must be configured properly.
Deploy additional components
New technology Speaker Diarization was added. Configuration section must be added before using this technology.
Step by step upgrade guide to 3.3.0
This section describes how to upgrade virtual appliance from 3.2.0 to 3.3.0 with retaining data disk content.
-
Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
-
Put the following content at the end of the file:
# speaker-diarization subchart config
speaker-diarization:
enabled: true
replicaCount: 1
image:
repository: phonexia/dev/technologies/microservices/speaker-diarization/main
registry: airgapped.phonexia.com
# Extra environment variables
extraEnvVars: []
config:
# Uncomment this to force speaker-diarization to run on GPU
#device: cuda
model:
volume:
hostPath:
path: /data/models/speaker_diarization
# Name of a model file inside the volume, for example "xl-5.0.0.model"
file: "xl-5.0.0.model"
license:
useSecret: true
secret: speaker-diarization-license
key: "xl-5.0.0"
annotations:
secret.reloader.stakater.com/reload: "speaker-diarization-license"
# Uncomment this to grant access to GPU for speaker-diarization pod
#resources:
# limits:
# nvidia.com/gpu: "1"
# Uncomment this to run speaker-diarization on GPU
#runtimeClassName: "nvidia"
service:
clusterIP: "None"
#updateStrategy:
#type: Recreate
# Configurator component
configurator:
enabled: true
image:
registry: airgapped.phonexia.com
annotations:
secret.reloader.stakater.com/reload: >-
enhanced-speech-to-text-built-on-whisper-license,
enhanced-speech-to-text-built-on-whisper-license-extensions,
language-identification-license,
language-identification-license-extensions,
speaker-diarization-license, speaker-diarization-license-extensions,
speaker-identification-license,
speaker-identification-license-extensions,
speech-to-text-phonexia-license,
speech-to-text-phonexia-license-extensions, time-analysis-license,
time-analysis-license-extensions
-
Locate .spec.valuesContent.time-analysis.config.instances
-
Change the imageTag version to 3.62.0.
-
Section then looks like:
# Time-analysis subchart
time-analysis:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
- name: tae
imageTag: 3.62.0
onDemand:
enabled: true
-
Locate .spec.valuesContent.speech-to-text-phonexia.config.instances
-
Change the imageTag version to 3.62.0.
-
Section then looks like:
# Speech-to-text-phonexia subchart
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
- name: ar-kw
imageTag: 3.62.0
onDemand:
enabled: true
- name: ar-xl
imageTag: 3.62.0
onDemand:
enabled: true
- name: bn
imageTag: 3.62.0
onDemand:
enabled: true
.
.
. -
Save the file.
-
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
-
Check that the configuration is valid and successfully applied.
Upgrade to 3.2.0
This section describes the manual steps which need to be done prior to upgrading to 3.2.0.
Add grpcAdapter license configuration for Time-Analysis and Speech-to-Text-Phonexia
Speech Engine microservices now require an additional license. The license is deployed automatically from the model package, but the license configuration must be added.
Create configuration for GPU sharing
GPU sharing is enabled by default, but it does not work until the configuration is created.
Deploy additional components
New technology Language Identification was added. Configuration section must be added before using this technology.
Step by step upgrade guide to 3.2.0
This section describes how to upgrade virtual appliance from 3.1.0 to 3.2.0 with retaining data disk content.
-
Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
-
Put the following content at the end of the file:
# language-identification subchart config
language-identification:
enabled: true
replicaCount: 1
image:
repository: phonexia/dev/technologies/microservices/language-identification/main
registry: airgapped.phonexia.com
# Extra environment variables
extraEnvVars: []
config:
# Uncomment this to force language-identification to run on GPU
#device: cuda
model:
volume:
hostPath:
path: /data/models/language_identification
# Name of a model file inside the volume, for example "xl-5.1.0.model"
file: "xl-5.2.0.model"
license:
useSecret: true
secret: language-identification-license
key: "xl-5.2.0"
annotations:
secret.reloader.stakater.com/reload: "language-identification-license"
# Uncomment this to grant access to GPU for language-identification pod
#resources:
# limits:
# nvidia.com/gpu: "1"
# Uncomment this to run language-identification on GPU
#runtimeClassName: "nvidia"
service:
clusterIP: "None"
#updateStrategy:
#type: Recreate
-
Locate .spec.valuesContent.time-analysis.grpcAdapter
-
Append the config section:
config:
license:
useSecret: true
secret: time-analysis-license
key: grpc-adapter-license
-
Section then looks like:
# Time-analysis subchart
time-analysis:
<Not significant lines omitted>
grpcAdapter:
<Not significant lines omitted>
config:
license:
useSecret: true
secret: time-analysis-license
key: grpc-adapter-license
-
Locate .spec.valuesContent.speech-to-text-phonexia.grpcAdapter
-
Append the config section:
config:
license:
useSecret: true
secret: speech-to-text-phonexia-license
key: grpc-adapter-license
-
Section then looks like:
# Speech-to-text-phonexia subchart
speech-to-text-phonexia:
<Not significant lines omitted>
grpcAdapter:
<Not significant lines omitted>
config:
license:
useSecret: true
secret: speech-to-text-phonexia-license
key: grpc-adapter-license
-
Save the file.
-
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
-
Check that the configuration is valid and successfully applied.
-
Create new text file /data/speech-platform/nvidia-device-plugin-configs.yaml either directly from inside the virtual appliance or via a file browser with the following content:
apiVersion: v1
kind: ConfigMap
metadata:
name: nvidia-device-plugin-configs
namespace: nvidia-device-plugin
data:
default: |-
version: v1
sharing:
timeSlicing:
renameByDefault: false
failRequestsGreaterThanOne: false
resources:
- name: nvidia.com/gpu
replicas: 3
-
Save the file.
-
GPU sharing will be configured in a while.
Upgrade to 3.1.0
This section describes the manual steps which need to be done prior to upgrading to 3.1.0.
Change license secret field for Time-Analysis and Speech-to-Text-Phonexia
To unify the way secrets are loaded across all microservices, the way the license is loaded from a secret was changed in the Time-Analysis and Speech-to-Text-Phonexia microservices. This simplifies the user experience of loading licenses.
Upload licenses from secret
The way the licenses are uploaded to the virtual appliance has been simplified. From now on, the licenses are imported from the models and licenses bundle (.zip file) provided by Phonexia; after unzipping, the licenses and models are loaded automatically. This requires a configuration change; however, the old way still works.
Deploy additional components
The billing feature is mature enough to be part of the virtual appliance. To deploy billing-related components, add the following section to the configuration:
billing:
enabled: true
image:
registry: airgapped.phonexia.com
restApiGateway:
image:
registry: airgapped.phonexia.com
enabled: true
postgresql:
enabled: true
auth:
postgresPassword: postgresPassword
image:
registry: airgapped.phonexia.com
metrics:
enabled: true
image:
registry: airgapped.phonexia.com
serviceMonitor:
enabled: true
primary:
persistence:
storageClass: manual
selector:
matchLabels:
app.kubernetes.io/name: postgresql
Step by step upgrade guide to 3.1.0
This section describes how to upgrade virtual appliance from 3.0.0 to 3.1.0 with retaining data disk content.
IF YOU ARE ALREADY LOADING LICENSES THROUGH SECRETS:
- Rename the field loading the Speech-to-Text-Phonexia and Time-Analysis licenses
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
- Locate .spec.valuesContent.<speech-to-text-phonexia OR time-analysis>.config.license
- Change it from:
  license:
    existingSecret: <secret-name>
  To:
  license:
    useSecret: true
    secret: <secret-name>
    key: <secret-license-key>
IF YOU WANT TO LOAD LICENSES FROM SECRETS:
- Load licenses from secret files
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
- Locate .spec.valuesContent.<microservice>.config.license. <microservice> stands for all the services requiring a license (voiceprint-comparison, voiceprint-extraction, enhanced-speech-to-text-built-on-whisper, speech-to-text-phonexia, time-analysis).
- Change it from:
  license:
    value: "<license>"
  To:
  license:
    useSecret: true
    secret: "<microservice>-license"
    key: "<model_name>_<model_version>"
Example:
license:
useSecret: true
secret: "enhanced-speech-to-text-built-on-whisper"
key: "small-1.0.1"
-
Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
-
Locate .spec.valuesContent.envoy
-
Put the following content before the envoy section:
billing:
enabled: true
image:
registry: airgapped.phonexia.com
restApiGateway:
image:
registry: airgapped.phonexia.com
enabled: true
postgresql:
enabled: true
auth:
postgresPassword: postgresPassword
image:
registry: airgapped.phonexia.com
metrics:
enabled: true
image:
registry: airgapped.phonexia.com
serviceMonitor:
enabled: true
primary:
persistence:
storageClass: manual
selector:
matchLabels:
app.kubernetes.io/name: postgresql
-
Section then looks like this:
serviceMonitor:
enabled: true
additionalLabels:
release: kube-prometheus-stack
billing:
enabled: true
image:
registry: airgapped.phonexia.com
restApiGateway:
image:
registry: airgapped.phonexia.com
enabled: true
postgresql:
enabled: true
auth:
postgresPassword: postgresPassword
image:
registry: airgapped.phonexia.com
metrics:
enabled: true
image:
registry: airgapped.phonexia.com
serviceMonitor:
enabled: true
primary:
persistence:
storageClass: manual
selector:
matchLabels:
app.kubernetes.io/name: postgresql
envoy:
enabled: true
-
Save the file.
-
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
-
Check that the configuration is valid and successfully applied.
Upgrade to 3.0.0
This section describes the manual steps which need to be done prior to upgrading to 3.0.0.
Rename Whisper microservice
Due to licensing reasons we had to rename the speech-to-text-whisper-enhanced microservice. The new name is enhanced-speech-to-text-built-on-whisper. This change must be reflected in the values file.
Step by step upgrade guide to 3.0.0
This section describes how to upgrade virtual appliance from 2.1.0 to 3.0.0 with retaining data disk content.
- Rename whisper microservice in currently running version of virtual appliance.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
- Locate .spec.valuesContent.speech-to-text-whisper-enhanced.
- Replace all occurrences of speech-to-text-whisper-enhanced with enhanced-speech-to-text-built-on-whisper.
- Replace all occurrences of speech_to_text_whisper_enhanced with enhanced_speech_to_text_built_on_whisper.
- The updated file should look like this:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
enhanced-speech-to-text-built-on-whisper:
<Not significant lines omitted>
image:
repository: phonexia/dev/technologies/microservices/enhanced-speech-to-text-built-on-whisper/main
<Not significant lines omitted>
config:
<Not significant lines omitted>
model:
volume:
hostPath:
path: /data/models/enhanced_speech_to_text_built_on_whisper
- Save the file
- Rename the directory with Whisper models with the following command:
mv /data/models/speech_to_text_whisper_enhanced /data/models/enhanced_speech_to_text_built_on_whisper
- Import new version of virtual appliance (version X+1) into your virtualization platform
- Stop current version of virtual appliance (version X)
- Detach data disk from current version of virtual appliance (version X)
- Attach data disk to new version of virtual appliance (version X+1)
- Start new version of virtual appliance (version X+1)
- Delete old version of virtual appliance (version X)
Upgrade to 2.1.0
This section describes the manual steps which need to be done prior to upgrading to 2.1.0.
Load Speech to Text Phonexia and Time Analysis model from data disk instead of image
In the new version, the default way of loading models for Speech to Text Phonexia and Time Analysis changes. Previously, models were loaded from the image, which led to a lot of duplicity in images. From now on, loading models from the data disk is considered the default. However, the old way of loading models from the image still works.
Upgrading to load models from the data disk (/data/models) requires updating the speech platform values file:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate the .spec.valuesContent.speech-to-text-phonexia.config.instances or .spec.valuesContent.time-analysis.config.instances key.
- Define versions of images (imageTag) without a model (e.g. 3.62.0).
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
- name: ar-kw
imageTag: 3.62.0
onDemand:
enabled: true
- name: ar-kx
imageTag: 3.62.0
onDemand:
enabled: true
time-analysis:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
- name: tae
imageTag: 3.62.0
onDemand:
enabled: true
- Locate the .spec.valuesContent.speech-to-text-phonexia.image or .spec.valuesContent.time-analysis.image key and uncomment the image section.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
image:
registry: airgapped.phonexia.com
<Not significant lines omitted>
time-analysis:
<Not significant lines omitted>
image:
registry: airgapped.phonexia.com
<Not significant lines omitted>
- Save the file
Add ingressAdmin section
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate the key .spec.valuesContent.ingress.extraBackends
- Remove the extraBackends scope with all of its contents.
- Add a new ingressAdmin scope at the same indentation as the ingress scope. The resulting file should look like this:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
ingress:
<Not significant lines omitted>
ingressAdmin:
enabled: true
annotations: {}
singleFileUploadSize: "5368709120"
singleFileUploadTimeout: 120
<Not significant lines omitted>
- Save the file
- Proceed with upgrade
Fix permission for prometheus storage
This is a post-upgrade task. It must be run after the virtual appliance is upgraded to 2.1.0.
- Run following command in the virtual appliance to fix permissions of the prometheus storage:
$ chmod -R a+w /data/storage/prometheus/prometheus-db/
Upgrade to 2.0.0
This section describes the manual steps which need to be done prior to upgrading to 2.0.0.
Rename speech-engine subchart to speech-to-text-phonexia
Due to the renaming of the speech-engine subchart, you have to update the speech platform values file before upgrading:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.speech-engine.
- Rename speech-engine to speech-to-text-phonexia.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
- Save the file
Rename speech-to-text-phonexia instances
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- Remove the stt- prefix from the name of each instance.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
- name: ar-kw
imageTag: 3.62.0-stt-ar_kw_6
onDemand:
enabled: true
- name: ar-kx
imageTag: 3.62.0-stt-ar_xl_6
onDemand:
enabled: true
- Save the file
Add proper tag suffix for Media Conversion
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.media-conversion.image.
- Change the value of the tagSuffix key to -free.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
media-conversion:
<Not significant lines omitted>
image:
<Not significant lines omitted>
tagSuffix: "-free"
<Not significant lines omitted>
- Save the file
- Proceed with upgrade
Update path to models
Default model location was changed from /data/models to /data/models/<microservice>. If you plan to upgrade and keep the current data disk, no steps are needed - models are loaded from the old location, which is /data/models. If you plan to upgrade from scratch (discarding the current data disk), no steps are needed either - models are loaded from the new location, which is /data/models/<microservice>.
How to modify OVF to Hyper-V compatible VM
- Both of the existing virtual HDDs (.vmdk) need to be converted to Hyper-V compatible HDDs (.vhdx). You can do this with the StarWind V2V Converter.
- Create new VM in Hyper-V.
- IMPORTANT: Use Generation 1 VM - Generation 2 doesn't work.
- Enable networking/make sure it is enabled.
- OPTIONAL: Disable options like DVD drive or SCSI controller since they are not needed.
- Set Memory to at least 32GB and CPUs to at least 8 cores.
- Attach HDDs, preferably onto one IDE controller.
- Start the VM.
- After it starts, check the IP address printed on the login screen. Wait for the entire engine to start.
- Open the IP address from the previous step in a browser and verify that the VM works as it should.
Load balancing
The performance of a single instance of the virtual appliance is of course limited by the HW resources and by the number of concurrent tasks the API component can handle. To work around these limitations, we advise deploying multiple instances of the virtual appliance and putting a load balancer in front of them.
How the load balancer works
The load balancer (LB) must ensure that the requests for the same task are routed to the same instance of the virtual appliance. This is called a stateful session. It can be achieved with a session cookie or with a session header.
The request flow is then as follows (a minimal curl illustration appears after the list):
- The client POSTs a task to the LB.
- The LB picks a virtual appliance instance (depending on an LB algorithm) and sends the request there.
- The API in the virtual appliance accepts the task and sends a response back to the LB.
- The LB adds a session cookie or a session header to the response and sends it back to the client.
- The client extracts the task id and the session cookie or session header from the response.
- The client polls for the task. It sends a GET request with the session cookie or session header to the LB.
- The LB routes the request to the proper instance of the virtual appliance based on the session cookie or session header.
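As a minimal sketch of this flow with curl (the LB address, audio file path, and endpoint are examples; the session-header name matches the Envoy configuration below, and jq is assumed to be installed):
# Steps 1-5: POST a task through the LB, keeping the response body and headers
curl -s -X POST -F file=@/tmp/audio.wav \
     -D /tmp/headers.txt -o /tmp/task.json \
     "http://localhost:8080/api/technology/speech-to-text?language=en"
task_id=$(jq -r '.task.task_id' /tmp/task.json)
session=$(grep -i '^session-header:' /tmp/headers.txt | cut -d ':' -f 2 | tr -d ' \r')

# Steps 6-7: poll the task; the session header routes the request to the same instance
curl -s -H "session-header: ${session}" "http://localhost:8080/api/task/${task_id}"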
In the following example, we have used Envoy as the load balancer. Any other load balancer can be used if it supports stateful sessions.
Envoy configuration
This is the example Envoy configuration:
static_resources:
listeners:
- address:
socket_address:
# Load balancer address and port
# This is where Envoy accepts the incoming traffic
address: 0.0.0.0
port_value: 8080
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
access_log:
- name: envoy.access_loggers.stdout
typed_config:
"@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
log_format:
text_format_source:
inline_string: >
[%START_TIME%] "%REQ(:METHOD)%
%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%"
%RESPONSE_CODE% %RESPONSE_FLAGS%
%RESPONSE_CODE_DETAILS%
%UPSTREAM_REQUEST_ATTEMPT_COUNT% %BYTES_RECEIVED%
%BYTES_SENT% %DURATION%
%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%
"%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%"
"%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%"
"%UPSTREAM_HOST%" "%REQ(REQUEST-ID)%"
"%REQ(CORRELATION-ID)%" "%REQ(session-header)%"
"%RESP(session-header)%"
codec_type: AUTO
stat_prefix: ingress_http
route_config:
name: local_route
virtual_hosts:
- name: backend
domains:
- "*"
routes:
- match:
prefix: "/api/"
route:
cluster: speech-platform-virtual-appliance
retry_policy:
retry_on: "retriable-status-codes"
# Retry request on a different upstream when the 429 response is received
# This should happen when POSTing a request/task but max concurrent tasks limit is reached
# This ensures that task is accepted in the other (== less busy) instance of the virtual appliance
retriable_status_codes:
- 429
# How many times is the request retried
# Should be # of virtual appliance instances minus 1
num_retries: 1
http_filters:
- name: envoy.filters.http.stateful_session
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.stateful_session.v3.StatefulSession
strict: true
session_state:
name: envoy.http.stateful_session.header
typed_config:
"@type": type.googleapis.com/envoy.extensions.http.stateful_session.header.v3.HeaderBasedSessionState
# Name of the session header
# Contains base64 encoded upstream_address:port
# This tells Envoy to which upstream server it should send the request
name: session-header
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: speech-platform-virtual-appliance
connect_timeout: 0.5s
type: STATIC
dns_lookup_family: V4_ONLY
lb_policy: RANDOM
load_assignment:
cluster_name: speech-platform-virtual-appliance
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
# IP address of the first instance of the virtual appliance
address: 1.2.3.4
# Port of the first instance of the virtual appliance
port_value: 80
- endpoint:
address:
socket_address:
# IP address of the second instance of the virtual appliance
address: 1.2.3.5
# Port of the second instance of the virtual appliance
port_value: 80
health_checks:
- timeout: 2s
interval: 60s
interval_jitter: 1s
unhealthy_threshold: 3
healthy_threshold: 3
http_health_check:
# Healthcheck uri of the speech api inside virtual appliance
path: /api/system/status
# Admin interface for looking at things
admin:
address:
socket_address:
address: 0.0.0.0
port_value: 9090
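One way to try this configuration is to run Envoy in Docker (the image tag below is an example; use a current release):
# Save the configuration above as envoy.yaml, then start Envoy
# The listener port (8080) and admin port (9090) match the configuration above
docker run --rm \
  -p 8080:8080 -p 9090:9090 \
  -v "$(pwd)/envoy.yaml:/etc/envoy/envoy.yaml:ro" \
  envoyproxy/envoy:v1.30-latest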
Access API with LB
Here is an example script to show how to work with a header-based stateful session using curl:
#!/bin/bash
# URL of the virtual appliance or load balancer
platform_url=http://localhost:8080
# URI to POST the task to
uri="/api/technology/speech-to-text?language=en"
# Path to audio file for processing
voice_file=/tmp/audio.wav
# Process this many tasks in parallel
parallel=100
# End when this many tasks are processed
total_tasks=400
# Post tasks to the API so that we still have $parallel tasks running
post_tasks() {
    local count=$1
    for i in $(seq 1 $count); do
        # Temporary files for this task's response body and headers
        # (created inside the loop so the names track ${task_counter})
        local tmpfile_task=/tmp/task.${task_counter}.json
        local tmpfile_headers=/tmp/headers.${task_counter}.txt
        # POST single task
        echo "[${task_counter}] Task $i of ${count}"
curl \
-L -s -X POST \
-H 'Content-Type: multipart/form-data' \
-H 'Accept: application/json' \
-F file=@"${voice_file}" \
--output ${tmpfile_task} \
--dump-header ${tmpfile_headers} \
"${platform_url}${uri}"
rv=$?
        # Parse the session header from the response headers (strip the leading space and trailing CR)
        session_header=$(grep -i '^session-header:' ${tmpfile_headers} | cut -d ':' -f 2 | tr -d ' \r')
echo "[${task_counter}] Curl response code is: ${rv}"
echo "[${task_counter}] Session header is: ${session_header}"
echo "[${task_counter}] $(cat ${tmpfile_task})"
task_id=$(jq -r '.task.task_id' ${tmpfile_task})
# Store task id
current_tasks+=($task_id)
# Store session header for each task id
taskToHeader["${task_id}"]="${session_header}"
task_counter=$((${task_counter} +1))
done
}
# Poll for all running tasks
poll_tasks() {
local counter_done=0
local counter_rejected=0
local counter_running=0
local counter_pending=0
local counter_unknown=0
# Poll status of each task
for task_id in ${current_tasks[@]}; do
local tmpfile_task=/tmp/task-id-${task_id}
# Add session header
curl -s --header "session-header:${taskToHeader[${task_id}]}" -L -o ${tmpfile_task} "${platform_url}/api/task/${task_id}"
rv=$?
echo "[${task_id}] Curl response code is: ${rv}"
task_status=$(jq -r '.state' ${tmpfile_task})
echo "[${task_id}] Task is still ${task_status}..."
# Evaluate task status
case $task_status in
pending)
counter_pending=$((${counter_pending} +1))
;;
running)
counter_running=$((${counter_running} +1))
;;
rejected)
counter_rejected=$((${counter_rejected} +1))
;;
done)
counter_done=$((${counter_done} +1))
counter_total_done=$((${counter_total_done} +1))
finished_tasks+=(${task_id})
;;
*)
counter_unknown=$((${counter_unknown} +1))
;;
esac
done
echo "Summary: Done: ${counter_done}, Rejected: ${counter_rejected}, Running: ${counter_running}, Pending: ${counter_pending}, Unknown: ${counter_unknown}"
}
rm -f /tmp/task-id-*
if [ ! -f ${voice_file} ]; then
echo "Voicefile does not exists!"
exit 1
fi
current_tasks=()
declare -A taskToHeader
task_counter=1
start_time=$(date '+%s')
counter_total_done=0
# Control loop
while true; do
finished_tasks=()
poll_tasks
# Remove finished tasks
for del in ${finished_tasks[@]}
do
current_tasks=(${current_tasks[@]/$del})
done
echo "Task counter: ${task_counter}, Finished tasks: ${counter_total_done}"
if [ ${counter_total_done} -ge ${total_tasks} ]; then
echo "Reached ${total_tasks} finished tasks."
echo "Start time: ${start_time}"
end_time=$(date '+%s')
echo "End time: ${end_time}"
echo "Duration: $(( ${end_time} - ${start_time} ))"
echo "Voicefile: ${voice_file}"
echo "task parallelism: ${parallel}"
break
fi
# POST tasks to have $parallel tasks running all the time
if [ ${#current_tasks[@]} -le $parallel ]; then
post_tasks $(($parallel - ${#current_tasks[@]}))
fi
sleep 2
done
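The script depends only on curl and jq. Assuming you saved it as lb-test.sh (the file name is an example), you can run it like this:
# Verify the dependencies, then run the load test against the LB
for cmd in curl jq; do
    command -v "$cmd" >/dev/null || { echo "$cmd is required"; exit 1; }
done
bash lb-test.sh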
Uninstallation
This section describes the uninstallation process and the steps you might need to take before proceeding with the uninstallation itself.
Export results from UI
Audio files and technology results are stored in the web browser used to work with the virtual appliance. You might want to export the results from the virtual appliance UI before uninstalling. Go through each technology you used; you should find an export button there. After selecting an export format, your results will be downloaded to your machine.
This procedure should be run on all browsers/computers used for interaction with the Virtual Appliance UI.
Delete data from UI
To delete the data from the UI, go to the Settings page. There, you should see a red button (Clear your data and settings) that opens a dialog window summarizing the uploaded data. All uploaded data and selected user preferences, such as language settings, will be deleted after submission. The page will reload, and you should see a success notification in the bottom corner of the page. After that, close the browser tab or window.
This procedure must be run on all browsers/computers used for interaction with the Virtual Appliance UI.
Uninstallation guide
Simply delete the virtual machine from the hypervisor. Then delete both the system and data disks if they were not deleted automatically.