Version: 3.2.0

Speech Platform Virtual Appliance

The Speech Platform Virtual Appliance is a distribution of the Phonexia Speech Platform in the form of a virtual image. Presently, it exclusively supports the OVF format.

Installation

This section describes how to install the virtual appliance into your virtualization platform.

Prerequisites

Currently we support only VirtualBox and VMware.

It will probably work on other virtualization platforms but we haven't tested it yet.

Minimal HW requirements

50GB of disk space
4 CPU cores
16GB of memory

The minimal requirements allow you to process a single technology (speaker identification, enhanced speech-to-text built on Whisper, or speech-to-text by Phonexia) for evaluation purposes. We recommend disabling all technologies you are not evaluating to save resources.

Resource usage per technology

Speaker identification - 1 CPU core and 2GB memory
Speech-to-text by Phonexia - 1 CPU core and 4GB memory per language
Enhanced speech-to-text built on Whisper - 8 CPU cores and 8GB memory, or 1 CPU core, 8GB memory and a GPU card

Note: Running enhanced speech-to-text built on Whisper on CPU is slow. We recommend using at least 8 CPU cores to run our built-in examples in reasonable time.

GPU

A GPU is not required for the virtual appliance to work, but without one the enhanced speech-to-text built on Whisper functionality runs significantly slower.

If you decide to use a GPU, make sure that:

The server HW (especially the BIOS) supports IOMMU.
The host OS can pass the GPU device to the virtualization platform (that is, the host OS can be configured NOT to use the GPU device).
The virtualization platform can pass the GPU device to the guest OS.

Installation guide

Download virtual appliance
Import virtual appliance to your virtualization platform (For Hyper-V deployment, please refer to section 'How to modify OVF to Hyper-V compatible VM')
Run virtual appliance

Post-installation steps

The virtual appliance is configured to obtain its IP address from a DHCP server. If you do not use a DHCP server for IP allocation, or prefer a static IP address, you have to reconfigure the OS.
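
The following is a minimal sketch of one way to set a static address. It assumes the appliance's Rocky Linux is managed by NetworkManager (the Rocky Linux 9 default, not confirmed for this image). The function name `set_static_ip`, the connection name and all addresses are hypothetical placeholders. The real connection name can be listed with `nmcli -t -f NAME connection show --active`.

```shell
# Sketch: switch a NetworkManager connection from DHCP to a static IPv4 address.
# Assumption: the appliance network is managed by NetworkManager (nmcli available).
# Arguments: <connection-name> <address/prefix> <gateway> <dns-server>
set_static_ip() {
  local con="$1" addr="$2" gw="$3" dns="$4"
  # Store the address, gateway and DNS server, then switch the method to manual.
  nmcli connection modify "$con" \
    ipv4.addresses "$addr" \
    ipv4.gateway "$gw" \
    ipv4.dns "$dns" \
    ipv4.method manual || return 1
  # Re-activate the connection so the new settings take effect.
  nmcli connection up "$con"
}

# Example with placeholder values (run as root on the appliance):
#   set_static_ip "Wired connection 1" 192.0.2.10/24 192.0.2.1 192.0.2.53
```

Run this from the virtualization platform console rather than over SSH if possible, because the SSH session drops when the address changes.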

SSH server

An SSH server is deployed and enabled in the virtual appliance. Use the following credentials:

login: root
password: InVoiceWeTrust

We recommend changing the root password and disabling password authentication via SSH for the root user in favor of key-based authentication.
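
One possible way to apply this recommendation is sketched below. It assumes the appliance's sshd_config includes drop-in files from /etc/ssh/sshd_config.d (the Rocky Linux 9 default, not verified for this image). The function name and the drop-in file name `00-root-key-only.conf` are hypothetical. The low number is chosen because sshd uses the first value it reads for each option.

```shell
# Sketch: write an sshd drop-in that permits root login with keys only.
# Argument: the drop-in directory (assumed to be /etc/ssh/sshd_config.d on the appliance).
write_root_key_only_dropin() {
  local dir="$1"
  # "prohibit-password" disables password logins for root but keeps key-based logins.
  printf 'PermitRootLogin prohibit-password\n' > "$dir/00-root-key-only.conf"
}

# Suggested order on a real system:
#   1. From your workstation: ssh-copy-id root@<virtual-appliance-ip>
#   2. On the appliance:      passwd            # change the default root password
#   3. On the appliance:      write_root_key_only_dropin /etc/ssh/sshd_config.d
#   4. On the appliance:      sshd -t && systemctl restart sshd
```

Afterwards, `sshd -T | grep -i permitrootlogin` shows the effective setting. Keep an existing session open until a key-based login has been confirmed in a second terminal.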

Open ports

List of open ports:

SSH (22) - for convenient access to OS
HTTP (80) - Speech platform is accessible via HTTP protocol
HTTPS (443) - Speech platform is also accessible via HTTPS protocol
HTTPS (6443) - Kubernetes API
HTTPS (10250) - Metrics server

K3s check

K3s (a Kubernetes distribution) is started automatically by systemd when the virtual appliance starts. You can verify whether k3s is running with this command:

systemctl status k3s

Kubernetes check

After the k3s service starts, it takes some time (usually around 2 minutes) until the application (that is, the Kubernetes pods) is started. To check whether the application is up and running, execute the following command:

kubectl -n speech-platform get pods

When all pods are running, output looks like:

[root@speech-platform ~]# kubectl -n speech-platform get pods
NAME                                                                       READY   STATUS             RESTARTS      AGE
speech-platform-docs-57dcd49f9f-q97w4                                      1/1     Running            0             2m10s
speech-platform-envoy-759c9b49d9-99vp7                                     1/1     Running            0             2m10s
speech-platform-frontend-7f4566dbc6-jhprh                                  1/1     Running            0             2m10s
speech-platform-assets-5697b4c86-8sh9k                                     1/1     Running            0             2m9s
media-conversion-7d8f884f9-zh75g                                           1/1     Running            0             2m9s
speech-platform-api-69bc7d4d5b-6kv7x                                       1/1     Running            0             2m9s
enhanced-speech-to-text-built-on-whisper-74548494c866mrz                   0/1     CrashLoopBackOff   4 (29s ago)   2m10s
voiceprint-extraction-68d646d449-9br8m                                     0/1     CrashLoopBackOff   4 (33s ago)   2m10s
voiceprint-comparison-76948b4947-xjw92                                     0/1     CrashLoopBackOff   4 (15s ago)   2m10s

Voiceprint-extraction, voiceprint-comparison, language-identification and enhanced-speech-to-text-built-on-whisper microservices (pods) are failing initially. This is expected and it is caused by a missing license. You can either add a license to the microservices or disable them if you don't plan to use them.
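
To spot the failing pods quickly, the listing can be filtered. The helper below is a small sketch (the name `not_ready_pods` is made up for illustration). It reads the output of `kubectl -n speech-platform get pods` from standard input and prints every pod whose STATUS column is neither Running nor Completed.

```shell
# Sketch: print "<pod>: <status>" for pods that are not Running or Completed.
# Expects the default `kubectl -n <namespace> get pods` table (with header) on stdin,
# where STATUS is the third column.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 ": " $3 }'
}

# On the appliance:
#   kubectl -n speech-platform get pods | not_ready_pods
```

An empty result means every pod in the namespace is running. Before licenses are added, the license-dependent microservices are expected to appear here.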

Optionally, you can check if all other system and auxiliary applications are running:

kubectl get pods -A

All pods should be running or completed, except for the microservices that are still missing a license, like this:

[root@speech-platform ~]# kubectl get pods -A
NAMESPACE         NAME                                                                       READY   STATUS             RESTARTS        AGE
kube-system       local-path-provisioner-8d98546c4-9pq8p                                     1/1     Running            0               6m44s
kube-system       coredns-94bcd45cb-rp6zx                                                    1/1     Running            0               6m44s
kube-system       metrics-server-754ff994c9-pczpx                                            1/1     Running            0               6m44s
kube-system       svclb-ingress-nginx-controller-baed713a-nzwcc                              2/2     Running            0               5m24s
kube-system       helm-install-ingress-nginx-wpwk4                                           0/1     Completed          0               6m45s
kube-system       helm-install-filebrowser-fd569                                             0/1     Completed          0               6m45s
kube-system       helm-install-nginx-28rll                                                   0/1     Completed          0               6m45s
kube-system       helm-install-speech-platform-7k6qf                                         0/1     Completed          0               6m45s
ingress-nginx     ingress-nginx-controller-679f97c77d-rdssr                                  1/1     Running            0               5m24s
nginx             nginx-6ddd78f789-f9lq2                                                     1/1     Running            0               5m39s
filebrowser       filebrowser-7476f7c65c-rk9d5                                               1/1     Running            0               5m39s
gpu               nfd-58s4x                                                                  2/2     Running            0               5m44s
speech-platform   speech-platform-docs-57dcd49f9f-q97w4                                      1/1     Running            0               5m38s
speech-platform   speech-platform-envoy-759c9b49d9-99vp7                                     1/1     Running            0               5m38s
speech-platform   speech-platform-frontend-7f4566dbc6-jhprh                                  1/1     Running            0               5m38s
speech-platform   speech-platform-assets-5697b4c86-8sh9k                                     1/1     Running            0               5m37s
speech-platform   media-conversion-7d8f884f9-zh75g                                           1/1     Running            0               5m37s
speech-platform   speech-platform-api-69bc7d4d5b-6kv7x                                       1/1     Running            0               5m37s
speech-platform   voiceprint-extraction-68d646d449-9br8m                                     0/1     CrashLoopBackOff   5 (2m33s ago)   5m38s
speech-platform   enhanced-speech-to-text-built-on-whisper-74548494c866mrz                   0/1     CrashLoopBackOff   5 (2m32s ago)   5m38s
speech-platform   voiceprint-comparison-76948b4947-xjw92                                     0/1     CrashLoopBackOff   5 (2m20s ago)   5m38s

Application check

From your local computer, open the virtual appliance welcome page at the IP address or hostname of the virtual appliance. If the welcome page loads, the applications should be working.

Components

This is the list of components the virtual appliance is composed of.

Operating system

There is Rocky Linux 9.3 under the hood.

GPU support

The virtual appliance has all necessary prerequisites pre-baked to run GPU-powered workloads (especially enhanced-speech-to-text-built-on-whisper). This means that the NVIDIA drivers and container toolkit are already installed. GPU time-based sharing is also enabled by default, which means you can run multiple technologies on a single GPU simultaneously.

Kubernetes

The k3s Kubernetes distribution is deployed inside.

Ingress controller

We use the ingress-nginx ingress controller. This component serves as a reverse proxy and load balancer.

Speech platform

This is the application for solving various voice-related problems like speaker identification, speech-to-text transcription and many more. Speech platform is accessible via web browser or API.

File Browser

File Browser is a web-based file browser/editor used to work with data on the data disk.

Prometheus

Prometheus is a tool that provides monitoring information about Kubernetes components.

Grafana

Grafana is a tool for visualizing Prometheus metrics.

Disks

The virtual appliance comes with a system disk and a data disk.

System disk

The operating system is installed on the system disk. You should not modify the system disk unless you know what you are doing.

List of components stored on the system disk:

NVIDIA drivers
Container images for microservices
Packaged helm charts

Data disk

The data disk is used as persistent storage. Unlike the system disk, the data disk is intended to contain files which can be viewed and modified by the user. The data disk is created with the PHXDATADISK label, and the system is instructed to mount the filesystem with this label to the /data directory.
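
The label-based mount can be checked with standard util-linux tools. The helper below is a sketch (the name `data_disk_mounted` is made up for illustration). It compares the device that carries the PHXDATADISK label with the device currently mounted on /data.

```shell
# Sketch: succeed only if the filesystem labelled PHXDATADISK is mounted on /data.
# Uses blkid and findmnt from util-linux; run as root on the appliance.
data_disk_mounted() {
  local dev src
  dev="$(blkid -L PHXDATADISK)" || return 1        # device carrying the label
  src="$(findmnt -n -o SOURCE /data)" || return 1  # device mounted on /data
  [ "$dev" = "$src" ]
}

# On the appliance:
#   data_disk_mounted && echo "data disk OK" || echo "data disk not mounted on /data"
```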

List of components stored on data disk:

Logs (/data/logs) of the system, k3s and individual containers
Configuration for ingress controller (/data/ingress-nginx/ingress-nginx-values.yaml)
Configuration for speech platform (/data/speech-platform/speech-platform-values.yaml)
Models for individual microservices (/data/models/)
Custom images (/data/images/)
Prometheus persistent storage (/data/storage/prometheus)

Configuration

The following section describes various configuration use cases.

VirtualBox configuration

If you use VirtualBox to run the Virtual Appliance on a Linux distribution, you can use our installation script to import and configure the Virtual Appliance. To obtain the script, contact Phonexia support. Before using it, you must have downloaded the bundle with the Virtual Appliance files and the bundle with models and licenses. To run the script, follow these steps:

Open a terminal and change to the directory that contains the script.
Make the script executable:
```
$ chmod +x SpeechPlatformInstaller.sh
```

Run the script using the following command:

$ ./SpeechPlatformInstaller.sh -m /path/to/models_bundle -v /path/to/VA_bundle -n virtual_machine_name

Wait until the script finishes. When it does, it displays a link to the Speech Platform application.

Upload microservices models and licenses

The virtual appliance is distributed without licenses and only with default models. To get other models and licenses, contact Phonexia support. They will provide a bundle (.zip file) with models and licenses. The bundle then needs to be uploaded and unzipped inside the virtual appliance. To upload the bundle, follow these steps:

Upload the provided licensed-models.zip archive to the virtual appliance via File Browser or via scp:
```
$ scp -P <virtual-appliance-port> licensed-models.zip root@<virtual-appliance-ip>:/data/
```

Connect to the virtual appliance and change to the /data folder:

$ ssh root@<virtual-appliance-ip> -p <virtual-appliance-port>
$ cd /data

Unzip the archive. Models are extracted to a directory per language:
```
$ unzip licensed-models.zip
```
Check that the configuration is valid and successfully applied.

The bundle content has a specific structure that ensures all models and licenses are placed in the correct locations after unzipping.

Changing microservice models

If you use models other than the default ones, you need to change the following values in the /data/speech-platform/speech-platform-values.yaml file:

<microservice>.config.model.file value leading to model, and
<microservice>.config.license.key value leading to the license for used model.

Example (change model large_v2-1.0.1 to small-1.0.1 for enhanced-speech-to-text-built-on-whisper microservice):

enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      volume:
        hostPath:
          path: /data/models/enhanced_speech_to_text_built_on_whisper
      file: "large_v2-1.0.1.model"
    license:
      useSecret: true
      secret: enhanced-speech-to-text-built-on-whisper-license
      key: "large_v2-1.0.1"

needs to be changed to:

enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      volume:
        hostPath:
          path: /data/models/enhanced_speech_to_text_built_on_whisper
      file: "small-1.0.1.model"
    license:
      useSecret: true
      secret: enhanced-speech-to-text-built-on-whisper-license
      key: "small-1.0.1"

These changes are required for all microservices with licensed models except speech-to-text-phonexia and time-analysis.

Inspect microservices models

Models are stored inside the /data/models folder, where path to each model is constructed as:

/data/models/<technology_name>/<model_name>-<model_version>.model

Where:

technology_name - is name of the technology, e.g. speaker_identification
model_name - is name of the model, e.g. xl
model_version - is version of a model, e.g. 5.0.0
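
The naming scheme can be expressed as a small helper, for example to check that a file referenced in the configuration exists. This is an illustrative sketch and the function names are hypothetical. The license file pattern (`-license.txt` next to the model) is inferred from the directory listing in this section rather than stated as a rule.

```shell
# Sketch: build the model path from technology name, model name and model version.
model_path() {
  printf '/data/models/%s/%s-%s.model\n' "$1" "$2" "$3"
}

# Sketch: build the matching license path (pattern inferred from the listing in this section).
license_path() {
  printf '/data/models/%s/%s-%s-license.txt\n' "$1" "$2" "$3"
}

# Example on the appliance:
#   ls -l "$(model_path speaker_identification xl 5.0.0)"
```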

After uploading, imported models can be inspected with the following command:

Content of the /data/models:

$ find /data/models
/data/models/
/data/models/speaker_identification
/data/models/speaker_identification/xl-5.0.0.model
/data/models/speaker_identification/xl-5.0.0-license.txt
/data/models/enhanced_speech_to_text_built_on_whisper
/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1.model
/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt
/data/models/speech_to_text_phonexia
/data/models/speech_to_text_phonexia/en_us_6-3.61.0-license.txt
/data/models/speech_to_text_phonexia/en_us_6-3.61.0.model
/data/models/time_analysis
/data/models/time_analysis/generic-3.61.0-license.txt
/data/models/time_analysis/generic-3.61.0.model

Inspect microservices licenses

Licenses are stored in the /data/speech-platform/speech-platform-licenses.yaml file. The file contains Kubernetes secret definitions of the licenses, which makes it simple to load the licenses into the application.

After uploading, imported licenses can be inspected with the following command:

Content of the /data/speech-platform folder:

$ find /data/speech-platform/
/data/speech-platform/
/data/speech-platform/speech-platform-licenses.yaml
/data/speech-platform/speech-platform-values.yaml

Kubernetes secret definitions in the file are separated by ---. Each secret stores the content of the license file for its technology under the .stringData.license key.

For example, for the model of the speaker_identification technology with name xl and version 5.0.0, the secret looks like this:

---
apiVersion: v1
kind: Secret
metadata:
  name: speaker-identification-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
type: Opaque

The content of the license file (/data/speech-platform/speech-platform-licenses.yaml) can be shown with the following command:

Content of the license file:

$ cat /data/speech-platform/speech-platform-licenses.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: speaker-identification-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
type: Opaque
---
apiVersion: v1
kind: Secret
metadata:
  name: enhanced-speech-to-text-built-on-whisper-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt" file>
type: Opaque
.
.
.

Set DNS name for speech platform virtual appliance

The speech platform is accessible at http://<IP_address_of_virtual_appliance>. We recommend creating a DNS record to make access more convenient for users. Consult your DNS provider for information on how to add the corresponding DNS record.

Use HTTPS certificate

The speech platform is also accessible via HTTPS at https://<IP_address_of_virtual_appliance>. For secure communication, you might need to use your own TLS certificate.

To do so, follow this guide:

Prepare the TLS certificate beforehand.
Put the certificate private key into a file named cert.key.
Put the certificate into a file named cert.crt.

Create a Kubernetes secret manifest storing the certificate and private key:

kubectl create -n ingress-nginx secret tls default-ssl-certificate --key cert.key --cert cert.crt -o yaml --dry-run=client > /tmp/certificate-secret.yaml

Copy the manifest (the resulting file) to /data/ingress-nginx/certificate-server.yaml.
Open the text file /data/ingress-nginx/ingress-nginx-values.yaml either directly from inside the virtual appliance or via File Browser.
Locate the key .spec.valuesContent.controller.extraArgs.default-ssl-certificate.
Uncomment the line.

Updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
    <Not significant lines omitted>
      extraArgs:
      <Not significant lines omitted>
        default-ssl-certificate: "ingress-nginx/default-ssl-certificate"

Save the file.
The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
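
Before creating the secret, it can be worth confirming that cert.crt and cert.key belong together, since a mismatched pair is a common reason for the ingress controller to keep serving its default certificate. The helper below is a sketch using standard OpenSSL commands. The function name `cert_matches_key` is hypothetical, and the check is a suggestion rather than part of the documented procedure.

```shell
# Sketch: succeed only if the certificate and the private key share the same public key.
# Arguments: <certificate file> <private key file>
cert_matches_key() {
  local cert="$1" key="$2" cert_pub key_pub
  cert_pub="$(openssl x509 -in "$cert" -noout -pubkey)" || return 1  # public key from the certificate
  key_pub="$(openssl pkey -in "$key" -pubout)" || return 1           # public key derived from the private key
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Example:
#   cert_matches_key cert.crt cert.key && echo "certificate and key match"
```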

Extend disks

The following section describes how to extend disks.

Identify system and data disks

To list disks, use the lsblk command, which lists block devices:

# lsblk -pd
NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
/dev/nvme0n1 259:0    0    60G  0 disk
/dev/nvme1n1 259:1    0    20G  0 disk

Two disks are listed. To determine which one is the system disk, find the disk that contains a partition of type Linux root. The other disks are regular disks used for storing data (data disks). The fdisk command can be used for this.

System disk:

# fdisk -l /dev/nvme0n1
...
Non-essential disk information
...
Device           Start       End   Sectors  Size Type
/dev/nvme0n1p1    2048      6143      4096    2M BIOS boot
/dev/nvme0n1p2    6144    210943    204800  100M EFI System
/dev/nvme0n1p3  210944   2258943   2048000 1000M Linux extended boot
/dev/nvme0n1p4 2258944 125829086 123570143 58.9G Linux root (x86-64)

Data disk:

# fdisk -l /dev/nvme1n1
...
Non-essential disk information
...
Device         Start      End  Sectors Size Type
/dev/nvme1n1p1  2048 41943006 41940959  20G Linux filesystem

It can be seen that the disk /dev/nvme0n1 is the system disk as it contains the /dev/nvme0n1p4 partition of the Linux root type. The other disks are used as data storage (data disk).

Extend disk prerequisites

Before extending a disk, the following information needs to be gathered:

Disk, disk partition, and filesystem mount point (root - /, data - /data)
- Example:
  - disks: /dev/nvme0n1, /dev/nvme1n1
  - partitions: /dev/nvme0n1p4, /dev/nvme1n1p1
  - mount points: /, /data

# lsblk -p
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
/dev/nvme0n1                 259:0    0    60G  0 disk
├─/dev/nvme0n1p1             259:2    0     2M  0 part
├─/dev/nvme0n1p2             259:3    0   100M  0 part /boot/efi
├─/dev/nvme0n1p3             259:4    0  1000M  0 part /boot
└─/dev/nvme0n1p4             259:5    0  58.9G  0 part
  └─/dev/mapper/rocky-lvroot 253:0    0  58.9G  0 lvm  /var/lib/kubelet/pods/427d658b-7635-42ff-b52f-c93625133b48/volume-subpaths/empty-dir/metrics/0
                                                       /var/lib/kubelet/pods/427d658b-7635-42ff-b52f-c93625133b48/volume-subpaths/empty-dir/postgresql/2
                                                       /var/lib/kubelet/pods/427d658b-7635-42ff-b52f-c93625133b48/volume-subpaths/empty-dir/postgresql/1
                                                       /var/lib/kubelet/pods/427d658b-7635-42ff-b52f-c93625133b48/volume-subpaths/empty-dir/postgresql/0
                                                       /var/lib/kubelet/pods/f4f542a1-f876-4e64-b1e2-745610742002/volume-subpaths/capabilities/frontend/0
                                                       /var/lib/kubelet/pods/bcda3218-ed84-4bf9-a59d-1cea0798f65b/volume-subpaths/sc-dashboard-provider/grafana/4
                                                       /var/lib/kubelet/pods/bcda3218-ed84-4bf9-a59d-1cea0798f65b/volume-subpaths/config/grafana/2
                                                       /var/lib/kubelet/pods/bcda3218-ed84-4bf9-a59d-1cea0798f65b/volume-subpaths/config/grafana/0
                                                       /var/lib/kubelet/pods/231c54e8-63c0-4e8c-949e-58900d8e0f1e/volume-subpaths/config/filebrowser/0
                                                       /
/dev/nvme1n1                 259:1    0    20G  0 disk
└─/dev/nvme1n1p1             259:7    0    20G  0 part /var/lib/kubelet/pods/9a54614c-f3e1-44b1-b190-5a5316c2bfc4/volume-subpaths/prometheus/prometheus/2
                                                       /var/log
                                                       /var/lib/rancher/k3s/server/manifests/speech-platform/values
                                                       /var/lib/rancher/k3s/server/manifests/ingress-nginx/values
                                                       /var/lib/rancher/k3s/agent/images/extra
                                                       /data

Filesystem and filesystem type. This information can be gathered with df command.
- Example:
  - filesystems: /dev/mapper/rocky-lvroot, /dev/nvme1n1p1
  - filesystem types: xfs

# df -T / /data
Filesystem               Type 1K-blocks     Used Available Use% Mounted on
/dev/mapper/rocky-lvroot xfs   61714432 33111312  28603120  54% /
/dev/nvme1n1p1           xfs   20904940  3362328  17542612  17% /data

Check if the disk has the needed size to extend the partition
- The size of disk /dev/nvme0n1 is 70G while all its partitions together have size of 60G
- There is 10G left for extension of Linux root (mount point /)

# lsblk -p /dev/nvme0n1
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
/dev/nvme0n1                 259:0    0   70G  0 disk
├─/dev/nvme0n1p1             259:2    0    2M  0 part
├─/dev/nvme0n1p2             259:3    0  100M  0 part /boot/efi
├─/dev/nvme0n1p3             259:4    0 1000M  0 part /boot
└─/dev/nvme0n1p4             259:5    0 58.9G  0 part
  └─/dev/mapper/rocky-lvroot 253:0    0 58.9G  0 lvm  /var/lib/kubelet/pods/427d658b-7635-42ff-b52f-c93625133b48/volume-subpaths/empty-dir/metrics/0
                                                      /var/lib/kubelet/pods/427d658b-7635-42ff-b52f-c93625133b48/volume-subpaths/empty-dir/postgresql/2
                                                      /var/lib/kubelet/pods/427d658b-7635-42ff-b52f-c93625133b48/volume-subpaths/empty-dir/postgresql/1
                                                      /var/lib/kubelet/pods/427d658b-7635-42ff-b52f-c93625133b48/volume-subpaths/empty-dir/postgresql/0
                                                      /var/lib/kubelet/pods/f4f542a1-f876-4e64-b1e2-745610742002/volume-subpaths/capabilities/frontend/0
                                                      /var/lib/kubelet/pods/bcda3218-ed84-4bf9-a59d-1cea0798f65b/volume-subpaths/sc-dashboard-provider/grafana/4
                                                      /var/lib/kubelet/pods/bcda3218-ed84-4bf9-a59d-1cea0798f65b/volume-subpaths/config/grafana/2
                                                      /var/lib/kubelet/pods/bcda3218-ed84-4bf9-a59d-1cea0798f65b/volume-subpaths/config/grafana/0
                                                      /var/lib/kubelet/pods/231c54e8-63c0-4e8c-949e-58900d8e0f1e/volume-subpaths/config/filebrowser/0
                                                      /
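
The free space can also be estimated from byte-accurate lsblk output instead of comparing rounded sizes by eye. The helper below is a sketch (the name `unallocated_bytes` is made up for illustration). It subtracts the sum of all partition sizes from the disk size. The result is approximate because it ignores partition table overhead and alignment gaps.

```shell
# Sketch: print the number of bytes on a disk that are not covered by any partition.
# Expects the output of `lsblk -b -n -o TYPE,SIZE <disk>` on stdin.
unallocated_bytes() {
  awk '$1 == "disk" { disk = $2 }
       $1 == "part" { parts += $2 }
       END { printf "%.0f\n", disk - parts }'
}

# On the appliance:
#   lsblk -b -n -o TYPE,SIZE /dev/nvme0n1 | unallocated_bytes
```

Rows of other types, such as the lvm volume that sits on top of a partition, are ignored so that their size is not counted twice.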

Extend system disk (mount point `/` filesystem)

To extend a disk, follow these steps. In the following example, the system disk is extended:

Linux root (mount point /) is the 4th partition in the output of fdisk -l /dev/nvme0n1. To extend this partition, use the command:

echo ", +" | sfdisk --force -N 4 /dev/nvme0n1

Inform kernel about device table change (extended partition):

partprobe

Check physical volume name:

pvdisplay | grep "PV Name"

Resize physical volume:

pvresize /dev/nvme0n1p4

Check the logical volume path:

lvdisplay | grep "LV Path"

Extend logical volume:

lvextend -r -l +100%FREE /dev/rocky/lvroot

Check that the filesystem has been resized:

df -Th /

Extend data disk

To extend the data disk (the filesystem mounted at /data), follow these steps. In the following example, the data disk is extended:

The data disk (mount point /data) is the 1st partition in the output of fdisk -l /dev/nvme1n1. To extend this partition, use the command:

echo ", +" | sfdisk --force -N 1 /dev/nvme1n1

Inform kernel about device table change (extended partition):

partprobe

Grow the XFS filesystem to fill the extended partition:

xfs_growfs /dev/nvme1n1p1

Check that the filesystem has been resized:

df -Th /data

Disable unneeded microservices

The virtual appliance comes with all microservices enabled by default. You may disable a microservice if you do not plan to use it. A disabled microservice does not consume any compute resources.

Decide which microservices you want to disable: voiceprint-extraction, voiceprint-comparison, language-identification, enhanced-speech-to-text-built-on-whisper or speech-to-text-phonexia.
Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
Locate the key .spec.valuesContent.<microservice>.enabled.
Change the value from true to false.

Updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    <microservice>:
    <Not significant lines omitted>
      enabled: false

Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.

Phonexia speech to text microservice

This section describes configuration specific to the Phonexia speech-to-text microservice.

Permanent vs onDemand instances

A permanent instance is started and running (and consuming resources) all the time. An onDemand instance is started only when a corresponding task is queued, and is stopped when all tasks have been processed.

All instances are onDemand by default. Any instance can be reconfigured to be permanent. Use the following guide to reconfigure an instance from onDemand to permanent:

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.

Corresponding section looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
            .
            .
            .
          - name: cs
            imageTag: 3.59.0-stt-cs_cz_6
            onDemand:
              enabled: true
            .
            .
            .

Delete the onDemand key and its subkeys.

Updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
            .
            .
            .
          - name: cs
            imageTag: 3.59.0-stt-cs_cz_6
            .
            .
            .

Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.

Configure languages in speech-to-text-phonexia microservice

This microservice consists of multiple instances. Each instance corresponds to a single language. All instances are listed in the configuration file.

Note: Docker images for the languages are not included in the virtual appliance. This means that the virtual appliance needs internet access to download the Docker image when the speech-to-text-phonexia microservice is used! As a workaround, you can put a custom image into the virtual appliance.

By default all languages/instances are enabled. List of languages:

ar_kw_6
ar_xl_6
bn_6
cs_cz_6
de_de_6
en_us_6
es_6
fa_6
fr_fr_6
hr_hr_6
hu_hu_6
it_it_6
ka_ge_6
kk_kz_6
nl_6
pl_pl_6
ps_6
ru_ru_6
sk_sk_6
sr_rs_6
sv_se_6
tr_tr_6
uk_ua_6
vi_vn_6
zh_cn_6

How to disable all language instances except cs_cz_6 and en_us_6:

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.

Corresponding section looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          - name: ark
            imageTag: 3.59.0-stt-ar_kw_6
            onDemand:
              enabled: true
          - name: arx
            imageTag: 3.59.0-stt-ar_xl_6
            onDemand:
              enabled: true
          - name: bn
            imageTag: 3.59.0-stt-bn_6
            onDemand:
              enabled: true
          - name: cs
            imageTag: 3.59.0-stt-cs_cz_6
            onDemand:
              enabled: true
          - name: de
            imageTag: 3.59.0-stt-de_de_6
            onDemand:
              enabled: true
          - name: en
            imageTag: 3.59.0-stt-en_us_6
            onDemand:
              enabled: true
            .
            .
            .
          - name: vi
            imageTag: 3.59.0-stt-vi_vn_6
            onDemand:
              enabled: true
          - name: zh
            imageTag: 3.59.0-stt-zh_cn_6
            onDemand:
              enabled: true

Comment out all the instances except cs_cz_6 and en_us_6.

Updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          #- name: ark
          #  imageTag: 3.59.0-stt-ar_kw_6
          #  onDemand:
          #    enabled: true
          #- name: arx
          #  imageTag: 3.59.0-stt-ar_xl_6
          #  onDemand:
          #    enabled: true
          #- name: bn
          #  imageTag: 3.59.0-stt-bn_6
          #  onDemand:
          #    enabled: true
          - name: cs
            imageTag: 3.59.0-stt-cs_cz_6
            onDemand:
              enabled: true
          #- name: de
          #  imageTag: 3.59.0-stt-de_de_6
          #  onDemand:
          #    enabled: true
          - name: en
            imageTag: 3.59.0-stt-en_us_6
            onDemand:
              enabled: true
            .
            .
            .
          #- name: vi
          #  imageTag: 3.59.0-stt-vi_vn_6
          #  onDemand:
          #    enabled: true
          #- name: zh
          #  imageTag: 3.59.0-stt-zh_cn_6
          #  onDemand:
          #    enabled: true

Alternatively, you can delete the instances you are not interested in.

The updated file then looks like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.59.0-stt-cs_cz_6
            onDemand:
              enabled: true
          - name: en
            imageTag: 3.59.0-stt-en_us_6
            onDemand:
              enabled: true

Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.
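
Conceptually, the edit above filters the instances list by name. A minimal Python sketch of that filter follows, assuming the instances have already been loaded into a list of dictionaries. The `keep_instances` helper is hypothetical and not part of the virtual appliance, and the sample data only mirrors a few entries from the listing above.

```python
def keep_instances(instances, wanted):
    """Return only the instances whose name is in `wanted`, preserving order."""
    wanted = set(wanted)
    return [inst for inst in instances if inst["name"] in wanted]


# Sample data mirroring part of the instance list shown above.
instances = [
    {"name": "ark", "imageTag": "3.59.0-stt-ar_kw_6", "onDemand": {"enabled": True}},
    {"name": "cs", "imageTag": "3.59.0-stt-cs_cz_6", "onDemand": {"enabled": True}},
    {"name": "de", "imageTag": "3.59.0-stt-de_de_6", "onDemand": {"enabled": True}},
    {"name": "en", "imageTag": "3.59.0-stt-en_us_6", "onDemand": {"enabled": True}},
]

kept = keep_instances(instances, ["cs", "en"])
print([inst["imageTag"] for inst in kept])
```

The sketch only illustrates which entries remain. The actual change is still made by editing speech-platform-values.yaml as described above.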

Modify replicas for permanent language instances

Each language instance has only one replica by default. This means that only one request (audio file) can be processed at a time. To process multiple requests in parallel, you have to increase the number of replicas for the corresponding language instance.

Note: We do not recommend increasing replicas for any microservice when the virtual appliance is running with default resources (4 CPU cores, 16GB memory)!

Note: An onDemand instance always has only one replica.

Find out which language instance you want to configure replicas for.
Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.<language instance>.replicaCount.
Change the value to the desired number of replicas.
The updated file should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.59.0-stt-cs_cz_6
            replicaCount: 2

Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.
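
Before raising replica counts, it can help to estimate the resources they would need. The sketch below is a rough estimate only. It uses the per-language figures from the "Resource usage per technology" list (1 CPU core and 4GB memory for speech-to-text by Phonexia) and assumes that every additional replica costs as much as the first one, which the documentation does not state. The `estimate_stt_resources` helper is hypothetical.

```python
# Per-replica figures taken from "Resource usage per technology".
# Treating every replica as a full extra copy is an assumption.
STT_CPU_PER_REPLICA = 1
STT_MEMORY_GB_PER_REPLICA = 4


def estimate_stt_resources(replica_counts):
    """Sum CPU cores and memory (GB) over all replicas of all instances."""
    total_replicas = sum(replica_counts.values())
    return (
        total_replicas * STT_CPU_PER_REPLICA,
        total_replicas * STT_MEMORY_GB_PER_REPLICA,
    )


# Two replicas for cs (as in the example above) and one for en.
cpu, memory_gb = estimate_stt_resources({"cs": 2, "en": 1})
print(f"{cpu} CPU cores, {memory_gb} GB memory")
```

Compare the result with the resources assigned to the virtual appliance, keeping in mind that the other microservices need resources as well.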

Modify parallelism for instances

Each instance can process only one request at a time unless parallelism is overridden. The parallelism value is the maximum number of requests processed in parallel by one instance. Parallelism is set globally for all instances of a technology, but each instance can override the value. To override parallelism for speech-to-text-phonexia or time-analysis, follow these steps:

Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
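Find the parallelism key for the technology (speech-to-text-phonexia or time-analysis) you want to change: .spec.valuesContent.<technology>.parallelism
Change the value to the desired number of requests processed in parallel. To override the value for a single instance only, set parallelism on that instance under .spec.valuesContent.<technology>.config.instances, as the en instance does in the example.

The corresponding section looks like this: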

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      # Global value of parallelism for all instances
      parallelism: 2
      config:
      <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.61.0
          - name: en
            imageTag: 3.61.0
            # Override of parallelism for en instance
            parallelism: 4

Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.
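
The relation between the global value and a per-instance override can be sketched in Python as follows. This is an illustration under stated assumptions, not the appliance's implementation: the `effective_parallelism` helper is hypothetical, the values are assumed to be already loaded into dictionaries, and the fallback of 1 reflects the statement that an instance processes one request at a time unless parallelism is set.

```python
def effective_parallelism(technology_values, instance_name, default=1):
    """Return the per-instance override if present, else the global value."""
    global_value = technology_values.get("parallelism", default)
    instances = technology_values.get("config", {}).get("instances", [])
    for inst in instances:
        if inst["name"] == instance_name:
            return inst.get("parallelism", global_value)
    raise KeyError(f"unknown instance: {instance_name}")


# Mirrors the speech-to-text-phonexia section shown above.
stt_values = {
    "parallelism": 2,
    "config": {
        "instances": [
            {"name": "cs", "imageTag": "3.61.0"},
            {"name": "en", "imageTag": "3.61.0", "parallelism": 4},
        ]
    },
}

print(effective_parallelism(stt_values, "cs"))  # global value applies
print(effective_parallelism(stt_values, "en"))  # instance override applies
```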

Add custom images

This section describes how to add custom images to the virtual appliance. A typical use case is to add speech-to-text images to the Speech Engine for the languages you want to use, or to add a GPU-powered image for Voiceprint Extraction.

Add language images for speech to text phonexia

This subsection focuses on adding Phonexia Speech to Text images to the Speech Engine for the languages you want to use. These images need to be added to the data disk in order for Phonexia Speech to Text to work offline. In the example we will add two images: Phonexia Speech to Text for English and Czech languages.

[Virtual appliance] Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
[Virtual appliance] Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.

[Virtual appliance] Choose which images you want to add. Use imageTag key to find out which image tag(s) to use:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.59.0-stt-cs_cz_6 # <- This is the image tag
            onDemand:
              enabled: true
          - name: en
            imageTag: 3.59.0-stt-en_us_6 # <- This is the image tag
            onDemand:
              enabled: true

[PC] Pull all images, using the image tags found in the previous step:

docker pull phonexia/spe:3.59.0-stt-en_us_6
docker pull phonexia/spe:3.59.0-stt-cs_cz_6

[PC] Save all images to a single tar archive:

docker save -o images.tar phonexia/spe:3.59.0-stt-cs_cz_6 phonexia/spe:3.59.0-stt-en_us_6

[PC] Copy the images.tar file into the virtual appliance via SSH or file browser to /data/images.
```
scp images.tar root@<IP of virtual appliance>:/data/images
```
[Virtual appliance] Restart the virtual appliance to load the images or load them manually with:
```
ctr image import /data/images/images.tar
```
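
When several languages are added at once, the pull and save commands can be derived from the imageTag values. The Python sketch below only builds the command strings and does not run Docker. The `image_commands` helper is hypothetical, and the phonexia/spe repository name is taken from the commands above.

```python
REPOSITORY = "phonexia/spe"  # repository used by the docker commands above


def image_commands(image_tags, archive="images.tar"):
    """Build one `docker pull` per tag plus a single `docker save` command."""
    refs = [f"{REPOSITORY}:{tag}" for tag in image_tags]
    pulls = [f"docker pull {ref}" for ref in refs]
    save = f"docker save -o {archive} " + " ".join(refs)
    return pulls + [save]


# Tags copied from the imageTag keys of the cs and en instances.
commands = image_commands(["3.59.0-stt-cs_cz_6", "3.59.0-stt-en_us_6"])
for command in commands:
    print(command)
```

Review the printed commands before running them on the PC.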

Add gpu-powered image for voiceprint-extraction

This section describes how to add and use a GPU-powered image for Voiceprint Extraction.

Identify which Voiceprint Extraction image from dockerhub you want to use. If you are not sure, then use the latest gpu image tag. In this example, we will use the 1.2.0-gpu image tag.
[Virtual appliance] Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
[Virtual appliance] Locate the key .spec.valuesContent.voiceprint-extraction.image.

[Virtual appliance] Configure Voiceprint Extraction to use the image:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    voiceprint-extraction:
    <Not significant lines omitted>
      image:
        repository: phonexia/voiceprint-extraction
        tag: 1.2.0-gpu
        registry: docker.io

If you don't mind downloading the image from the internet (dockerhub) you are good to go. Otherwise, you need to upload the image to the virtual appliance.

[PC] Pull the Voiceprint Extraction image:

docker pull phonexia/voiceprint-extraction:1.2.0-gpu

[PC] Save all images to a single tar archive:

docker save -o images.tar phonexia/voiceprint-extraction:1.2.0-gpu

[PC] Copy the images.tar file into the virtual appliance via SSH or file browser to /data/images.
```
scp images.tar root@<IP of virtual appliance>:/data/images
```
[Virtual appliance] Restart the virtual appliance to load the images or load them manually with:
```
ctr image import /data/images/images.tar
```

Add gpu-powered image for language-identification

This section describes how to add and use a GPU-powered image for Language Identification.

Identify which Language Identification image from dockerhub you want to use. If you are not sure, then use the latest gpu image tag. In this example, we will use the 1.2.0-gpu image tag.
[Virtual appliance] Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
[Virtual appliance] Locate the key .spec.valuesContent.language-identification.image.

[Virtual appliance] Configure Language Identification to use the image:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    language-identification:
    <Not significant lines omitted>
      image:
        repository: phonexia/language-identification
        tag: 1.2.0-gpu
        registry: docker.io

If you don't mind downloading the image from the internet (dockerhub) you are good to go. Otherwise, you need to upload the image to the virtual appliance.

[PC] Pull the Language Identification image:

docker pull phonexia/language-identification:1.2.0-gpu

[PC] Save all images to a single tar archive:

docker save -o images.tar phonexia/language-identification:1.2.0-gpu

[PC] Copy the images.tar file into the virtual appliance via SSH or file browser to /data/images.
```
scp images.tar root@<IP of virtual appliance>:/data/images
```
[Virtual appliance] Restart the virtual appliance to load the images or load them manually with:
```
ctr image import /data/images/images.tar
```

Modify microservice replicas

Each microservice has only one replica by default. This means that only one request (audio file) can be processed at a time. To process multiple requests in parallel, you have to increase the number of replicas for the corresponding microservices.

Note: We do not recommend increasing replicas for any microservice when the virtual appliance is running with default resources (4 CPU cores, 16GB memory)!

Find out which microservices you want to modify replicas for: voiceprint-extraction, voiceprint-comparison, language-identification or enhanced-speech-to-text-built-on-whisper.
Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate the key .spec.valuesContent.<microservice>.replicaCount.
Change the value to the desired number of replicas.

The updated file should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    <microservice>:
    <Not significant lines omitted>
      replicaCount: 2

Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.

Run enhanced-speech-to-text-built-on-whisper microservice on GPU

First, make sure the virtual appliance can see the GPU device(s). Use nvidia-smi to list all the devices. If the device is present and visible to the system, the output should look like:

[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)

If the GPU is visible, then you can reconfigure the enhanced-speech-to-text-built-on-whisper to use GPU for the processing.

Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
Locate enhanced-speech-to-text-built-on-whisper section .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.
Locate key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.config.device.

Uncomment the line so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        # Uncomment this to force whisper to run on GPU
        device: cuda

Locate key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.resources.

Request GPU resources for the processing so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
    <Not significant lines omitted>
      # Uncomment this to grant access to GPU on whisper pod
      resources:
        limits:
          nvidia.com/gpu: "1"

Locate key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.runtimeClassName.

Set runtimeClassName so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
    <Not significant lines omitted>
      # Uncomment this to run whisper on GPU
      runtimeClassName: "nvidia"

Locate key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.updateStrategy.

Set type to Recreate to allow seamless updates so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
    <Not significant lines omitted>
      # Uncomment this to allow seamless updates on single GPU machine
      updateStrategy:
        type: Recreate

Updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        device: cuda

      <Not significant lines omitted>
      resources:
        limits:
          nvidia.com/gpu: "1"

      <Not significant lines omitted>
      runtimeClassName: "nvidia"

      <Not significant lines omitted>
      updateStrategy:
        type: Recreate

Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.
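
The procedure above touches four separate keys, so it is easy to miss one. The Python sketch below checks a microservice section for all four settings. It assumes the section has already been loaded into a dictionary. The `missing_gpu_settings` helper is hypothetical, and the expected values are the ones shown in the examples above.

```python
# Key paths and values taken from the GPU examples above.
REQUIRED_GPU_SETTINGS = {
    ("config", "device"): "cuda",
    ("resources", "limits", "nvidia.com/gpu"): "1",
    ("runtimeClassName",): "nvidia",
    ("updateStrategy", "type"): "Recreate",
}


def missing_gpu_settings(service_values):
    """Return the key paths that are absent or differ from the expected value."""
    missing = []
    for path, expected in REQUIRED_GPU_SETTINGS.items():
        node = service_values
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
        if node != expected:
            missing.append(" > ".join(path))
    return missing


# A section where runtimeClassName was left commented out.
whisper_values = {
    "config": {"device": "cuda"},
    "resources": {"limits": {"nvidia.com/gpu": "1"}},
    "updateStrategy": {"type": "Recreate"},
}

print(missing_gpu_settings(whisper_values))
```

An empty list means all four settings are present. It does not prove that the GPU is actually usable by the pod.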

Run Voiceprint Extraction microservice on GPU

Note: GPU-powered image for Voiceprint Extraction is not included in the virtual appliance. Follow this guide to add and use the image.

First, make sure the virtual appliance can detect the GPU device(s). Use nvidia-smi to list all the devices. If the device is present and visible to the system, then the output should look like:

[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)

If the GPU is visible, then you can reconfigure the Voiceprint Extraction to use GPU for the processing.

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
Locate the Voiceprint Extraction section .spec.valuesContent.voiceprint-extraction.
Locate the key .spec.valuesContent.voiceprint-extraction.config.device.

Uncomment the line so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    voiceprint-extraction:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        # Uncomment this to force voiceprint-extraction to run on GPU
        device: cuda

Locate the key .spec.valuesContent.voiceprint-extraction.resources.

Request GPU resources for the processing so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    voiceprint-extraction:
    <Not significant lines omitted>
      # Uncomment this to grant access to GPU on voiceprint-extraction pod
      resources:
        limits:
          nvidia.com/gpu: "1"

Locate the key .spec.valuesContent.voiceprint-extraction.runtimeClassName.

Set runtimeClassName so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    voiceprint-extraction:
    <Not significant lines omitted>
      # Uncomment this to run voiceprint-extraction on GPU
      runtimeClassName: "nvidia"

Locate the key .spec.valuesContent.voiceprint-extraction.updateStrategy.

Set type to Recreate to allow seamless updates so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    voiceprint-extraction:
    <Not significant lines omitted>
      # Uncomment this to allow seamless updates on single GPU machine
      updateStrategy:
        type: Recreate

The updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    voiceprint-extraction:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        device: cuda

      <Not significant lines omitted>
      resources:
        limits:
          nvidia.com/gpu: "1"

      <Not significant lines omitted>
      runtimeClassName: "nvidia"

      <Not significant lines omitted>
      updateStrategy:
        type: Recreate

Save the file.
The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.

Run Language Identification on GPU

Note: GPU-powered image for Language Identification is not included in the virtual appliance. Follow this guide to add and use the image.

First, make sure the virtual appliance can detect the GPU device(s). Use nvidia-smi to list all the devices. If the device is present and visible to the system, then the output should look like:

[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)

If the GPU is visible, then you can reconfigure Language Identification to use the GPU for the processing.

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from within the virtual appliance or via a file browser.
Locate the Language Identification section .spec.valuesContent.language-identification.
Locate the key .spec.valuesContent.language-identification.config.device.

Uncomment the line so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    language-identification:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        # Uncomment this to force language-identification to run on GPU
        device: cuda

Locate the key .spec.valuesContent.language-identification.resources.

Request GPU resources for the processing so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    language-identification:
    <Not significant lines omitted>
      # Uncomment this to grant access to GPU on language-identification pod
      resources:
        limits:
          nvidia.com/gpu: "1"

Locate the key .spec.valuesContent.language-identification.runtimeClassName.

Set runtimeClassName so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    language-identification:
    <Not significant lines omitted>
      # Uncomment this to run language-identification on GPU
      runtimeClassName: "nvidia"

Locate the key .spec.valuesContent.language-identification.updateStrategy.

Set type to Recreate to allow seamless updates so that it looks like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    language-identification:
    <Not significant lines omitted>
      # Uncomment this to allow seamless updates on single GPU machine
      updateStrategy:
        type: Recreate

The updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    language-identification:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        device: cuda

      <Not significant lines omitted>
      resources:
        limits:
          nvidia.com/gpu: "1"

      <Not significant lines omitted>
      runtimeClassName: "nvidia"

      <Not significant lines omitted>
      updateStrategy:
        type: Recreate

Save the file.
The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.

Change model used in a microservice

Each microservice needs a model to do its job properly. For some microservices, for example enhanced-speech-to-text-built-on-whisper, we provide several models. We usually pre-configure microservices with the most accurate (and slowest) model. Users typically switch to a different model to speed up processing at the cost of less accurate results.

The license you received with the virtual appliance is valid only for the default model. If you change the model, you have to change the license as well.

Change model in enhanced-speech-to-text-built-on-whisper microservice

We offer the following models for the enhanced-speech-to-text-built-on-whisper microservice:

large-v3 - next-gen most accurate multilingual model.
large-v2 - most accurate multilingual model. This is the default model.
medium - less accurate but faster than large-v2.
small - less accurate but faster than medium.
base - less accurate but faster than small.

Ask Phonexia to provide you with the desired model and license. You will receive one or more links, each of which downloads a zip archive.

Upload the archive to the virtual appliance.

$ scp licensed-models.zip root@<virtual-appliance-ip>:/data/

Unzip the archive. Models are extracted into one directory per microservice:
```
$ unzip licensed-models.zip
```

The content of /data/models should look like:

$ find /data/models
/data/models/
/data/models/enhanced_speech_to_text_built_on_whisper
/data/models/enhanced_speech_to_text_built_on_whisper/small-1.0.0.model
/data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-base-1.0.0-license.key.txt
/data/models/enhanced_speech_to_text_built_on_whisper/base-1.0.0.model
/data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-small-1.0.0-license.key.txt
/data/models/speaker_identification
/data/models/speaker_identification/xl-5.0.0.model
/data/models/speaker_identification/speaker_identification-xl-5.0.0-license.key.txt

Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
Locate the key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.config.model.
Change the value of the file key from "large_v2-1.0.0.model" to the file you've just uploaded ("small-1.0.0.model").

The updated file should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
    <Not significant lines omitted>
      config:
        model:
          <Not significant lines omitted>
          file: "small-1.0.0.model"

Change the license because you have changed the model. See above how to do it.
Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.
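
In the /data/models listing above, every model file has a license file next to it whose name is built from the directory name and the model file name. The Python sketch below derives the expected license path for a given model. The naming rule is inferred from that listing only, and the `expected_license_path` helper is hypothetical.

```python
from pathlib import PurePosixPath


def expected_license_path(model_path):
    """Derive the license file path that sits next to a .model file."""
    model = PurePosixPath(model_path)
    if not model.name.endswith(".model"):
        raise ValueError(f"not a model file: {model_path}")
    stem = model.name[: -len(".model")]  # e.g. "small-1.0.0"
    license_name = f"{model.parent.name}-{stem}-license.key.txt"
    return str(model.parent / license_name)


print(expected_license_path(
    "/data/models/enhanced_speech_to_text_built_on_whisper/small-1.0.0.model"
))
```

This helps to identify which license file belongs to the model you configured. It does not replace the license change step itself.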

Load Speech to Text Phonexia and Time Analysis model from data disk

To keep up with the latest version of the application, models can be loaded from the virtual appliance data volume. To use an image that does not contain a model and load an existing model from the data volume, set up the instance in the configuration file as follows:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          . . .
          - name: en
            imageTag: 3.61.0
          . . .
  <Not significant lines omitted>
    time-analysis:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          . . .
          - name: tae
            imageTag: 3.61.0
          . . .

By default, the model is expected at a path such as /data/models/speech_to_text_phonexia/en_us_6-3.61.0.model. This folder structure is created by unzipping the provided licensed-models.zip archive in the /models/ path. If the path to the model is different, or the model version does not match the image, the path can be specified in the instance configuration:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          . . .
          - name: en
            imageTag: 3.61.0
            model:
              hostPath: /data/models/speech_to_text_phonexia/en_us_6-3.61.0.model
          . . .
  <Not significant lines omitted>
    time-analysis:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          . . .
          - name: tae
            imageTag: 3.61.0
            model:
              hostPath: /data/models/time_analysis/generic-3.61.0.model
          . . .

So far, loading models from the data disk is supported only by the Speech to Text Phonexia and Time Analysis technologies.

Process patented audio codecs with media-conversion

By default, media conversion works only with patent-free audio codecs.

We cannot include and distribute patented codecs with the virtual appliance. If you need to process audio files encoded with patented codecs, you have to use a different version of media-conversion. The media-conversion service image is located on dockerhub.

Pull Media Conversion image on the fly

This is handy if you don't mind pulling images from the internet. The image is pulled only if it is not present yet.

Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
Locate key .spec.valuesContent.media-conversion.image

Change the values of registry, repository, tag and tagSuffix to:

media-conversion:
  image:
    registry: docker.io
    repository: phonexia/media-conversion
    tag: 1.0.0
    tagSuffix: ""

Updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    media-conversion:
    <Not significant lines omitted>
      image:
        registry: docker.io
        repository: phonexia/media-conversion
        tag: 1.0.0
        tagSuffix: ""

Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.

Put Media Conversion image into virtual appliance

This approach is needed if your deployment is completely offline and access to the internet from the virtual appliance is forbidden.

[PC] Pull media-conversion image locally:

$ docker pull phonexia/media-conversion:1.0.0

[PC] Save Media Conversion image to tar archive:

$ docker save --output images.tar phonexia/media-conversion:1.0.0

[PC] Copy the images.tar file into the virtual appliance via SSH or file browser to /data/images.
```
scp images.tar root@<IP of virtual appliance>:/data/images
```
[Virtual appliance] Restart the virtual appliance to load the images or load them manually with:
```
ctr image import /data/images/images.tar
```
Reconfigure the speech-platform to use the locally loaded image as described in the section "Pull Media Conversion image on the fly".

Disable DNS resolving for specific domains

The Kubernetes resolver tries to resolve non-FQDN names with all domains from /etc/resolv.conf. This might cause issues if access to the upstream DNS server (taken from /etc/resolv.conf as well) is denied. To avoid this issue, configure the Kubernetes resolver to skip lookup for specific domain(s).

[Virtual appliance] Manually create the file /data/speech-platform/coredns-custom.yaml with the following content. Replace <domain1.com> and <domain2.com> with the domains you want to disable lookup for:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  custom.server: |
    <domain1.com>:53 {
      log
    }
    <domain2.com>:53 {
      log
    }

[Virtual appliance] With the placeholders replaced by real domains, the file should look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  custom.server: |
    localdomain:53 {
      log
    }
    example.com:53 {
      log
    }

[Virtual appliance] Restart CoreDNS to apply the change:
```
kubectl -n kube-system rollout restart deploy/coredns
```
[Virtual appliance] Check that the CoreDNS pod is running:
```
kubectl -n kube-system get pods -l k8s-app=kube-dns
```
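
If the list of domains is long, the ConfigMap can be generated instead of typed by hand. The Python sketch below renders the same structure as the file shown above, with one server block per domain. The `coredns_custom_config` helper is hypothetical, and the result still has to be saved as /data/speech-platform/coredns-custom.yaml manually.

```python
def coredns_custom_config(domains):
    """Render the coredns-custom ConfigMap with one server block per domain."""
    lines = [
        "apiVersion: v1",
        "kind: ConfigMap",
        "metadata:",
        "  name: coredns-custom",
        "  namespace: kube-system",
        "data:",
        "  custom.server: |",
    ]
    for domain in domains:
        # Same block as in the manual example: the domain, port 53, and `log`.
        lines += [f"    {domain}:53 {{", "      log", "    }"]
    return "\n".join(lines) + "\n"


print(coredns_custom_config(["example.com", "example.org"]))
```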

Limits

This section describes the limits of the virtual appliance and how to modify them.

API limits

The following limits apply to the API itself.

Name	Unit	Default	Description
`taskExpirationTime`	seconds	`300`	Time after which finished tasks expire. The API keeps information about finished tasks (both successful and failed) and discards it after `taskExpirationTime`. Clients usually poll on the task ID and must retrieve the task status before it expires. Maximum value is `3600`.
`taskGrpcTimeout`	seconds	`120`	Maximum time the API waits for any task to complete. If you process large audio files, you probably need to increase this limit.
`inputStorageSize`	variable	`1GiB`	Size of the input storage. When an audio file is POSTed to the API, the whole file must be stored on disk. If you process large files or multiple files in parallel, you probably need to increase this limit.
`internalStorageSize`	variable	`1GiB`	Size of the internal storage. Each audio file is converted to WAV format before processing, and the converted audio is stored on disk. If you process large files or multiple files in parallel, you probably need to increase this limit. Note that `internalStorageSize` must be greater than or equal to `inputStorageSize`.
`singleFileUploadTimeout`	seconds	`120`	Maximum allowed time for uploading a single file to the API. If you process large files or have a poor network connection, increase this limit.
`singleFileUploadSize`	bytes	`104857600` (== 100MiB)	Maximum allowed size of an audio file to upload. If you process large files, increase this limit. Note that this is the API/ingress limit, not the UI limit.
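
The constraints stated in the table (the `3600`-second maximum for `taskExpirationTime`, and `internalStorageSize` being at least `inputStorageSize`) can be checked before editing the configuration. The following is only a minimal sketch: the helper names `parse_size` and `check_api_limits` are hypothetical, and the size parser assumes values written like `1GiB`, `512Mi` or a plain byte count.

```python
import re

# Multipliers for a small subset of size suffixes (assumption: values look like "1GiB").
_UNITS = {"": 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
          "K": 1000, "M": 1000**2, "G": 1000**3}


def parse_size(text):
    """Convert a size such as '1GiB', '512Mi' or '104857600' to bytes."""
    match = re.fullmatch(r"(\d+)\s*(Ki|Mi|Gi|K|M|G)?B?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or ""]


def check_api_limits(config):
    """Return a list of problems found in an api.config mapping (hypothetical helper)."""
    problems = []
    # Defaults below are the ones listed in the table above.
    if config.get("taskExpirationTime", 300) > 3600:
        problems.append("taskExpirationTime exceeds the maximum of 3600 seconds")
    input_size = parse_size(str(config.get("inputStorageSize", "1GiB")))
    internal_size = parse_size(str(config.get("internalStorageSize", "1GiB")))
    if internal_size < input_size:
        problems.append("internalStorageSize must be >= inputStorageSize")
    return problems


# Raising only inputStorageSize leaves internalStorageSize at its smaller default.
print(check_api_limits({"taskExpirationTime": 1200, "inputStorageSize": "2GiB"}))
```

An empty list means none of the checked constraints is violated; the sketch does not validate any other key.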

How to change the API limits

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate the key .spec.valuesContent.api.config
Change the value of the corresponding limit to a new value.
```
api:
  config:
    taskExpirationTime: 1200
```

Updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    api:
    <Not significant lines omitted>
      config:
        taskExpirationTime: 1200

Save the file.
The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.

UI limits

The following limits apply to the UI itself.

Name	Unit	Default	Description
`taskParallelism`		`4`	The UI posts a task to the API and polls it until it is finished. This limit controls how many tasks can be processed in parallel.
`taskPollingInterval`	seconds	`1`	Duration between poll attempts.
`taskPollingTimeout`	seconds	`3600`	How long the UI keeps polling for a task, that is, the maximum time the UI waits for the task to finish.

Speaker Identification UI limits

Limits in config section .spec.valuesContent.frontend.config.limits.speakerIdentification are applicable only for speaker identification.

Name	Unit	Default	Description
`maxFileSize`	bytes	`5000000` (== 5MB)	Maximum allowed size of an audio file to upload. Note that this must be lower than or equal to the `singleFileUploadSize` API limit.
`maxFilesCount`		`100`	Maximum number of files to be uploaded.
`maxVoiceRecorderDuration`	seconds	`300`	Maximum duration of the record captured by voice recorder.

Speech to text UI limits

Limits in config section .spec.valuesContent.frontend.config.limits.speechToText are applicable only for Speech to Text. Limits are applicable for both Enhanced Speech to Text built on Whisper and Speech to Text by Phonexia 6th Gen.

Name	Unit	Default	Description
`maxFileSize`	bytes	`5000000` (== 5MB)	Maximum allowed size of an audio file to upload. Note that this must be lower than or equal to the `singleFileUploadSize` API limit.
`maxFilesCount`		`100`	Maximum number of files to be uploaded.
`maxVoiceRecorderDuration`	seconds	`300`	Maximum duration of the record captured by voice recorder.

Language Identification UI limits

Limits in config section .spec.valuesContent.frontend.config.limits.languageIdentification are applicable only for language identification.

Name	Unit	Default	Description
`maxFileSize`	bytes	`5000000` (== 5MB)	Maximum allowed size of an audio file to upload. Note that this must be lower than or equal to the `singleFileUploadSize` API limit.
`maxFilesCount`		`100`	Maximum number of files to be uploaded.
`maxVoiceRecorderDuration`	seconds	`300`	Maximum duration of the record captured by voice recorder.
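
Each `maxFileSize` above must stay at or below the API limit `singleFileUploadSize`. A minimal sketch of that comparison follows; the function name `oversized_ui_limits` and the sample values are illustrative assumptions, and the mapping mirrors the structure under .spec.valuesContent.frontend.config.limits.

```python
API_SINGLE_FILE_UPLOAD_SIZE = 104857600  # default API/ingress limit in bytes


def oversized_ui_limits(ui_limits, api_upload_size=API_SINGLE_FILE_UPLOAD_SIZE):
    """Return the UI sections whose maxFileSize exceeds the API upload limit."""
    return sorted(
        section
        for section, limits in ui_limits.items()
        # Scalar entries such as taskParallelism are skipped.
        if isinstance(limits, dict) and limits.get("maxFileSize", 0) > api_upload_size
    )


# Hypothetical configuration: speechToText was raised above the API limit.
ui_limits = {
    "taskParallelism": 4,
    "speakerIdentification": {"maxFileSize": 5000000},
    "speechToText": {"maxFileSize": 200000000},
    "languageIdentification": {"maxFileSize": 5000000},
}
print(oversized_ui_limits(ui_limits))
```

Any section reported here needs either a lower `maxFileSize` or a higher `singleFileUploadSize`.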

How to change the UI limits

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate the key .spec.valuesContent.frontend.config.limits
Change the value of the corresponding limit to a new value.

frontend:
  config:
    limits:
      taskParallelism: 2

Updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    frontend:
    <Not significant lines omitted>
      config:
        limits:
          taskParallelism: 2

Save the file.
The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.

Pod count limits

Currently, the platform is limited by the number of pods that can be created inside the Kubernetes cluster. The maximum number of pods is set to 300.

How to change the pod count limits

Pod count limits can be overridden by editing the /etc/k3s/rancher/config.yaml file. To override the maximum number of pods, add or edit the max-pods parameter. Example:

debug: true
system-default-registry: airgapped.phonexia.com
disable:
  - traefik
  - cloud-controller
kubelet-arg:
  - "kube-reserved=cpu=500m,memory=1Gi,ephemeral-storage=2Gi"
  - "system-reserved=cpu=500m, memory=1Gi,ephemeral-storage=2Gi"
  - "eviction-hard=memory.available<500Mi,nodefs.available<10%"
  - "max-pods=350"

After editing the configuration, the virtual machine must be restarted (stopped and started) to apply the changes.

Admin backends limits

The following limits apply to the admin backends (filebrowser, grafana, prometheus).

Name	Unit	Default	Description
`singleFileUploadTimeout`	seconds	`120`	Maximum allowed time for uploading.
`singleFileUploadSize`	bytes	`5368709120` (== 5GiB)	Maximum allowed size of an audio file to upload.

How to change admin backends limits

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate the key .spec.valuesContent.ingressAdmin
Change the value of the corresponding limit to a new value.
```
ingressAdmin:
  singleFileUploadTimeout: 300
```

Updated file should look like:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    ingressAdmin:
    <Not significant lines omitted>
      singleFileUploadTimeout: 300

Save the file.
The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.

The GPU sharing limit controls how many pods can share a single GPU.

Name	Unit	Default	Description
`replicas`	count	`3`	Number of pods sharing single GPU

Open the text file /data/speech-platform/nvidia-device-plugin-configs.yaml either directly from inside the virtual appliance or via a file browser.
Locate the key .data.default.sharing.timeSlicing.resources.replicas
Change the value of the replicas key to a new value.
```
replicas: 6
```

Updated file should look like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-configs
  namespace: nvidia-device-plugin
data:
  default: |-
  <Not significant lines omitted>
        resources:
          - name: nvidia.com/gpu
            replicas: 6

Save the file.
The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
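
As a rough illustration of what `replicas` means in practice: with time slicing, each physical GPU is advertised as `replicas` separate `nvidia.com/gpu` resources, so the number of pods that can each request one GPU is the product of the two. The sketch below assumes every GPU pod requests exactly one `nvidia.com/gpu`; the function names are hypothetical.

```python
def shared_gpu_slots(physical_gpus, replicas=3):
    """Number of nvidia.com/gpu resources advertised when each GPU is time-sliced."""
    if physical_gpus < 0 or replicas < 1:
        raise ValueError("physical_gpus must be >= 0 and replicas must be >= 1")
    return physical_gpus * replicas


def pods_fit(gpu_pods, physical_gpus, replicas=3):
    """True if gpu_pods pods, each requesting one nvidia.com/gpu, can all be scheduled."""
    return gpu_pods <= shared_gpu_slots(physical_gpus, replicas)


print(shared_gpu_slots(1))          # default of 3 replicas on one GPU
print(pods_fit(4, 1))               # four GPU pods with the default
print(pods_fit(4, 1, replicas=6))   # four GPU pods after raising replicas to 6
```

Time slicing shares the same GPU among pods, so raising `replicas` increases how many pods can be scheduled, not the available GPU compute or memory.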

Admin console

The admin console is a simple web page containing links to various admin-related tools. The console is located at http://<IP_of_virtual_appliance>/admin. It contains links to:

filebrowser
prometheus
grafana

Grafana

Grafana is a tool for visualizing application and Kubernetes metrics. The most useful dashboards available in Grafana are:

Envoy Clusters - See envoy cluster statistics
Kubernetes / Compute Resources / Pod - See resource consumption of individual pods
NGINX Ingress controller - See ingress controller stats
NVIDIA DCGM Exporter Dashboard - See GPU device stats
Node Exporter / Nodes - See stats about virtual appliance
Speech Platform API capacity - See metrics about speech platform itself

Troubleshooting

This section contains information about the individual components of the speech platform and the request flow.

Speech platform components

List of the components:

frontend - simple webserver serving static html, css, javascript and image files
docs - simple webserver serving documentation
assets - simple webserver hosting examples
api - python component providing REST API interface
envoy - router and loadbalancer for GRPC messages
media-conversion - python component used for:
- converting audio files from various formats to simple wav format
- splitting multi-channel audio into multiple single-channel files
technology microservices:
- enhanced-speech-to-text-built-on-whisper - transcribes speech to text
- speech-to-text-phonexia - transcribes speech to text
- voiceprint-extraction - extracts voiceprint from audio file
- voiceprint-comparison - compares multiple voiceprints
- language-identification - identifies language in audio

Request flow

The user sends a POST request (for example, to transcribe speech to text) to the API.
The API creates a task for processing and returns the task ID to the user.
From this point, the user can poll the task to get the result.
The API calls media-conversion via envoy.
Media-conversion converts the audio file to WAV format and, if needed, splits it into multiple single-channel files.
The API gets the converted audio file from media-conversion.
The API calls enhanced-speech-to-text-built-on-whisper via envoy.
Enhanced-speech-to-text-built-on-whisper transcribes the audio file.
The API gets the transcription.
The user can retrieve the task result.
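
The client side of this flow (submit, then poll until the task finishes) can be sketched as a generic polling loop. This is not the platform's client code: `fetch_status` is a caller-supplied function, and the `"state"` field with the values `"done"` and `"failed"` is a hypothetical response shape chosen for illustration. The `interval` and `timeout` parameters play the same role as the UI limits `taskPollingInterval` and `taskPollingTimeout`.

```python
import time


def poll_task(fetch_status, interval=1.0, timeout=3600.0,
              sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a final state or the timeout elapses.

    fetch_status must return a dict; the "state" key and its values are
    assumptions made for this sketch, not the documented API response.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("state") in ("done", "failed"):
            return status
        if clock() + interval > deadline:
            raise TimeoutError("task did not finish before the polling timeout")
        sleep(interval)


# Simulated task that finishes on the third poll.
responses = iter([{"state": "running"}, {"state": "running"},
                  {"state": "done", "result": "hello"}])
final = poll_task(lambda: next(responses), interval=0.01, timeout=1.0)
print(final["state"])
```

Keep in mind that the API discards finished tasks after `taskExpirationTime`, so a client should not pause polling for longer than that.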

Check node status

Check node status with:

[root@speech-platform ~]# kubectl get nodes
NAME                          STATUS   ROLES                  AGE   VERSION
speech-platform.localdomain   Ready    control-plane,master   9s    v1.27.6+k3s1

If the node is not in the Ready state, there is usually something wrong.

Note: The node list can be empty (No resources found) or the node can be in the NotReady state while the virtual appliance is starting up. This is normal and resolves itself within a few moments.

The node must also have enough free disk and memory capacity. When it does not, pressure events are emitted. Run the following command to see the node conditions:

[root@speech-platform disks]# kubectl describe node | grep -A 6 Conditions:
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 07:46:39 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 08:06:45 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 07:46:39 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 07:46:39 +0000   KubeletReady                 kubelet is posting ready status

Disk pressure

A disk pressure node event is emitted when Kubernetes is running out of disk capacity in the /var filesystem. The node conditions look like this:

[root@speech-platform disks]# kubectl describe node | grep -A 6 Conditions:
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 07:46:39 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 08:06:45 +0000   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 07:46:39 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 07:46:39 +0000   KubeletReady                 kubelet is posting ready status

Follow the procedure for extending the disks.

Memory pressure

A memory pressure node event is emitted when Kubernetes is running out of free memory. The node conditions look like this:

[root@speech-platform disks]# kubectl describe node | grep -A 6 Conditions:
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                         Message
  ----             ------  -----------------                 ------------------                ------                         -------
  MemoryPressure   True    Mon, 29 Apr 2024 08:50:50 +0000   Mon, 29 Apr 2024 08:50:50 +0000   KubeletHasInsufficientMemory   kubelet has insufficient memory available
  DiskPressure     False   Mon, 29 Apr 2024 08:50:50 +0000   Mon, 29 Apr 2024 08:33:08 +0000   KubeletHasNoDiskPressure       kubelet has no disk pressure
  PIDPressure      False   Mon, 29 Apr 2024 08:50:50 +0000   Mon, 29 Apr 2024 08:33:08 +0000   KubeletHasSufficientPID        kubelet has sufficient PID available
  Ready            True    Mon, 29 Apr 2024 08:50:50 +0000   Mon, 29 Apr 2024 08:33:08 +0000   KubeletReady                   kubelet is posting ready status

You need to grant more memory to the virtual appliance or disable unneeded microservices.
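
The condition tables shown in this section can also be scanned by a small script, for example when the output is collected from several appliances. The sketch below only parses text in the layout printed by `kubectl describe node`; the function name is hypothetical and the sample is abbreviated to three columns.

```python
def pressure_conditions(describe_output):
    """Return node condition types that indicate a problem.

    Ready is healthy when its status is True; every other condition
    (MemoryPressure, DiskPressure, PIDPressure) is healthy when False.
    """
    problems = []
    in_conditions = False
    for line in describe_output.splitlines():
        if line.startswith("Conditions:"):
            in_conditions = True
            continue
        if not in_conditions:
            continue
        fields = line.split()
        if not line.startswith(" ") or len(fields) < 2:
            break  # the indented condition table has ended
        cond_type, status = fields[0], fields[1]
        if cond_type in ("Type", "----"):
            continue  # header rows
        healthy = "True" if cond_type == "Ready" else "False"
        if status != healthy:
            problems.append(cond_type)
    return problems


# Abbreviated sample based on the disk pressure output above.
sample = "\n".join([
    "Conditions:",
    "  Type             Status  Reason",
    "  ----             ------  ------",
    "  MemoryPressure   False   KubeletHasSufficientMemory",
    "  DiskPressure     True    KubeletHasDiskPressure",
    "  PIDPressure      False   KubeletHasSufficientPID",
    "  Ready            True    KubeletReady",
])
print(pressure_conditions(sample))
```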

View pod logs

Logs are stored in /data/log/pods/ or in /data/logs/containers. You can view them via filebrowser if needed.

Alternatively you can display logs with kubectl command:

[root@speech-platform ~]# kubectl -n speech-platform logs -f voiceprint-extraction-7867578b97-w7bzd
[2024-04-29 08:59:10.250] [Configuration] [info] model: /models/xl-5.0.0.model
[2024-04-29 08:59:10.250] [Configuration] [info] port: 8080
[2024-04-29 08:59:10.250] [Configuration] [info] device: cpu
[2024-04-29 08:59:10.250] [critical] base64_decode: invalid character ''<''

Changes in configuration are not applied

Changes in the main configuration file /data/speech-platform/speech-platform-values.yaml are automatically picked up and applied by the helm controller. If configuration is not valid (or to be more precise - if the configuration file is not valid YAML file), the helm controller fails to apply the configuration. The helm controller creates a one-time job to update the helm chart with the new configuration. If the configuration is incorrect, the job will not complete successfully, and the underlying pod will either restart or be in an error state. The pod status will reflect this issue:

[root@speech-platform disks]# kubectl get pods -n kube-system | grep -i helm-install
helm-install-filebrowser-2b7pn                  0/1     Completed   0             51m
helm-install-ingress-nginx-m87d4                0/1     Completed   0             51m
helm-install-nginx-nrcvk                        0/1     Completed   0             51m
helm-install-dcgm-exporter-fjqzz                0/1     Completed   0             51m
helm-install-kube-prometheus-stack-jn5bz        0/1     Completed   0             51m
helm-install-keda-vsn95                         0/1     Completed   0             51m
helm-install-speech-platform-9l9vj              0/1     Error       4 (46s ago)   6m15s

View logs of failed helm-install pod:

[root@speech-platform disks]# kubectl logs -f helm-install-speech-platform-9l9vj -n kube-system
...
...
...
Upgrading speech-platform
+ helm_v3 upgrade --namespace speech-platform speech-platform https://10.43.0.1:443/static/phonexia-charts/speech-platform-0.0.0-36638f5-helm.tgz --values /config/values-10_HelmChartConfig.yaml
Error: failed to parse /config/values-10_HelmChartConfig.yaml: error converting YAML to JSON: yaml: line 494: could not find expected ':'

Check configuration file validity

This section describes how to check if your configuration is valid and how to identify which line in the configuration is incorrect.

Use the following command to check if the configuration file is valid:

yq .spec.valuesContent /data/speech-platform/speech-platform-values.yaml | yq .

If the configuration file is valid, the content of the file will be printed. Otherwise, the line number with an error will be printed out as follows:

[root@speech-platform ~]# yq .spec.valuesContent /data/speech-platform/speech-platform-values.yaml | yq .
Error: bad file '-': yaml: line 253: could not find expected ':'

Content of the file 10 lines before and 10 lines after line 253:

[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml  | grep 253 -B 10 -A 10
   243          # -- List of devices to use. GPU only.
   244          # deviceIndices: [0,1]
   245
   246          # Uncomment this to force whisper to run on GPU
   247          device: cuda
   248
   249          logLevel: debug
   250
   251          model:
   252            volume:
   253              hostPath:
   254                path: /data/models/enhanced_speech_to_text_built_on_whisper
   255
   256            # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
   257            file: "large_v2-1.0.1.model"
   258          license:
   259            value:
   260            "eyJ2ZX...=="
   261
   262        # Uncomment this to grant access to GPU on whisper pod
   263        resources:

There is nothing suspicious on line 253. The line number reported by yq is offset because the configuration of the speech-platform helm chart itself is stored as the value of the spec.valuesContent key in the speech-platform-values.yaml file. Therefore, you need to add 7 (since spec.valuesContent is on the 7th line of the configuration file) to the reported line number to get the correct line number (== 260):

[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml | grep 260 -B 10 -A 10
   250
   251          model:
   252            volume:
   253              hostPath:
   254                path: /data/models/enhanced_speech_to_text_built_on_whisper
   255
   256            # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
   257            file: "large_v2-1.0.1.model"
   258          license:
   259            value:
   260            "eyJ2ZX...=="
   261
   262        # Uncomment this to grant access to GPU on whisper pod
   263        resources:
   264          limits:
   265            nvidia.com/gpu: "1"
   266
   267        # Uncomment this to run whisper on GPU
   268        runtimeClassName: "nvidia"
   269
   270        service:

There is only a license key on line 260. The error message could not find expected ':' is correct, because there is no : on this line. One line above (259) there is a key named value, which should contain the license. However, the license itself is on line 260, which makes the file invalid YAML. To fix it, merge lines 259 and 260. The resulting file should look like this:

[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml | grep 260 -B 10 -A 10
   250
   251          model:
   252            volume:
   253              hostPath:
   254                path: /data/models/enhanced_speech_to_text_built_on_whisper
   255
   256            # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
   257            file: "large_v2-1.0.1.model"
   258          license:
   259            value: "eyJ2ZX...=="
   260
   261        # Uncomment this to grant access to GPU on whisper pod
   262        resources:
   263          limits:
   264            nvidia.com/gpu: "1"
   265
   266        # Uncomment this to run whisper on GPU
   267        runtimeClassName: "nvidia"
   268
   269        service:
   270          clusterIP: "None"
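
The line-number correction used in this section (reported line plus the line on which valuesContent sits) can be written as a small helper. This is a sketch only; the function name is hypothetical, and it assumes the file contains a single `valuesContent:` key whose block starts on the following line.

```python
def file_line_for_yq_error(config_text, reported_line):
    """Map a line number reported for the extracted valuesContent to the whole file."""
    for number, line in enumerate(config_text.splitlines(), start=1):
        if line.strip().startswith("valuesContent:"):
            # Line 1 of the embedded content is the line right after the key.
            return number + reported_line
    raise ValueError("valuesContent key not found")


# Header of speech-platform-values.yaml as shown earlier in this document.
header = """apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
"""
print(file_line_for_yq_error(header, 253))  # prints 260
```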

Disable DNS resolving for specific domains

First, check the coreDNS logs:

kubectl -n kube-system logs -l k8s-app=kube-dns

Following lines in the logs indicate this issue:

2024-06-05T11:00:49.55751974Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:60352->192.168.137.1:53: i/o timeout
2024-06-05T11:00:51.546562499Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:40254->192.168.137.1:53: i/o timeout
2024-06-05T11:00:51.548101103Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:47838->192.168.137.1:53: i/o timeout
2024-06-05T11:00:51.558720939Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:39526->192.168.137.1:53: i/o timeout
2024-06-05T11:00:53.547326187Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:58487->192.168.137.1:53: i/o timeout
2024-06-05T11:00:53.548836432Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:46303->192.168.137.1:53: i/o timeout

This happens when DHCP is used to assign the IP address of the virtual appliance. DHCP usually also configures the nameserver and search domains in /etc/resolv.conf:

nameserver 192.168.137.1
search localdomain

Communication within virtual appliance does not use FQDN, which means that each DNS name is resolved with all domains. Internal kubernetes domains (<namespace>.svc.cluster.local, svc.cluster.local and cluster.local) are resolved immediately with coreDNS, non-kubernetes domains are resolved with nameserver provided by DHCP. If access to the nameserver is blocked (for example, by firewall), then resolving of single name can take up to 10 seconds, which can significantly increase task processing duration.
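
To make the effect of search domains concrete, the sketch below lists the names a stub resolver would try for a short service name. It models the usual resolv.conf search-list behaviour only; `ndots=5` is the value Kubernetes normally sets for pods and is an assumption here, as is the exact search list. The function name is hypothetical.

```python
def lookup_candidates(name, search_domains, ndots=5):
    """List the names a stub resolver tries, in order, for a given query."""
    if name.endswith("."):
        return [name]  # fully qualified: the search list is not applied
    expanded = [f"{name}.{domain}." for domain in search_domains]
    absolute = [name + "."]
    # Names with fewer dots than ndots go through the search list first.
    if name.count(".") >= ndots:
        return absolute + expanded
    return expanded + absolute


search = [
    "speech-platform.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "localdomain",  # search domain provided by DHCP
]
for candidate in lookup_candidates("speech-platform-envoy", search):
    print(candidate)
```

The candidate ending in `.localdomain.` is the one that appears in the coreDNS error log above: it cannot be answered inside the cluster, so it is forwarded to the DHCP-provided nameserver.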

To avoid this issue, you can either allow communication from virtual appliance to DHCP-configured DNS server or configure kubernetes resolver to skip lookup for DHCP-provided domain(s):

[Virtual appliance] Create the file /data/speech-platform/coredns-custom.yaml manually with the following content. Replace <domain1.com> and <domain2.com> with the domains you want to disable lookup for:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  custom.server: |
    <domain1.com>:53 {
      log
    }
    <domain2.com>:53 {
      log
    }

[Virtual appliance] The file should look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  custom.server: |
    localdomain:53 {
      log
    }
    example.com:53 {
      log
    }

[Virtual appliance] Restart coreDNS to apply the change:
```
kubectl -n kube-system rollout restart deploy/coredns
```
[Virtual appliance] Check that coreDNS pod is running:
```
kubectl -n kube-system get pods -l k8s-app=kube-dns
```
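
If several domains need to be excluded, the ConfigMap content can be generated instead of typed by hand. The sketch below only produces text in the same layout as the example file above; the function name is hypothetical and the output still has to be saved to /data/speech-platform/coredns-custom.yaml manually.

```python
def coredns_custom_configmap(domains):
    """Build the coredns-custom ConfigMap text with one server block per domain."""
    lines = [
        "apiVersion: v1",
        "kind: ConfigMap",
        "metadata:",
        "  name: coredns-custom",
        "  namespace: kube-system",
        "data:",
        "  custom.server: |",
    ]
    for domain in domains:
        # One server block per domain, matching the example file layout.
        lines += [f"    {domain}:53 {{", "      log", "    }"]
    return "\n".join(lines) + "\n"


print(coredns_custom_configmap(["localdomain", "example.com"]), end="")
```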

Upgrade guide

This section describes how to upgrade the virtual appliance, including the manual steps which need to be done prior to upgrading. Configuration changes between versions must be reflected before the upgrade. We suggest always using the configuration file bundled with the new version of the virtual appliance and updating it to suit your needs (insert licenses, enable/disable services, set replicas, ...). If you prefer not to do this, you must modify your current configuration file to work with the new version of the virtual appliance.

Upgrade and retain data disk

This upgrade approach retains all the data and configuration stored on the data disk.

Pros:

No need to configure virtual appliance from scratch
Prometheus metrics are kept

Cons:

You have to do version-specific upgrade steps

Import new version of virtual appliance (version X+1) into your virtualization platform
Stop current version of virtual appliance (version X)
Detach data disk from current version of virtual appliance (version X)
Attach data disk to new version of virtual appliance (version X+1)
Start new version of virtual appliance (version X+1)
Delete old version of virtual appliance (version X)
Follow version-specific upgrade steps

Upgrade and discard data disk

This upgrade approach discards the current data disk and uses a new one.

Pros:

Easier upgrade procedure
No version-specific upgrade steps
No accumulated disarray on the data disk

Cons:

You have to configure the virtual appliance from scratch:
- Disable unneeded services
- Insert license keys
- Insert models

Import new version of virtual appliance (version X+1) into your virtualization platform
Stop current version of virtual appliance (version X)
Start new version of virtual appliance (version X+1)
Delete old version of virtual appliance (version X)
Configure virtual appliance from scratch

Upgrade to 3.2.0

This section describes the manual steps which need to be done prior to upgrading to 3.2.0.

Add grpcAdapter license configuration for Time-Analysis and Speech-to-Text-Phonexia

Speech Engine microservices now require an additional license. The license is deployed automatically from the model package, but the license configuration must be added.

GPU sharing is enabled by default, but it does not work until its configuration is created.

Deploy additional components

New technology Language Identification was added. Configuration section must be added before using this technology.

Step by step upgrade guide to 3.2.0

This section describes how to upgrade the virtual appliance from 3.1.0 to 3.2.0 while retaining the data disk content.

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.

Append the following content to the end of the file:

# language-identification subchart config
language-identification:
  enabled: true
  replicaCount: 1
  image:
    repository: phonexia/dev/technologies/microservices/language-identification/main
    registry: airgapped.phonexia.com

  # Extra environment variables
  extraEnvVars: []

  config:
    # Uncomment this to force language-identification to run on GPU
    #device: cuda

    model:
      volume:
        hostPath:
          path: /data/models/language_identification

      # Name of a model file inside the volume, for example "xl-5.1.0.model"
      file: "xl-5.2.0.model"
    license:
      useSecret: true
      secret: language-identification-license
      key: "xl-5.2.0"

  annotations:
    secret.reloader.stakater.com/reload: "language-identification-license"

  # Uncomment this to grant access to GPU for language-identification pod
  #resources:
  #  limits:
  #    nvidia.com/gpu: "1"

  # Uncomment this to run language-identification on GPU
  #runtimeClassName: "nvidia"

  service:
    clusterIP: "None"

  #updateStrategy:
  #  type: Recreate

Locate .spec.valuesContent.time-analysis.grpcAdapter

Append config section:

config:
  license:
    useSecret: true
    secret: time-analysis-license
    key: grpc-adapter-license

Section then looks like:

 # Time-analysis subchart
 time-analysis:
 <Not significant lines omitted>
   grpcAdapter:
   <Not significant lines omitted>
     config:
       license:
         useSecret: true
         secret: time-analysis-license
         key: grpc-adapter-license

Locate .spec.valuesContent.speech-to-text-phonexia.grpcAdapter

Append config section:

config:
  license:
    useSecret: true
    secret: speech-to-text-phonexia-license
    key: grpc-adapter-license

Section then looks like:

 # Speech-to-text-phonexia subchart
 speech-to-text-phonexia:
 <Not significant lines omitted>
   grpcAdapter:
   <Not significant lines omitted>
     config:
       license:
         useSecret: true
         secret: speech-to-text-phonexia-license
         key: grpc-adapter-license

Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.

Create a new text file /data/speech-platform/nvidia-device-plugin-configs.yaml, either directly from inside the virtual appliance or via a file browser, with the following content:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-configs
  namespace: nvidia-device-plugin
data:
  default: |-
    version: v1
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 3

Save the file.
GPU sharing will be configured in a while.

Upgrade to 3.1.0

This section describes the manual steps which need to be done prior to upgrading to 3.1.0.

Change license secret field for Time-Analysis and Speech-to-Text-Phonexia

To unify how secrets are loaded across all microservices, the field used to load a license from a secret was changed in the Time-Analysis and Speech-to-Text-Phonexia microservices. This simplifies license loading for users.

Upload licenses from secret

The way licenses are uploaded to the virtual appliance has been simplified. From now on, licenses are imported from the models and licenses bundle (.zip file) provided by Phonexia; once unzipped, the bundle loads the licenses and models automatically. This requires a configuration change; however, the old way still works.

Deploy additional components

The billing feature is mature enough to be part of the virtual appliance. To deploy the billing-related components, add the following section to the configuration:

    billing:
      enabled: true
      image:
        registry: airgapped.phonexia.com

    restApiGateway:
      image:
        registry: airgapped.phonexia.com
      enabled: true

    postgresql:
      enabled: true
      auth:
        postgresPassword: postgresPassword
      image:
        registry: airgapped.phonexia.com
      metrics:
        enabled: true
        image:
          registry: airgapped.phonexia.com
        serviceMonitor:
          enabled: true
      primary:
        persistence:
          storageClass: manual
          selector:
            matchLabels:
              app.kubernetes.io/name: postgresql

Step by step upgrade guide to 3.1.0

This section describes how to upgrade the virtual appliance from 3.0.0 to 3.1.0 while retaining the data disk content.

IF YOU ARE ALREADY LOADING THE LICENSE THROUGH A SECRET:

Rename the field loading the Speech-to-Text-Phonexia and Time-Analysis licenses
Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate .spec.valuesContent.<speech-to-text-phonexia OR time-analysis>.config.license

Change it from:

license:
  existingSecret: <secret-name>

To:

license:
  useSecret: true
  secret: <secret-name>
  key: <secret-license-key>
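
The field rename above can be expressed as a small transformation on the license mapping. This is an illustrative sketch only: the function name `migrate_license` is hypothetical, and the key `my-license-key` in the usage line is a placeholder for your own secret license key.

```python
def migrate_license(old_license, key):
    """Convert a pre-3.1.0 license mapping (existingSecret) to the 3.1.0 layout."""
    if "existingSecret" not in old_license:
        return dict(old_license)  # nothing to migrate
    migrated = {k: v for k, v in old_license.items() if k != "existingSecret"}
    migrated.update({
        "useSecret": True,
        "secret": old_license["existingSecret"],
        "key": key,
    })
    return migrated


# Placeholder secret name and key, for illustration only.
print(migrate_license({"existingSecret": "time-analysis-license"}, "my-license-key"))
```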

IF YOU WANT TO LOAD LICENSES FROM SECRETS:

Load licenses from secret files
Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate .spec.valuesContent.<microservice>.config.license. <microservice> are all the services requiring license (voiceprint-comparison, voiceprint-extraction, enhanced-speech-to-text-built-on-whisper, speech-to-text-phonexia, time-analysis).

Change it from:

license:
  value: "<license>"

To:

license:
  useSecret: true
  secret: "<microservice>-license"
  key: "<model_name>-<model_version>"

Example:

license:
  useSecret: true
  secret: "enhanced-speech-to-text-built-on-whisper-license"
  key: "small-1.0.1"

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate .spec.valuesContent.envoy

Put the following content before the envoy section:

billing:
  enabled: true
  image:
    registry: airgapped.phonexia.com

restApiGateway:
  image:
    registry: airgapped.phonexia.com
  enabled: true

postgresql:
  enabled: true
  auth:
    postgresPassword: postgresPassword
  image:
    registry: airgapped.phonexia.com
  metrics:
    enabled: true
    image:
      registry: airgapped.phonexia.com
    serviceMonitor:
      enabled: true
  primary:
    persistence:
      storageClass: manual
      selector:
        matchLabels:
          app.kubernetes.io/name: postgresql

Section then looks like this:

     serviceMonitor:
       enabled: true
       additionalLabels:
         release: kube-prometheus-stack

 billing:
   enabled: true
   image:
     registry: airgapped.phonexia.com

 restApiGateway:
   image:
     registry: airgapped.phonexia.com
   enabled: true

 postgresql:
   enabled: true
   auth:
     postgresPassword: postgresPassword
   image:
     registry: airgapped.phonexia.com
   metrics:
     enabled: true
     image:
       registry: airgapped.phonexia.com
     serviceMonitor:
       enabled: true
   primary:
     persistence:
       storageClass: manual
       selector:
         matchLabels:
           app.kubernetes.io/name: postgresql

 envoy:
   enabled: true

Save the file.
The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
Check that the configuration is valid and successfully applied.

Upgrade to 3.0.0

This section describes the manual steps which need to be done prior to upgrading to 3.0.0.

Rename Whisper microservice

Due to licensing reasons we had to rename the speech-to-text-whisper-enhanced microservice. The new name is enhanced-speech-to-text-built-on-whisper. This change must be reflected in the values file.

Step by step upgrade guide to 3.0.0

This section describes how to upgrade the virtual appliance from 2.1.0 to 3.0.0 while retaining the data disk content.

Rename the Whisper microservice in the currently running version of the virtual appliance.
Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate .spec.valuesContent.speech-to-text-whisper-enhanced.
Replace all occurrences of speech-to-text-whisper-enhanced with enhanced-speech-to-text-built-on-whisper.
Replace all occurrences of speech_to_text_whisper_enhanced with enhanced_speech_to_text_built_on_whisper.
The updated file should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
    <Not significant lines omitted>
      image:
        repository: phonexia/dev/technologies/microservices/enhanced-speech-to-text-built-on-whisper/main
      <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        model:
          volume:
            hostPath:
              path: /data/models/enhanced_speech_to_text_built_on_whisper

Save the file
Rename the directory with Whisper models with the following command:

mv /data/models/speech_to_text_whisper_enhanced /data/models/enhanced_speech_to_text_built_on_whisper

Import the new version of the virtual appliance (version X+1) into your virtualization platform.
Stop the current version of the virtual appliance (version X).
Detach the data disk from the current version of the virtual appliance (version X).
Attach the data disk to the new version of the virtual appliance (version X+1).
Start the new version of the virtual appliance (version X+1).
Delete the old version of the virtual appliance (version X).
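
The rename steps above (both replacements in the values file and the model directory move) can also be scripted. The following is a minimal sketch, not an official tool: the function name rename_whisper_microservice and its two arguments are hypothetical and used only for illustration. It keeps a .bak copy of the values file and moves the model directory only when the old directory exists and the new one does not.

```shell
#!/bin/bash
# Sketch only: the function name and arguments are illustrative assumptions,
# they are not part of the virtual appliance.
rename_whisper_microservice() {
  local values_file=$1   # e.g. /data/speech-platform/speech-platform-values.yaml
  local models_dir=$2    # e.g. /data/models

  # Keep a backup copy next to the original before changing anything
  cp -p "${values_file}" "${values_file}.bak" || return 1

  # Replace both the hyphenated and the underscored form of the old name
  sed -e 's/speech-to-text-whisper-enhanced/enhanced-speech-to-text-built-on-whisper/g' \
      -e 's/speech_to_text_whisper_enhanced/enhanced_speech_to_text_built_on_whisper/g' \
      "${values_file}.bak" > "${values_file}" || return 1

  # Move the model directory only if the old one exists and the new one does not
  if [ -d "${models_dir}/speech_to_text_whisper_enhanced" ] && \
     [ ! -e "${models_dir}/enhanced_speech_to_text_built_on_whisper" ]; then
    mv "${models_dir}/speech_to_text_whisper_enhanced" \
       "${models_dir}/enhanced_speech_to_text_built_on_whisper"
  fi
}
```

A possible invocation inside the virtual appliance would be rename_whisper_microservice /data/speech-platform/speech-platform-values.yaml /data/models. Compare the result with the .bak copy before proceeding with the upgrade.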

Upgrade to 2.1.0

This section describes the manual steps which need to be done prior to upgrading to 2.1.0.

Load Speech to Text Phonexia and Time Analysis model from data disk instead of image

In the new version, the default way of loading models for Speech to Text Phonexia and Time Analysis changes. Previously, models were loaded from the image, which led to a lot of duplication in images. From now on, loading models from the data disk is the default. However, the old way of loading models from the image still works.

To load models from the data disk (/data/models), update the speech platform values file:

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate the .spec.valuesContent.speech-to-text-phonexia.config.instances or .spec.valuesContent.time-analysis.config.instances key.
Set imageTag to an image version without a model (e.g. 3.61.0).
The updated file should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          - name: ar-kw
            imageTag: 3.61.0
            onDemand:
              enabled: true
          - name: ar-kx
            imageTag: 3.61.0
            onDemand:
              enabled: true
    time-analysis:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          - name: tae
            imageTag: 3.61.0
            onDemand:
              enabled: true

Locate the .spec.valuesContent.speech-to-text-phonexia.image or .spec.valuesContent.time-analysis.image key and uncomment the image section.
The updated file should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      image:
        registry: airgapped.phonexia.com
      <Not significant lines omitted>

    time-analysis:
    <Not significant lines omitted>
      image:
        registry: airgapped.phonexia.com
      <Not significant lines omitted>

Save the file

Add ingressAdmin section

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate the key .spec.valuesContent.ingress.extraBackends.
Remove the extraBackends scope with all of its contents.
Add a new ingressAdmin scope at the same indentation level as the ingress scope. The resulting file should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    ingress:
      <Not significant lines omitted>

    ingressAdmin:
      enabled: true
      annotations: {}
      singleFileUploadSize: "5368709120"
      singleFileUploadTimeout: 120
    <Not significant lines omitted>

Save the file
Proceed with upgrade

Fix permission for prometheus storage

This is a post-upgrade task. It must be run after the virtual appliance is upgraded to 2.1.0.

Run the following command in the virtual appliance to fix the permissions of the Prometheus storage:

$ chmod -R a+w /data/storage/prometheus/prometheus-db/

Upgrade to 2.0.0

This section describes the manual steps which need to be done prior to upgrading to 2.0.0.

Rename speech-engine subchart to speech-to-text-phonexia

Because the speech-engine subchart was renamed, you have to update the speech platform values file before upgrading:

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate the key .spec.valuesContent.speech-engine.
Rename speech-engine to speech-to-text-phonexia.

The updated file should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>

Save the file

Rename speech-to-text-phonexia instances

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.
Remove the stt- prefix from the name of each instance.

The updated file should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    speech-to-text-phonexia:
    <Not significant lines omitted>
      config:
      <Not significant lines omitted>
        instances:
          - name: ar-kw
            imageTag: 3.60.1-stt-ar_kw_6
            onDemand:
              enabled: true
          - name: ar-kx
            imageTag: 3.60.1-stt-ar_xl_6
            onDemand:
              enabled: true

Save the file
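
The prefix removal above can be previewed from the command line before editing the file by hand. This is a minimal sketch under the assumption that the old instance names appear on list-item lines of the form "- name: stt-<language>"; the function name strip_stt_prefix is hypothetical. The pattern is anchored to name lines on purpose, so the -stt- part of the imageTag values (e.g. 3.60.1-stt-ar_kw_6) stays untouched. The function only prints the result and does not modify the file.

```shell
#!/bin/bash
# Sketch only: prints the values file with the "stt-" prefix removed from
# list items such as "- name: stt-ar-kw" (optionally quoted).
# The function name is an illustrative assumption.
strip_stt_prefix() {
  local values_file=$1
  sed -E 's/^([[:space:]]*-[[:space:]]+name:[[:space:]]*"?)stt-/\1/' "${values_file}"
}
```

For example, strip_stt_prefix /data/speech-platform/speech-platform-values.yaml | diff /data/speech-platform/speech-platform-values.yaml - shows exactly which lines would change. If the values file contains other lists whose items are named with an stt- prefix, they would be matched as well, so review the diff before applying it.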

Add proper tag suffix for Media Conversion

Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via a file browser.
Locate the key .spec.valuesContent.media-conversion.image.
Change the value of the tagSuffix key to -free.

The updated file should look like this:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
  <Not significant lines omitted>
    media-conversion:
      <Not significant lines omitted>
      image:
        <Not significant lines omitted>
        tagSuffix: "-free"
    <Not significant lines omitted>

Save the file
Proceed with upgrade

Update path to models

The default model location was changed from /data/models to /data/models/<microservice>. If you plan to upgrade and keep the current data disk, no steps are needed: models are loaded from the old location, /data/models. If you plan to upgrade from scratch (discarding the current data disk), no steps are needed either: models are loaded from the new location, /data/models/<microservice>.

How to modify OVF to Hyper-V compatible VM

Both existing virtual HDDs (.vmdk) need to be converted to Hyper-V compatible HDDs (.vhdx). Use StarWind V2V Converter for the conversion.
Create a new VM in Hyper-V.
IMPORTANT: Use a Generation 1 VM. Generation 2 doesn't work.
Make sure networking is enabled.
OPTIONAL: Disable options such as the DVD drive or SCSI controller, since they are not needed.
Set memory to at least 16GB and CPUs to at least 8 cores.
Attach the HDDs, preferably to one IDE controller.
Start the VM.
After it starts, check the IP address printed on the login screen. Wait for the entire engine to start.
Open the IP address from the previous step and verify that the entire VM works as it should.

Load balancing

The performance of a single instance of the virtual appliance is limited by the HW resources and by the number of concurrent tasks the API component can handle. To work around these limitations, we advise you to deploy multiple instances of the virtual appliance and put a load balancer in front of them.

How the load balancer works

The load balancer (LB) must ensure that the requests for the same task are routed to the same instance of the virtual appliance. This is called a stateful session. It can be achieved with a session cookie or with a session header.

The request flow is then as follows:

The client POSTs a task to the LB.
The LB picks a virtual appliance instance (depending on an LB algorithm) and sends the request there.
The API in the virtual appliance accepts the task and sends a response back to the LB.
The LB adds a session cookie or a session header to the response and sends it back to the client.
The client extracts the task id and the session cookie or session header from the response.
The client polls for the task. It sends a GET request with the session cookie or session header to the LB.
The LB routes the request to the proper instance of the virtual appliance based on the session cookie or session header.

In the following example, we have used Envoy as the load balancer. Any other load balancer can be used if it supports stateful sessions.

Envoy configuration

This is the example Envoy configuration:

static_resources:
  listeners:
    - address:
        socket_address:
          # Load balancer address and port
          # This is where Envoy accepts the incoming traffic
          address: 0.0.0.0
          port_value: 8080
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                      log_format:
                        text_format_source:
                          inline_string: >
                            [%START_TIME%] "%REQ(:METHOD)%
                            %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%"
                            %RESPONSE_CODE% %RESPONSE_FLAGS%
                            %RESPONSE_CODE_DETAILS%
                            %UPSTREAM_REQUEST_ATTEMPT_COUNT% %BYTES_RECEIVED%
                            %BYTES_SENT% %DURATION%
                            %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%
                            "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%"
                            "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%"
                            "%UPSTREAM_HOST%" "%REQ(REQUEST-ID)%"
                            "%REQ(CORRELATION-ID)%" "%REQ(session-header)%"
                            "%RESP(session-header)%"
                codec_type: AUTO
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: backend
                      domains:
                        - "*"
                      routes:
                        - match:
                            prefix: "/api/"
                          route:
                            cluster: speech-platform-virtual-appliance
                            retry_policy:
                              retry_on: "retriable-status-codes"
                              # Retry request on a different upstream when the 429 response is received
                              # This should happen when POSTing a request/task but max concurrent tasks limit is reached
                              # This ensures that task is accepted in the other (== less busy) instance of the virtual appliance
                              retriable_status_codes:
                                - 429
                              # How many times is the request retried
                              # Should be # of virtual appliance instances minus 1
                              num_retries: 1

                http_filters:
                  - name: envoy.filters.http.stateful_session
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.stateful_session.v3.StatefulSession
                      strict: true
                      session_state:
                        name: envoy.http.stateful_session.header
                        typed_config:
                          "@type": type.googleapis.com/envoy.extensions.http.stateful_session.header.v3.HeaderBasedSessionState
                          # Name of the session header
                          # Contains base64 encoded upstream_address:port
                          # This tells Envoy to which upstream server it should send the request
                          name: session-header
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: speech-platform-virtual-appliance
      connect_timeout: 0.5s
      type: STATIC
      dns_lookup_family: V4_ONLY
      lb_policy: RANDOM
      load_assignment:
        cluster_name: speech-platform-virtual-appliance
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      # IP address of the first instance of the virtual appliance
                      address: 1.2.3.4
                      # Port of the first instance of the virtual appliance
                      port_value: 80
              - endpoint:
                  address:
                    socket_address:
                      # IP address of the second instance of the virtual appliance
                      address: 1.2.3.5
                      # Port of the second instance of the virtual appliance
                      port_value: 80
      health_checks:
        - timeout: 2s
          interval: 60s
          interval_jitter: 1s
          unhealthy_threshold: 3
          healthy_threshold: 3
          http_health_check:
            # Healthcheck uri of the speech api inside virtual appliance
            path: /api/system/status

# Admin interface for looking at things
admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9090
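
As the comment in the configuration notes, the session header contains the base64-encoded upstream address and port. When debugging routing, it can help to decode the header value and see which instance of the virtual appliance a task was assigned to. The following is a minimal sketch; the function name decode_session_header is hypothetical, and it assumes a base64 tool that accepts the --decode option is installed.

```shell
#!/bin/bash
# Sketch only: decodes a session header value into upstream_address:port.
# The function name is an illustrative assumption.
decode_session_header() {
  local value=$1
  # Strip spaces and carriage returns left over from a raw HTTP header dump
  value=$(printf '%s' "${value}" | tr -d '[:space:]')
  printf '%s' "${value}" | base64 --decode
}
```

With the example cluster above, a header value of MS4yLjMuNDo4MA== decodes to 1.2.3.4:80, i.e. the first instance of the virtual appliance.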

Access API with LB

Here is an example script to show how to work with a header-based stateful session using curl:

#!/bin/bash

# URL of the virtual appliance or load balancer
platform_url=http://localhost:8080

# URI to POST the task to
uri="/api/technology/speech-to-text?language=en"

# Path to audio file for processing
voice_file=/tmp/audio.wav

# Process this many tasks in parallel
parallel=100

# End when this many tasks are processed
total_tasks=400

# Post tasks to the API so that we still have $parallel tasks running
post_tasks() {
  local count=$1
  local tmpfile_task=/tmp/task.${task_counter}.json
  local tmpfile_headers=/tmp/headers.${task_counter}.txt

  for i in $(seq 1 $count); do
    # POST single task
    echo "[${task_counter}] Task $i of ${count}"
    curl \
      -L -s -X POST  \
      -H 'Content-Type: multipart/form-data' \
      -H 'Accept: application/json' \
      -F file=@"${voice_file}" \
      --output ${tmpfile_task} \
      --dump-header ${tmpfile_headers} \
      "${platform_url}${uri}"

    rv=$?

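    # Parse session header (header names are case-insensitive; strip the
    # leading space and the trailing carriage return from the value)
    session_header=$(grep -i '^session-header:' ${tmpfile_headers} | tail -n 1 | cut -d ':' -f 2 | tr -d '[:space:]')
    echo "[${task_counter}] Curl exit code is: ${rv}"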
    echo "[${task_counter}] Session header is: ${session_header}"
    echo "[${task_counter}] $(cat ${tmpfile_task})"


    task_id=$(jq -r '.task.task_id' ${tmpfile_task})
    # Store task id
    current_tasks+=($task_id)
    # Store session header for each task id
    taskToHeader["${task_id}"]="${session_header}"
    task_counter=$((${task_counter} +1))
  done
}

# Poll for all running tasks
poll_tasks() {
  local counter_done=0
  local counter_rejected=0
  local counter_running=0
  local counter_pending=0
  local counter_unknown=0

  # Poll status of each task
  for task_id in ${current_tasks[@]}; do
    local tmpfile_task=/tmp/task-id-${task_id}

    # Add session header
    curl -s --header "session-header:${taskToHeader[${task_id}]}" -L -o ${tmpfile_task} "${platform_url}/api/task/${task_id}"
    rv=$?

    echo "[${task_id}] Curl exit code is: ${rv}"
    task_status=$(jq -r '.state' ${tmpfile_task})
    echo "[${task_id}] Task is still ${task_status}..."

    # Evaluate task status
    case $task_status in
      pending)
        counter_pending=$((${counter_pending} +1))
      ;;

      running)
        counter_running=$((${counter_running} +1))
      ;;

      rejected)
        counter_rejected=$((${counter_rejected} +1))
      ;;

      done)
        counter_done=$((${counter_done} +1))
        counter_total_done=$((${counter_total_done} +1))
        finished_tasks+=(${task_id})
      ;;

      *)
        counter_unknown=$((${counter_unknown} +1))
      ;;
    esac
  done
  echo "Summary: Done: ${counter_done}, Rejected: ${counter_rejected}, Running: ${counter_running}, Pending: ${counter_pending}, Unknown: ${counter_unknown}"
}

rm -f /tmp/task-id-*

if [ ! -f ${voice_file} ]; then
  echo "Voicefile does not exist!"
  exit 1
fi

current_tasks=()
declare -A taskToHeader
task_counter=1
start_time=$(date '+%s')
counter_total_done=0

# Control loop
while true; do
  finished_tasks=()
  poll_tasks

  # Remove finished tasks
  for del in ${finished_tasks[@]}
  do
    current_tasks=(${current_tasks[@]/$del})
  done

  echo "Task counter: ${task_counter}, Finished tasks: ${counter_total_done}"
  if [ ${counter_total_done} -ge ${total_tasks} ]; then
    echo "Reached ${total_tasks} finished tasks."
    echo "Start time: ${start_time}"
    end_time=$(date '+%s')
    echo "End time: ${end_time}"
    echo "Duration: $(( ${end_time} - ${start_time} ))"
    echo "Voicefile: ${voice_file}"
    echo "task parallelism: ${parallel}"
    break
  fi

  # POST tasks to have $parallel tasks running all the time
  if [ ${#current_tasks[@]} -le $parallel ]; then
    post_tasks $(($parallel - ${#current_tasks[@]}))
  fi

  sleep 2
done
