Speech Platform Virtual Appliance
The Speech Platform Virtual Appliance is a distribution of the Phonexia Speech Platform in the form of a virtual image. Presently, it exclusively supports the OVF format.
Installation
This section describes how to install the virtual appliance on your virtualization platform.
Prerequisites
Currently, only VirtualBox and VMware are supported.
The appliance will probably work on other virtualization platforms, but they have not been tested yet.
Minimal HW requirements
- 50GB of disk space
- 4 CPU cores
- 16GB of memory
The minimal requirements are enough to process a single technology (speaker identification, speech-to-text by Whisper, or speech-to-text by Phonexia) for evaluation purposes. We recommend disabling all technologies that are not being evaluated to save resources.
Resource usage per technology
- Speaker identification - 1 CPU core and 2GB memory
- Speech-to-text by Phonexia - 1 CPU core and 4GB memory per language
- Speech-to-text by Whisper - 8 CPU cores and 8GB memory, or 1 CPU core, 8GB memory and a GPU card
Note: Running speech-to-text by Whisper on CPU is slow. We recommend using at least 8 CPU cores to run our built-in examples in a reasonable time.
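The per-technology figures above can be added up to get a rough sizing estimate. The following is a minimal sketch only: the function name estimate_resources is hypothetical, it assumes that the figures add up linearly and that Whisper runs on CPU, and it does not account for the operating system or Kubernetes overhead.

```shell
#!/bin/sh
# Hypothetical helper: sums the per-technology figures listed above.
# Usage: estimate_resources <speaker_id: 0|1> <phonexia_languages> <whisper_on_cpu: 0|1>
# Assumption: figures add up linearly; OS and Kubernetes overhead is NOT included.
estimate_resources() {
    sid=$1
    langs=$2
    whisper=$3
    # Speaker identification: 1 core, 2GB; Phonexia STT: 1 core, 4GB per language;
    # Whisper on CPU: 8 cores, 8GB.
    cpu=$(( sid * 1 + langs * 1 + whisper * 8 ))
    mem=$(( sid * 2 + langs * 4 + whisper * 8 ))
    echo "${cpu} CPU cores, ${mem}GB memory"
}
```

For example, `estimate_resources 1 2 0` estimates speaker identification together with two Phonexia speech-to-text languages.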
GPU
A GPU is not required for the virtual appliance to work, but without one the Whisper speech-to-text functionality will be significantly slower.
If you decide to use a GPU, make sure that:
- The server HW (especially the BIOS) supports IOMMU.
- The host OS can pass the GPU device to the virtualization platform (that is, the host OS can be configured NOT to use the GPU device).
- The virtualization platform can pass the GPU device to the guest OS.
Installation guide
- Download the virtual appliance.
- Import the virtual appliance into your virtualization platform (for Hyper-V deployment, please refer to the section 'How to modify OVF to Hyper-V compatible VM').
- Run the virtual appliance.
Post-installation steps
The virtual appliance is configured to obtain its IP address from a DHCP server. If you are not using a DHCP server for IP allocation, or you prefer a static IP address, you have to reconfigure the OS.
SSH server
An SSH server is deployed and enabled in the virtual appliance. Use the following credentials:
login: root
password: InVoiceWeTrust
We recommend changing the root password and disabling password authentication via SSH for the root user in favor of key-based authentication.
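One possible way to follow this recommendation is sketched below. It is an illustration, not the appliance's official procedure: the helper name harden_root_ssh is hypothetical, and it assumes that root login is controlled by the PermitRootLogin directive in the file you pass in. Drop-in files under /etc/ssh/sshd_config.d/, if present, may override that setting and should be checked as well.

```shell
#!/bin/sh
# Hypothetical helper: allow root to log in with keys only.
# Usage: harden_root_ssh /etc/ssh/sshd_config
harden_root_ssh() {
    conf=$1
    if grep -Eq '^[#[:space:]]*PermitRootLogin[[:space:]]' "$conf"; then
        # Replace an existing (possibly commented-out) directive.
        sed -i -E 's/^[#[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin prohibit-password/' "$conf"
    else
        # No directive present yet: append one.
        echo 'PermitRootLogin prohibit-password' >> "$conf"
    fi
}
```

A typical sequence would be to change the password with `passwd root`, install your public key from your computer with `ssh-copy-id root@<IP_address_of_virtual_appliance>`, verify that key-based login works, and only then run `harden_root_ssh /etc/ssh/sshd_config && sshd -t && systemctl reload sshd` inside the appliance.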
Open ports
List of open ports:
- SSH (22) - for convenient access to OS
- HTTP (80) - Speech platform is accessible via HTTP protocol
- HTTPS (443) - Speech platform is also accessible via HTTPS protocol
- HTTPS (6443) - Kubernetes API
- HTTPS (10250) - Metrics server
K3s check
K3s (a Kubernetes distribution) is started automatically by systemd when the virtual appliance starts. You can verify whether k3s is running with this command:
systemctl status k3s
Kubernetes check
After the k3s service starts, it takes some time (usually around 2 minutes) until the application (that is, the Kubernetes pods) is started. To check whether the application is up and running, execute the following command:
kubectl -n speech-platform get pods
When all pods are running, output looks like:
[root@speech-platform ~]# kubectl -n speech-platform get pods
NAME READY STATUS RESTARTS AGE
speech-platform-docs-57dcd49f9f-q97w4 1/1 Running 0 2m10s
speech-platform-envoy-759c9b49d9-99vp7 1/1 Running 0 2m10s
speech-platform-frontend-7f4566dbc6-jhprh 1/1 Running 0 2m10s
speech-platform-assets-5697b4c86-8sh9k 1/1 Running 0 2m9s
speech-platform-media-conversion-7d8f884f9-zh75g 1/1 Running 0 2m9s
speech-platform-api-69bc7d4d5b-6kv7x 1/1 Running 0 2m9s
speech-platform-speech-to-text-whisper-enhanced-74548494c866mrz 0/1 CrashLoopBackOff 4 (29s ago) 2m10s
speech-platform-voiceprint-extraction-68d646d449-9br8m 0/1 CrashLoopBackOff 4 (33s ago) 2m10s
speech-platform-voiceprint-comparison-76948b4947-xjw92 0/1 CrashLoopBackOff 4 (15s ago) 2m10s
The voiceprint-extraction, voiceprint-comparison and speech-to-text-whisper-enhanced microservices (pods) fail initially. This is expected and is caused by the missing license. You can either add a license to these microservices or disable them if you don't plan to use them.
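To spot failing pods quickly, the listing can be filtered. The sketch below is an assumption-laden convenience, not part of the appliance: the function name not_ready_pods is hypothetical, and it expects the column layout of the namespace-scoped listing shown above (NAME, READY, STATUS, ...), so it does not apply to output produced with -A.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl -n <namespace> get pods` output on stdin
# and prints the name and status of every pod that is neither Running nor Completed.
not_ready_pods() {
    # NR > 1 skips the header line; $1 is NAME and $3 is STATUS.
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

Usage: `kubectl -n speech-platform get pods | not_ready_pods`. An empty result means that every pod is running or completed.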
Optionally you can check if all other system and auxiliary applications are running:
kubectl get pods -A
All pods should be running or completed, like this:
[root@speech-platform ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-8d98546c4-9pq8p 1/1 Running 0 6m44s
kube-system coredns-94bcd45cb-rp6zx 1/1 Running 0 6m44s
kube-system metrics-server-754ff994c9-pczpx 1/1 Running 0 6m44s
kube-system svclb-ingress-nginx-controller-baed713a-nzwcc 2/2 Running 0 5m24s
kube-system helm-install-ingress-nginx-wpwk4 0/1 Completed 0 6m45s
kube-system helm-install-filebrowser-fd569 0/1 Completed 0 6m45s
kube-system helm-install-nginx-28rll 0/1 Completed 0 6m45s
kube-system helm-install-speech-platform-7k6qf 0/1 Completed 0 6m45s
ingress-nginx ingress-nginx-controller-679f97c77d-rdssr 1/1 Running 0 5m24s
nginx nginx-6ddd78f789-f9lq2 1/1 Running 0 5m39s
filebrowser filebrowser-7476f7c65c-rk9d5 1/1 Running 0 5m39s
gpu nfd-58s4x 2/2 Running 0 5m44s
speech-platform speech-platform-docs-57dcd49f9f-q97w4 1/1 Running 0 5m38s
speech-platform speech-platform-envoy-759c9b49d9-99vp7 1/1 Running 0 5m38s
speech-platform speech-platform-frontend-7f4566dbc6-jhprh 1/1 Running 0 5m38s
speech-platform speech-platform-assets-5697b4c86-8sh9k 1/1 Running 0 5m37s
speech-platform speech-platform-media-conversion-7d8f884f9-zh75g 1/1 Running 0 5m37s
speech-platform speech-platform-api-69bc7d4d5b-6kv7x 1/1 Running 0 5m37s
speech-platform speech-platform-voiceprint-extraction-68d646d449-9br8m 0/1 CrashLoopBackOff 5 (2m33s ago) 5m38s
speech-platform speech-platform-speech-to-text-whisper-enhanced-74548494c866mrz 0/1 CrashLoopBackOff 5 (2m32s ago) 5m38s
speech-platform speech-platform-voiceprint-comparison-76948b4947-xjw92 0/1 CrashLoopBackOff 5 (2m20s ago) 5m38s
Application check
From your local computer, open the virtual appliance welcome page using the IP address or hostname of the virtual appliance. If you are able to access the welcome page, the applications should work.
Components
The virtual appliance is composed of the following components.
Operating system
There is Rocky Linux 9.3 under the hood.
GPU support
The virtual appliance has all necessary prerequisites pre-baked to run GPU-powered workloads (especially speech-to-text-whisper-enhanced). This means that the NVIDIA drivers and the container toolkit are already installed.
Kubernetes
The k3s Kubernetes distribution is deployed inside the appliance.
Ingress controller
We use the ingress-nginx ingress controller. This component serves as a reverse proxy and load balancer.
Speech platform
This is the application for solving various voice-related problems like speaker identification, speech-to-text transcription and many more. Speech platform is accessible via web browser or API.
File Browser
File Browser is a web-based file browser/editor used to work with data on the data disk.
Prometheus
Prometheus is a tool for providing monitoring information about kubernetes components.
Grafana
Grafana is a tool for visualization of prometheus metrics.
Disks
Virtual appliance comes with system disk and data disk.
System disk
The operating system is installed on the system disk. You should not modify the system disk unless you know what you are doing.
List of components stored on the system disk:
- NVIDIA drivers
- Container images for microservices
- Packaged helm charts
Data disk
The data disk is used as persistent storage. Unlike the system disk, the data disk is intended to contain files which can be viewed/modified by the user. The data disk is created with the PHXDATADISK label, and the system is instructed to mount the filesystem with this label to the /data directory.
List of components stored on data disk:
- Logs (/data/logs) of the system, k3s and individual containers
- Configuration for ingress controller (/data/ingress-nginx/ingress-nginx-values.yaml)
- Configuration for speech platform (/data/speech-platform/speech-platform-values.yaml)
- Models for individual microservices (/data/models/)
- Custom images (/data/images/)
Configuration
The following section describes various configuration use cases.
Insert license keys
The virtual appliance is distributed without a license. The speech platform does not work without a valid license. If you haven't received a license, please contact Phonexia support. The license must be inserted into each microservice.
Insert license into speech-to-text-whisper-enhanced microservice
- Get license for speech-to-text model. License looks like eyJ2...In0=.
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser. You can access the file browser via URL http://<IP_address_of_virtual_appliance>/filebrowser/
- Locate key .spec.valuesContent.speech-to-text-whisper-enhanced.config.license.value
- Change content of the value key from "<put your license for speech-to-text-by-whisper model here>" to license key.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-whisper-enhanced:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          license:
            value: "eyJ2...In0="
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Insert license into voiceprint-extraction microservice
- Get license for speaker-identification model. License looks like eyJ2...In0=.
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.voiceprint-extraction.config.license.value
- Change content of the value key from "<put your license for speaker-identification model here>" to license key.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      voiceprint-extraction:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          license:
            value: "eyJ2...In0="
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Insert license into voiceprint-comparison microservice
- Get license for speaker-identification model. License looks like eyJ2...In0=.
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.voiceprint-comparison.config.license.value
- Change content of the value key from "<put your license for speaker-identification model here>" to license key.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      voiceprint-comparison:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          license:
            value: "eyJ2...In0="
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Insert license into speech-to-text-phonexia microservice
- Get license for speech-to-text-phonexia. License looks like:
  SERVER license.phonexia.com/lic
  USE_TIME
  PRODUCT SPE_v3 ACB46...
  2uJ...M9A==
  PRODUCT STT-tech F23B6...
  jXu...K7A=
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.speech-to-text-phonexia.config.license.value
- Change content of the value key from "<put your license for speech-to-text-phonexia here>" to the license key.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-phonexia:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          license:
            value: |
              SERVER license.phonexia.com/lic
              USE_TIME
              PRODUCT SPE_v3 ACB46...
              2uJ...M9A==
              PRODUCT STT-tech F23B6...
              jXu...K7A=
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Insert license into time-analysis microservice
- Get license for time-analysis. License looks like:
  SERVER license.phonexia.com/lic
  USE_TIME
  PRODUCT SPE_v3 ACB46...
  2uJ...M9A==
  PRODUCT TAE-tech D3118...
  Ba+4...eg==
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.time-analysis.config.license.value
- Change content of the value key from "<put your license for time-analysis here>" to the license key.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      time-analysis:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          license:
            value: |
              SERVER license.phonexia.com/lic
              USE_TIME
              PRODUCT SPE_v3 ACB46...
              2uJ...M9A==
              PRODUCT TAE-tech D3118...
              Ba+4...eg==
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
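After inserting the licenses, it can be useful to confirm that no placeholder was left behind. The sketch below is only an illustration: the function name list_license_placeholders is hypothetical, and it assumes that every unfilled license value still starts with the text "<put your license", as in the placeholders quoted in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: prints (with line numbers) every line of the given
# values file that still contains a license placeholder.
# Usage: list_license_placeholders /data/speech-platform/speech-platform-values.yaml
list_license_placeholders() {
    # grep exits with a non-zero status when no placeholder is found.
    grep -n '<put your license' "$1"
}
```

Microservices reported by this check either still need a license or can be disabled if you do not plan to use them.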
Set DNS name for speech platform virtual appliance
Speech platform is accessible on http://<IP_address_of_virtual_appliance>. We recommend creating a DNS record to make access more comfortable for users. Consult your DNS provider for more information on how to add the corresponding DNS record.
Use HTTPS certificate
Speech platform is also accessible via HTTPS protocol on https://<IP_address_of_virtual_appliance>. If you prefer secure communication you might need to use your own TLS certificate for securing the communication.
To do so, follow this guide:
- Prepare the TLS certificate beforehand.
- Put certificate private key in file named cert.key.
- Put certificate into file named cert.crt.
- Create kubernetes secret manifest storing the certificate and private key:
  kubectl create -n ingress-nginx secret tls default-ssl-certificate --key cert.key --cert cert.crt -o yaml --dry-run=client > /tmp/certificate-secret.yaml
- Copy manifest (resulting file) to /data/ingress-nginx/certificate-server.yaml.
- Open text file /data/ingress-nginx/ingress-nginx-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.controller.extraArgs.default-ssl-certificate
- Uncomment the line.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: ingress-nginx
    namespace: kube-system
  spec:
    valuesContent: |-
      controller:
        <Not significant lines omitted>
        extraArgs:
          <Not significant lines omitted>
          default-ssl-certificate: "ingress-nginx/default-ssl-certificate"
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Extend disks
The following section describes how to extend both the system disk and the data disk.
First, run lsblk to see all disk devices in the system:
$ lsblk
This is example output:
[root@speech-platform ~]# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
nvme0n1     259:0    0   40G  0 disk
├─nvme0n1p1 259:4    0    1G  0 part /boot
├─nvme0n1p2 259:5    0    4G  0 part /
├─nvme0n1p3 259:6    0 33.3G  0 part /var/lib/kubelet/pods/c699b49a-d5cb-4e63-8555-0a11f4204bb6/volume-subpaths/capabilities/frontend/0
│                                    /var/lib/kubelet/pods/29a7020e-1b24-4720-a94d-b458e9297fbe/volume-subpaths/pvc-cf413864-ffa6-4e3b-bfb6
│                                    /var/lib/kubelet/pods/1093b1d0-4dfb-48fa-910b-cc04a553b155/volume-subpaths/sc-dashboard-provider/grafana/4
│                                    /var/lib/kubelet/pods/1093b1d0-4dfb-48fa-910b-cc04a553b155/volume-subpaths/config/grafana/2
│                                    /var/lib/kubelet/pods/1093b1d0-4dfb-48fa-910b-cc04a553b155/volume-subpaths/config/grafana/0
│                                    /var/lib/kubelet/pods/7e28800d-ad39-4d92-8d9c-cd24fce8f861/volume-subpaths/config/filebrowser/0
│                                    /var
├─nvme0n1p4 259:7    0    1K  0 part
└─nvme0n1p5 259:8    0 1023M  0 part [SWAP]
nvme1n1     259:1    0   10G  0 disk
└─nvme1n1p1 259:3    0   10G  0 part /var/log
                                     /var/lib/rancher/k3s/server/manifests/speech-platform/values
                                     /var/lib/rancher/k3s/server/manifests/ingress-nginx/values
                                     /data
System disk is the one with /, /boot and /var mountpoints. In this example it is nvme0n1. Data disk is the one with /data mountpoint. In this example it is nvme1n1.
Extend system disk
System disk is nvme0n1 and the partition with the /var mountpoint (which is nvme0n1p3) needs to be extended. First, extend the system disk in the virtualization platform. Then verify that you can see the extended disk inside the virtual appliance.
lsblk --nodeps
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme0n1 259:0 0 50G 0 disk
nvme1n1 259:1 0 20G 0 disk
System disk was extended from 40GB to 50GB in the virtualization platform. Now you can extend the system disk in the virtual appliance itself.
Recreate partition 3 with all disk space:
echo ", +" | sfdisk --force -N 3 /dev/nvme0n1
Run partprobe to force the system to use the new partition table:
partprobe
Run lsblk to check if system sees the resized partition:
lsblk /dev/nvme0n1
Extend xfs filesystem:
xfs_growfs /var
Check that disk was resized:
df -h /var
Extend data disk
Data disk is nvme1n1 and the partition with the /data mountpoint (which is /dev/nvme1n1p1) needs to be extended.
First, extend the data disk in the virtualization platform. Then verify that you can see the extended disk inside the virtual appliance.
lsblk --nodeps
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme0n1 259:0 0 50G 0 disk
nvme1n1 259:1 0 20G 0 disk
Data disk was extended from 10GB to 20GB in the virtualization platform. Now you can extend the data disk in the virtual appliance itself.
Recreate partition 1 with all disk space:
echo ", +" | sfdisk --force -N 1 /dev/nvme1n1
Run partprobe to force the system to use the new partition table:
partprobe
Run lsblk to check if system sees the resized partition:
lsblk /dev/nvme1n1
Extend xfs filesystem:
xfs_growfs /data
Check that disk was resized:
df -h /data
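The system disk and data disk procedures differ only in the device, the partition number and the mount point. The sketch below makes that explicit by printing the command sequence for review instead of running it. The function name print_grow_commands is hypothetical, and the sketch assumes, as in the steps above, that the partition to grow is the last one on the disk and carries an xfs filesystem.

```shell
#!/bin/sh
# Hypothetical helper: prints (does NOT execute) the resize commands from the
# steps above for a given disk, partition number and mount point.
# Usage: print_grow_commands /dev/nvme1n1 1 /data
print_grow_commands() {
    disk=$1
    part=$2
    mnt=$3
    echo "echo \", +\" | sfdisk --force -N ${part} ${disk}"  # grow the partition
    echo "partprobe"                                          # re-read the partition table
    echo "lsblk ${disk}"                                      # check the new partition size
    echo "xfs_growfs ${mnt}"                                  # grow the xfs filesystem
    echo "df -h ${mnt}"                                       # confirm the new size
}
```

For the system disk in the example above, the call would be `print_grow_commands /dev/nvme0n1 3 /var`. Review the printed commands against your own lsblk output before running any of them.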
Disable unneeded microservices
The virtual appliance comes with all microservices enabled by default. You can disable a microservice if you do not plan to use it. A disabled microservice does not consume any compute resources.
- Find out which microservices you want to disable - voiceprint-extraction, voiceprint-comparison, speech-to-text-whisper-enhanced or speech-to-text-phonexia.
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.<microservice>.enabled
- Change the value from true to false.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      <microservice>:
        <Not significant lines omitted>
        enabled: false
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Phonexia speech to text microservice
This section describes configuration specific to the Phonexia speech-to-text microservice.
Permanent vs onDemand instances
A permanent instance is started and running (and consuming resources) all the time. An onDemand instance is started only when a corresponding task is queued, and it is stopped once all tasks have been processed.
All instances are onDemand by default. Any instance can be reconfigured to be permanent. Use the following guide to reconfigure an instance from onDemand to permanent:
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- Corresponding section looks like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-phonexia:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          instances:
            .
            .
            .
            - name: stt-cs
              imageTag: 3.59.0-stt-cs_cz_6
              onDemand:
                enabled: true
            .
            .
            .
- Delete onDemand key and its subkeys.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-phonexia:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          instances:
            .
            .
            .
            - name: stt-cs
              imageTag: 3.59.0-stt-cs_cz_6
            .
            .
            .
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Configure languages in speech-to-text-phonexia microservice
This microservice consists of multiple instances. Each instance corresponds to a single language. All instances are listed in the configuration file.
Note: The Docker images for the individual languages are not included in the virtual appliance. This means that the virtual appliance needs internet access to download the Docker image when the speech-to-text-phonexia microservice is used! As a workaround, you can put a custom image into the virtual appliance.
By default, all languages/instances are enabled. List of languages:
- ar_kw_6
- ar_xl_6
- bn_6
- cs_cz_6
- de_de_6
- en_us_6
- es_6
- fa_6
- fr_fr_6
- hr_hr_6
- hu_hu_6
- it_it_6
- ka_ge_6
- kk_kz_6
- nl_6
- pl_pl_6
- ps_6
- ru_ru_6
- sk_sk_6
- sr_rs_6
- sv_se_6
- tr_tr_6
- uk_ua_6
- vi_vn_6
- zh_cn_6
How to disable all language instances except cs_cz_6 and en_us_6:
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- Corresponding section looks like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-phonexia:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          instances:
            - name: stt-ark
              imageTag: 3.59.0-stt-ar_kw_6
              onDemand:
                enabled: true
            - name: stt-arx
              imageTag: 3.59.0-stt-ar_xl_6
              onDemand:
                enabled: true
            - name: stt-bn
              imageTag: 3.59.0-stt-bn_6
              onDemand:
                enabled: true
            - name: stt-cs
              imageTag: 3.59.0-stt-cs_cz_6
              onDemand:
                enabled: true
            - name: stt-de
              imageTag: 3.59.0-stt-de_de_6
              onDemand:
                enabled: true
            - name: stt-en
              imageTag: 3.59.0-stt-en_us_6
              onDemand:
                enabled: true
            .
            .
            .
            - name: stt-vi
              imageTag: 3.59.0-stt-vi_vn_6
              onDemand:
                enabled: true
            - name: stt-zh
              imageTag: 3.59.0-stt-zh_cn_6
              onDemand:
                enabled: true
- Comment out all the instances except cs_cz_6 and en_us_6.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-phonexia:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          instances:
            #- name: stt-ark
            #  imageTag: 3.59.0-stt-ar_kw_6
            #  onDemand:
            #    enabled: true
            #- name: stt-arx
            #  imageTag: 3.59.0-stt-ar_xl_6
            #  onDemand:
            #    enabled: true
            #- name: stt-bn
            #  imageTag: 3.59.0-stt-bn_6
            #  onDemand:
            #    enabled: true
            - name: stt-cs
              imageTag: 3.59.0-stt-cs_cz_6
              onDemand:
                enabled: true
            #- name: stt-de
            #  imageTag: 3.59.0-stt-de_de_6
            #  onDemand:
            #    enabled: true
            - name: stt-en
              imageTag: 3.59.0-stt-en_us_6
              onDemand:
                enabled: true
            .
            .
            .
            #- name: stt-vi
            #  imageTag: 3.59.0-stt-vi_vn_6
            #  onDemand:
            #    enabled: true
            #- name: stt-zh
            #  imageTag: 3.59.0-stt-zh_cn_6
            #  onDemand:
            #    enabled: true
- Or you can even delete the instances you are not interested in.
- Then updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-phonexia:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          instances:
            - name: stt-cs
              imageTag: 3.59.0-stt-cs_cz_6
              onDemand:
                enabled: true
            - name: stt-en
              imageTag: 3.59.0-stt-en_us_6
              onDemand:
                enabled: true
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Modify replicas for permanent language instances
Each language instance has only one replica by default. This means that only one request/audio file can be processed at once. To process multiple requests/audio files in parallel, you have to increase the number of replicas for the corresponding language instance.
Note: We do not recommend increasing replicas for any microservice when the virtual appliance is running with default resources (4 CPU cores, 16GB memory)!
Note: An onDemand instance always has only one replica.
- Find out which language instance you want to configure replicas for.
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.<language instance>.replicaCount.
- Change the value to desired amount of replicas.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-phonexia:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          instances:
            - name: stt-cs
              imageTag: 3.59.0-stt-cs_cz_6
              replicaCount: 2
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Add custom images
This section describes how to add a custom image to the virtual appliance. A typical use case is adding speech-to-text images to Speech Engine for the languages you want to use. These images need to be added to the data disk to make Phonexia speech-to-text work offline. The example below adds two images: Phonexia speech-to-text for the English and Czech languages.
- [Virtual appliance] Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- [Virtual appliance] Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- [Virtual appliance] Choose which images you want to add. Use imageTag key to find out which image tag(s) to use:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-phonexia:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          instances:
            - name: stt-cs
              imageTag: 3.59.0-stt-cs_cz_6 <- This is the image tag
              onDemand:
                enabled: true
            - name: stt-en
              imageTag: 3.59.0-stt-en_us_6 <- This is the image tag
              onDemand:
                enabled: true
- [PC] Pull all images. The tags must match the imageTag values found in the previous step:
  docker pull phonexia/spe:3.59.0-stt-en_us_6
  docker pull phonexia/spe:3.59.0-stt-cs_cz_6
- [PC] Save all images to single tar archive:
  docker save -o images.tar phonexia/spe:3.59.0-stt-cs_cz_6 phonexia/spe:3.59.0-stt-en_us_6
- [PC] Copy images.tar file into virtual appliance via ssh or filebrowser to /data/images.
  scp images.tar root@<IP of virtual appliance>:/data/images
- [Virtual appliance] Restart virtual appliance to load the images or load them manually with:
  ctr image import /data/images/images.tar
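When several languages are enabled, the image references can be derived from the configuration file instead of being typed by hand. The sketch below is an illustration under stated assumptions: the function name list_image_refs is hypothetical, the repository name phonexia/spe is taken from the pull commands above, and the function expects each imageTag to sit on its own line with nothing after the tag.

```shell
#!/bin/sh
# Hypothetical helper: prints phonexia/spe:<tag> for every imageTag line in
# the given values file that is not commented out.
# Usage: list_image_refs speech-platform-values.yaml
list_image_refs() {
    # On a commented-out line the first field starts with "#", so it is skipped.
    awk '$1 == "imageTag:" { print "phonexia/spe:" $2 }' "$1"
}
```

On the PC, working with a copy of the configuration file, the output can feed the steps above, for example `list_image_refs speech-platform-values.yaml | xargs -n 1 docker pull` followed by `docker save -o images.tar $(list_image_refs speech-platform-values.yaml)`.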
Modify microservice replicas
Each microservice has only one replica by default. This means that only one request/audiofile can be processed at once. To process multiple requests/audiofiles in parallel, you have to increase replicas for corresponding microservices.
Note: We do not recommend increasing replicas for any microservice when virtual appliance is running with default resources (4CPU, 16GB memory)!
- Find out which microservices you want to modify replicas for - voiceprint-extraction, voiceprint-comparison and speech-to-text-whisper-enhanced.
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate key .spec.valuesContent.<microservice>.replicaCount
- Change the value to desired amount of replicas.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      <microservice>:
        <Not significant lines omitted>
        replicaCount: 2
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Run speech-to-text-whisper-enhanced microservice on GPU
First, make sure the virtual appliance can see the GPU device(s). Use nvidia-smi to list all the devices. If a device is present and visible to the system, the output should look like:
[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)
If the GPU is visible, then you can reconfigure the speech-to-text-whisper-enhanced to use GPU for the processing.
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
- Locate speech-to-text-whisper-enhanced section .spec.valuesContent.speech-to-text-whisper-enhanced.
- Locate key .spec.valuesContent.speech-to-text-whisper-enhanced.config.device.
- Uncomment the line so that it looks like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-whisper-enhanced:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          # Uncomment this to force whisper to run on GPU
          device: cuda
- Locate key .spec.valuesContent.speech-to-text-whisper-enhanced.resources.
- Request GPU resources for the processing so that it looks like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-whisper-enhanced:
        <Not significant lines omitted>
        # Uncomment this to grant access to GPU on whisper pod
        resources:
          limits:
            nvidia.com/gpu: "1"
- Locate key .spec.valuesContent.speech-to-text-whisper-enhanced.runtimeClassName.
- Set runtimeClassName so that it looks like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-whisper-enhanced:
        <Not significant lines omitted>
        # Uncomment this to run whisper on GPU
        runtimeClassName: "nvidia"
- Locate key .spec.valuesContent.speech-to-text-whisper-enhanced.updateStrategy.
- Set type to Recreate to allow seamless updates so that it looks like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-whisper-enhanced:
        <Not significant lines omitted>
        # Uncomment this to allow seamless updates on single GPU machine
        updateStrategy:
          type: Recreate
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-whisper-enhanced:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          device: cuda
        <Not significant lines omitted>
        resources:
          limits:
            nvidia.com/gpu: "1"
        <Not significant lines omitted>
        runtimeClassName: "nvidia"
        <Not significant lines omitted>
        updateStrategy:
          type: Recreate
- Save the file
- Application automatically recognizes that file was updated and redeploys itself with updated configuration.
Change model used in a microservice
Each microservice needs a model to do its job properly. For some microservices, for example speech-to-text-whisper-enhanced, we provide several models. Usually we pre-configure microservices with the most accurate (and slowest) model. Users typically switch to a different model to speed up processing at the cost of less accurate results.
The license you have received with the virtual appliance is valid only for the default model. If you change the model, you have to change the license as well.
Change model in speech-to-text-whisper-enhanced microservice
We offer the following models for the speech-to-text-whisper-enhanced microservice:
- large-v3 - next-gen most accurate multilingual model.
- large-v2 - most accurate multilingual model. This is the default model.
- medium - less accurate but faster than large-v2.
- small - less accurate but faster than medium.
- base - less accurate but faster than small.
- Ask Phonexia to provide the desired model and license. You will receive one or more links; each of them downloads a zip archive.
- Upload the archive to the virtual appliance:
  $ scp licensed-models.zip root@<virtual-appliance-ip>:/data/
- Unzip the archive. Models are extracted into one directory per microservice:
  $ unzip licensed-models.zip
- Content of /data/models should look like:
  $ find /data/models
  /data/models/
  /data/models/speech_to_text_whisper_enhanced
  /data/models/speech_to_text_whisper_enhanced/small-1.0.0.model
  /data/models/speech_to_text_whisper_enhanced/speech_to_text_whisper_enhanced-base-1.0.0-license.key.txt
  /data/models/speech_to_text_whisper_enhanced/base-1.0.0.model
  /data/models/speech_to_text_whisper_enhanced/speech_to_text_whisper_enhanced-small-1.0.0-license.key.txt
  /data/models/speaker_identification
  /data/models/speaker_identification/xl-5.0.0.model
  /data/models/speaker_identification/speaker_identification-xl-5.0.0-license.key.txt
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via the file browser.
- Locate key .spec.valuesContent.speech-to-text-whisper-enhanced.config.model.
- Change the content of the file key from "large_v2-1.0.0.model" to the file you've just uploaded ("small-1.0.0.model").
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-whisper-enhanced:
        <Not significant lines omitted>
        config:
          model:
            <Not significant lines omitted>
            file: "small-1.0.0.model"
- Change the license because you have changed the model. See above how to do it.
- Save the file.
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
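Because the model file and its license must match, it can help to confirm that both are present before editing the values file. The sketch below derives the two expected paths from the naming pattern visible in the `find /data/models` listing above; the pattern is inferred from that listing only, and the helper names `expected_files` and `missing_files` are hypothetical.

```python
from pathlib import PurePosixPath


def expected_files(microservice_dir: str, model: str, version: str) -> tuple[str, str]:
    """Build the model and license paths following the pattern in the listing."""
    base = PurePosixPath("/data/models") / microservice_dir
    model_file = base / f"{model}-{version}.model"
    license_file = base / f"{microservice_dir}-{model}-{version}-license.key.txt"
    return str(model_file), str(license_file)


def missing_files(listing: list[str], microservice_dir: str, model: str, version: str) -> list[str]:
    """Return the expected paths that do not appear in a `find /data/models` listing."""
    present = set(listing)
    return [
        path
        for path in expected_files(microservice_dir, model, version)
        if path not in present
    ]
```

Feeding it the lines printed by `find /data/models` together with, for example, `"speech_to_text_whisper_enhanced"`, `"small"` and `"1.0.0"` returns an empty list when both the model and its license were extracted.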
Process patented audio codecs with media-conversion
By default, media-conversion can work only with patent-free audio codecs.
We cannot include and distribute patented codecs with the virtual appliance. If you need to process audio files encoded with patented codecs, you have to use a different version of media-conversion. The media-conversion service image is located on Docker Hub.
Pull Media Conversion image on the fly
This is handy if you don't mind pulling images from the internet. Image is pulled only if it is not present yet.
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via the file browser.
- Locate key .spec.valuesContent.media-conversion.image.
- Change the content of repository, registry, tag and tagSuffix to:
  media-conversion:
    image:
      registry: docker.io
      repository: phonexia/media-conversion
      tag: 1.0.0
      tagSuffix: ""
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      media-conversion:
        <Not significant lines omitted>
        image:
          registry: docker.io
          repository: phonexia/media-conversion
          tag: 1.0.0
          tagSuffix: ""
- Save the file.
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
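To see which image the four keys above point at, the sketch below composes them into a single image reference. It rests on an assumption that is not confirmed by this guide: that the chart joins registry and repository with a slash and appends tagSuffix directly to tag. The function name `image_reference` is hypothetical.

```python
def image_reference(registry: str, repository: str, tag: str, tag_suffix: str = "") -> str:
    """Compose an image reference, ASSUMING the chart appends tagSuffix to tag."""
    return f"{registry}/{repository}:{tag}{tag_suffix}"


# With the values from the steps above (empty tagSuffix):
reference = image_reference("docker.io", "phonexia/media-conversion", "1.0.0", "")
```

Under that assumption, an empty tagSuffix resolves to the same image that `docker pull phonexia/media-conversion:1.0.0` fetches from Docker Hub.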
Put Media Conversion image into virtual appliance
This approach is needed if your deployment is completely offline and access to internet from virtual appliance is forbidden.
- [PC] Pull media-conversion image locally:
$ docker pull phonexia/media-conversion:1.0.0
- [PC] Save Media Conversion image to tar archive:
$ docker save --output images.tar phonexia/media-conversion:1.0.0
- [PC] Copy the images.tar file into the virtual appliance via ssh or the file browser to /data/images:
  scp images.tar root@<IP of virtual appliance>:/data/images
- [Virtual appliance] Restart the virtual appliance to load the images, or load them manually with:
  ctr image import /data/images/images.tar
- Reconfigure the speech-platform to use the locally downloaded image as mentioned above.
Limits
This section describes the virtual appliance limits and how to modify them.
API limits
The following limits apply to the API itself.
Name | Unit | Default | Description |
---|---|---|---|
taskExpirationTime | seconds | 300 | Time after which finished tasks expire. The API holds information about finished tasks (both successfully finished and failed). This information is discarded after taskExpirationTime . The client usually polls on the task id and must retrieve the task status before it expires. Maximum value is 3600 . |
taskGrpcTimeout | seconds | 120 | Maximum time the API waits for any task to complete. If you process big audio files, you probably need to increase this limit. |
inputStorageSize | variable | 1GiB | Size of the input storage. When an audio file is POSTed to the API, the whole file must be stored on the disk. If you process big files or multiple files in parallel, this limit probably must be increased. |
internalStorageSize | variable | 1GiB | Size of the internal storage. Each audio file is converted into wav format before processing. Converted audio is stored on the disk. If you process big files or multiple files in parallel, this limit probably must be increased. Also note that internalStorageSize must be greater than or equal to inputStorageSize . |
singleFileUploadTimeout | seconds | 120 | Maximum allowed time for uploading a single file to the API. If you process big files or have a poor network connection, this limit must be increased. |
singleFileUploadSize | bytes | 104857600 (== 100MiB) | Maximum allowed size of an audio file to upload. If you process big files, this limit must be increased. Note that this is the API/ingress limit, not the UI limit! |
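Two of the rows above carry hard constraints: taskExpirationTime has a maximum of 3600, and internalStorageSize must not be smaller than inputStorageSize. The following sketch checks both on a dict of api.config values before they are written to the values file. It assumes storage sizes use binary suffixes such as GiB, as in the defaults; other notations are deliberately rejected, and the function names are hypothetical.

```python
import re

# Binary suffixes as used by the storage defaults above (e.g. "1GiB").
_UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size such as "1GiB" or a plain byte count into bytes."""
    match = re.fullmatch(r"(\d+)\s*(KiB|MiB|GiB|TiB)?", text.strip())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    number, unit = match.groups()
    return int(number) * (_UNITS[unit] if unit else 1)


def check_api_limits(config: dict) -> list[str]:
    """Return violations of the constraints stated in the table (defaults applied)."""
    problems = []
    if int(config.get("taskExpirationTime", 300)) > 3600:
        problems.append("taskExpirationTime exceeds the maximum of 3600 seconds")
    input_size = parse_size(str(config.get("inputStorageSize", "1GiB")))
    internal_size = parse_size(str(config.get("internalStorageSize", "1GiB")))
    if internal_size < input_size:
        problems.append("internalStorageSize must be greater than or equal to inputStorageSize")
    return problems
```

Missing keys fall back to the defaults from the table, so raising only inputStorageSize is reported as a problem until internalStorageSize is raised as well.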
How to change the API limits
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via the file browser.
- Locate key .spec.valuesContent.api.config.
- Change the value of the corresponding limit to a new value:
  api:
    config:
      taskExpirationTime: 1200
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      api:
        <Not significant lines omitted>
        config:
          taskExpirationTime: 1200
- Save the file.
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
UI limits
The following limits apply to the UI itself.
Name | Unit | Default | Description |
---|---|---|---|
taskParallelism | | 4 | The UI posts a task to the API and polls for the task until it is finished. This controls how many tasks can be processed in parallel. |
taskPollingInterval | seconds | 1 | Duration between poll attempts. |
taskPollingTimeout | seconds | 3600 | How long the UI keeps polling for a task, i.e. how long it is willing to wait for the task to finish. |
Speaker Identification UI limits
Limits in the config section .spec.valuesContent.frontend.config.limits.speakerIdentification are applicable only to speaker identification.
Name | Unit | Default | Description |
---|---|---|---|
maxFileSize | bytes | 5242880 (== 5MiB) | Maximum allowed size of an audio file to upload. Note that this must be lower than or equal to the singleFileUploadSize API limit. |
maxFilesCount | | 100 | Maximum number of files to be uploaded. |
maxVoiceRecorderDuration | seconds | 300 | Maximum duration of the recording captured by the voice recorder. |
Speech to text UI limits
Limits in the config section .spec.valuesContent.frontend.config.limits.speechToText are applicable only to Speech to Text. They apply to both Whisper enhanced and Phonexia 6th Gen.
Name | Unit | Default | Description |
---|---|---|---|
maxFileSize | bytes | 5242880 (== 5MiB) | Maximum allowed size of an audio file to upload. Note that this must be lower than or equal to the singleFileUploadSize API limit. |
maxFilesCount | | 100 | Maximum number of files to be uploaded. |
maxVoiceRecorderDuration | seconds | 300 | Maximum duration of the recording captured by the voice recorder. |
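Both maxFileSize rows depend on the singleFileUploadSize API limit, so raising a UI limit without raising the API limit produces uploads that the UI accepts and the API rejects. The sketch below checks that relationship on a dict mirroring frontend.config.limits; the function name `check_upload_limits` is hypothetical, and the defaults are taken from the tables.

```python
def check_upload_limits(ui_limits: dict, single_file_upload_size: int = 104857600) -> list[str]:
    """Report UI maxFileSize values that exceed the API singleFileUploadSize limit."""
    problems = []
    for section in ("speakerIdentification", "speechToText"):
        # Fall back to the documented default of 5242880 bytes when unset.
        max_size = ui_limits.get(section, {}).get("maxFileSize", 5242880)
        if max_size > single_file_upload_size:
            problems.append(
                f"{section}.maxFileSize ({max_size}) exceeds "
                f"singleFileUploadSize ({single_file_upload_size})"
            )
    return problems
```

With the default limits the result is an empty list; a message appears only for a section whose maxFileSize was raised above the API limit.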
How to change the UI limits
- Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via the file browser.
- Locate key .spec.valuesContent.frontend.config.limits.
- Change the value of the corresponding limit to a new value:
  frontend:
    config:
      limits:
        taskParallelism: 2
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      frontend:
        <Not significant lines omitted>
        config:
          limits:
            taskParallelism: 2
- Save the file.
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Admin console
Admin console is a simple web page containing links to various admin-related tools. The console is located at http://<IP_of_virtual_appliance>/admin. It contains links to:
- filebrowser
- prometheus
- grafana
Grafana
Grafana is a tool for visualizing application and Kubernetes metrics. The most useful dashboards available in Grafana are:
- Envoy Clusters - See envoy cluster statistics
- Kubernetes / Compute Resources / Pod - See resource consumption of individual pods
- NGINX Ingress controller - See ingress controller stats
- NVIDIA DCGM Exporter Dashboard - See GPU device stats
- Node Exporter / Nodes - See stats about virtual appliance
- Speech Platform API capacity - See metrics about speech platform itself
Troubleshooting
This section describes the individual components of the speech platform and the request flow.
Speech platform components
List of the components:
- frontend - simple webserver serving static html, css, javascript and image files
- docs - simple webserver serving documentation
- assets - simple webserver hosting examples
- api - python component providing REST API interface
- envoy - router and loadbalancer for GRPC messages
- media-conversion - python component used for
  - converting audio files from various formats to simple wav format
  - splitting multi-channel audio into multiple single-channel files
- technology microservices
  - speech-to-text-whisper-enhanced - transcribes speech to text
  - speech-to-text-phonexia - transcribes speech to text
  - voiceprint-extraction - extracts voiceprint from audio file
  - voiceprint-comparison - compares multiple voiceprints
Request flow
- User sends a POST request (for example, to transcribe speech to text) to the API.
- API creates a task for processing and returns the task id to the user.
- From this point user can poll on the task to get the result.
- API calls media-conversion via envoy.
- Media conversion converts the audiofile to wav format and possibly splits it into multiple mono-channel files.
- API gets converted audiofile from media-conversion.
- API calls speech-to-text-whisper-enhanced via envoy.
- Speech-to-text-whisper-enhanced transcribes the audiofile.
- API gets the transcription.
- User can retrieve the task result.
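From the client's point of view, the flow above reduces to "submit, then poll on the task id until the task is finished". The sketch below shows only the polling half. It does not call the real API: `fetch_status` is a hypothetical callable that the caller supplies to query the task, and the status strings "running" and "done" are placeholders rather than values defined by the platform. The default interval and timeout mirror the UI's taskPollingInterval and taskPollingTimeout.

```python
import time
from typing import Callable


def poll_task(
    fetch_status: Callable[[], str],
    interval: float = 1.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task leaves the "running" state or the timeout elapses."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status != "running":
            return status
        if waited >= timeout:
            raise TimeoutError(f"task still running after {waited:.0f} seconds")
        # Wait before the next attempt and account for the time spent.
        sleep(interval)
        waited += interval
```

A client should keep the total polling time consistent with the API's taskExpirationTime, because results of finished tasks are discarded once that time passes.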
Upgrade guide
This section describes how to perform upgrade of virtual appliance.
- Import new version of virtual appliance into your virtualization platform
- Stop current version of virtual appliance
- Detach data disk from current version of virtual appliance
- Attach data disk to new version of virtual appliance
- Start new version of virtual appliance
- Delete old version of virtual appliance
Upgrade to 2.0.0
This section describes manual steps which need to be done prior to upgrading to 2.0.0. There are various changes in the configuration which must be reflected before the upgrade. We suggest always using the configuration file bundled with the new version of the virtual appliance and updating it to suit your needs (insert licenses, enable/disable services, set replicas, ...). If you are not willing to do this, then you must modify your current configuration file to work with the new version of the virtual appliance.
Rename speech-engine subchart to speech-to-text-phonexia
Because the speech-engine subchart was renamed, you have to update the speech platform values file before upgrading:
- Open the new text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via the file browser.
- Locate key .spec.valuesContent.speech-engine.
- Rename speech-engine to speech-to-text-phonexia.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-phonexia:
        <Not significant lines omitted>
- Save the file.
Rename speech-to-text-phonexia instances
- Open the new text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via the file browser.
- Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- Remove the stt- prefix from the name of each instance.
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      speech-to-text-phonexia:
        <Not significant lines omitted>
        config:
          <Not significant lines omitted>
          instances:
            - name: ar-kw
              imageTag: 3.60.1-stt-ar_kw_6
              onDemand:
                enabled: true
            - name: ar-xl
              imageTag: 3.60.1-stt-ar_xl_6
              onDemand:
                enabled: true
- Save the file.
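The two renames described so far (the speech-engine key and the stt- prefix of instance names) are mechanical, so they can be expressed as a small transformation. The sketch below applies them to chart values that are assumed to be already loaded into a plain Python dict; the function name `migrate_values` is hypothetical, and reading or writing the YAML file itself is left out.

```python
import copy


def migrate_values(values: dict) -> dict:
    """Return a copy of the chart values with the 2.0.0 renames applied."""
    migrated = copy.deepcopy(values)
    # Rename the speech-engine subchart key.
    if "speech-engine" in migrated:
        migrated["speech-to-text-phonexia"] = migrated.pop("speech-engine")
    # Strip the stt- prefix from every instance name; imageTag stays untouched.
    instances = (
        migrated.get("speech-to-text-phonexia", {})
        .get("config", {})
        .get("instances", [])
    )
    for instance in instances:
        name = instance.get("name", "")
        if name.startswith("stt-"):
            instance["name"] = name[len("stt-"):]
    return migrated
```

The input dict is left unmodified, which makes it easy to compare the old and new values before saving the file.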
Add proper tag suffix for Media Conversion
- Open the new text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via the file browser.
- Locate key .spec.valuesContent.media-conversion.image.
- Change the value of the tagSuffix key to "-free".
- Updated file should look like:
  apiVersion: helm.cattle.io/v1
  kind: HelmChartConfig
  metadata:
    name: speech-platform
    namespace: kube-system
  spec:
    valuesContent: |-
      <Not significant lines omitted>
      media-conversion:
        <Not significant lines omitted>
        image:
          <Not significant lines omitted>
          tagSuffix: "-free"
          <Not significant lines omitted>
- Save the file.
- Proceed with the upgrade.
Update path to models
The default model location was changed from /data/models to /data/models/<microservice>. If you plan to upgrade and keep the current data disk, no steps are needed: models are loaded from the old location, /data/models. If you plan to upgrade from scratch (discarding the current data disk), no steps are needed either: models are loaded from the new location, /data/models/<microservice>.
How to modify OVF to Hyper-V compatible VM
- Both existing virtual HDDs (.vmdk) need to be converted to Hyper-V compatible HDDs (.vhdx). This can be done with StarWind V2V Converter.
- Create new VM in Hyper-V.
- IMPORTANT: Use Generation 1 VM - Generation 2 doesn’t work.
- Enable networking/make sure it is enabled.
- OPTIONAL: Disable options like DVD drive or SCSI controller since they are not needed.
- Set Memory to at least 16GB and CPUs to at least 8 cores.
- Attach HDDs, preferably onto one IDE controller.
- Start the VM.
- After it starts, check the IP address printed on the login screen. Wait for the entire engine to start.
- Go to the IP from the previous step and verify that the entire VM works as it should.