Speech Platform Virtual Appliance
The Speech Platform Virtual Appliance is a distribution of the Phonexia Speech Platform in the form of a virtual image. Presently, it exclusively supports the OVF format.
Installation
This section describes how to install the virtual appliance into your virtualization platform.
Prerequisites
Currently, we support only VirtualBox and VMware.
The appliance will probably work on other virtualization platforms, but we haven't tested them yet.
Minimal HW requirements
- 50 GB of disk space
- 4 CPUs
- 16 GB of memory
The minimal requirements allow you to run a single technology (speaker identification, speech-to-text by Whisper, or speech-to-text by Phonexia) for evaluation purposes. We recommend disabling all technologies you are not evaluating to save resources.
Resource usage per technology
- Speaker identification - 1 CPU and 2 GB memory
- Speech-to-text by Phonexia - 1 CPU and 4 GB memory per language
- Speech-to-text by Whisper - 8 CPUs and 8 GB memory, or 1 CPU, 8 GB memory and a GPU card
Note: Running speech-to-text by Whisper on CPU is slow. We recommend using at least 8 CPUs to run our built-in examples in a reasonable time.
GPU
A GPU is not required for the virtual appliance to work, but without one the Whisper speech-to-text functionality will suffer serious performance degradation.
If you decide to use a GPU, then make sure that:
- The server HW (especially the BIOS) has support for IOMMU.
- The host OS can pass the GPU device to the virtualization platform (i.e., the host OS can be configured to NOT use the GPU device).
- The virtualization platform can pass the GPU device to the guest OS.
Installation guide
- Download the virtual appliance
- Import the virtual appliance into your virtualization platform (for Hyper-V deployment, please refer to the section 'How to modify OVF to Hyper-V compatible VM')
- Run the virtual appliance
Post-installation steps
The virtual appliance is configured to obtain an IP address from a DHCP server. If you are not using a DHCP server for IP allocation, or you prefer to set up a static IP, then you have to reconfigure the OS.
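For example, on Rocky Linux you can set a static IP with NetworkManager's nmcli. This is a minimal sketch; the connection name (ens192) and all addresses are placeholders you must replace with values from your environment:
# List connections to find the actual connection name
nmcli connection show
# Assign a static address, gateway and DNS, then re-activate the connection
nmcli connection modify ens192 ipv4.method manual ipv4.addresses 192.168.1.50/24 ipv4.gateway 192.168.1.1 ipv4.dns 192.168.1.1
nmcli connection up ens192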
SSH server
An SSH server is deployed and enabled in the virtual appliance. Use the following credentials:
login: root
password: InVoiceWeTrust
We recommend changing the root password and disabling password authentication via SSH for the root user in favor of key-based authentication.
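A minimal sketch of this hardening, assuming your public key is already present in /root/.ssh/authorized_keys:
passwd root
# Keep root login possible with keys only, not with a password
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
systemctl restart sshd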
Open ports
List of open ports:
- SSH (22) - for convenient access to OS
- HTTP (80) - Speech platform is accessible via HTTP protocol
- HTTPS (443) - Speech platform is also accessible via HTTPS protocol
- HTTPS (6443) - Kubernetes API
- HTTPS (10250) - Metrics server
K3s check
K3s (a Kubernetes distribution) is started automatically by systemd when the virtual appliance starts. You can verify whether k3s is running with this command:
systemctl status k3s
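If the service is not in the active (running) state, the systemd journal usually shows the reason:
journalctl -u k3s --no-pager -e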
Kubernetes check
When the k3s service has started, it takes some time until the application (i.e., the Kubernetes pods) is up. Usually it takes about 2 minutes. To check whether the application is up and running, execute the following command:
kubectl -n speech-platform get pods
When all pods are running, the output looks like:
[root@speech-platform ~]# kubectl -n speech-platform get pods
NAME READY STATUS RESTARTS AGE
speech-platform-docs-57dcd49f9f-q97w4 1/1 Running 0 2m10s
speech-platform-envoy-759c9b49d9-99vp7 1/1 Running 0 2m10s
speech-platform-frontend-7f4566dbc6-jhprh 1/1 Running 0 2m10s
speech-platform-assets-5697b4c86-8sh9k 1/1 Running 0 2m9s
speech-platform-media-conversion-7d8f884f9-zh75g 1/1 Running 0 2m9s
speech-platform-api-69bc7d4d5b-6kv7x 1/1 Running 0 2m9s
speech-platform-speech-to-text-whisper-enhanced-74548494c866mrz 0/1 CrashLoopBackOff 4 (29s ago) 2m10s
speech-platform-voiceprint-extraction-68d646d449-9br8m 0/1 CrashLoopBackOff 4 (33s ago) 2m10s
speech-platform-voiceprint-comparison-76948b4947-xjw92 0/1 CrashLoopBackOff 4 (15s ago) 2m10s
The voiceprint-extraction, voiceprint-comparison and speech-to-text-whisper-enhanced microservices (pods) are failing initially. This is expected and is caused by a missing license. You can either add licenses to these microservices or disable them if you don't plan to use them.
Optionally, you can check whether all other system and auxiliary applications are running:
kubectl get pods -A
All pods should be running or completed, like this:
[root@speech-platform ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-8d98546c4-9pq8p 1/1 Running 0 6m44s
kube-system coredns-94bcd45cb-rp6zx 1/1 Running 0 6m44s
kube-system metrics-server-754ff994c9-pczpx 1/1 Running 0 6m44s
kube-system svclb-ingress-nginx-controller-baed713a-nzwcc 2/2 Running 0 5m24s
kube-system helm-install-ingress-nginx-wpwk4 0/1 Completed 0 6m45s
kube-system helm-install-filebrowser-fd569 0/1 Completed 0 6m45s
kube-system helm-install-nginx-28rll 0/1 Completed 0 6m45s
kube-system helm-install-speech-platform-7k6qf 0/1 Completed 0 6m45s
ingress-nginx ingress-nginx-controller-679f97c77d-rdssr 1/1 Running 0 5m24s
nginx nginx-6ddd78f789-f9lq2 1/1 Running 0 5m39s
filebrowser filebrowser-7476f7c65c-rk9d5 1/1 Running 0 5m39s
gpu nfd-58s4x 2/2 Running 0 5m44s
speech-platform speech-platform-docs-57dcd49f9f-q97w4 1/1 Running 0 5m38s
speech-platform speech-platform-envoy-759c9b49d9-99vp7 1/1 Running 0 5m38s
speech-platform speech-platform-frontend-7f4566dbc6-jhprh 1/1 Running 0 5m38s
speech-platform speech-platform-assets-5697b4c86-8sh9k 1/1 Running 0 5m37s
speech-platform speech-platform-media-conversion-7d8f884f9-zh75g 1/1 Running 0 5m37s
speech-platform speech-platform-api-69bc7d4d5b-6kv7x 1/1 Running 0 5m37s
speech-platform speech-platform-voiceprint-extraction-68d646d449-9br8m 0/1 CrashLoopBackOff 5 (2m33s ago) 5m38s
speech-platform speech-platform-speech-to-text-whisper-enhanced-74548494c866mrz 0/1 CrashLoopBackOff 5 (2m32s ago) 5m38s
speech-platform speech-platform-voiceprint-comparison-76948b4947-xjw92 0/1 CrashLoopBackOff 5 (2m20s ago) 5m38s
Application check
Access the virtual appliance welcome page at the virtual appliance's IP address or hostname from your local computer. If you can access the welcome page, the applications should be working.
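You can also check this from the command line. A minimal sketch, assuming the appliance is reachable at 192.168.1.50 (replace with your IP address or hostname):
curl -I http://192.168.1.50/
An HTTP 2xx or 3xx response indicates that the ingress controller and frontend are up.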
Components
This is the list of components the virtual appliance is composed of.
Operating system
There is Rocky Linux 9.3 under the hood.
GPU support
The virtual appliance has all necessary prerequisites pre-baked to run GPU-powered workloads (especially speech-to-text-whisper-enhanced). This means that the NVIDIA drivers and the NVIDIA Container Toolkit are already installed.
Kubernetes
The k3s Kubernetes distribution is deployed inside.
Ingress controller
We use the ingress-nginx ingress controller. It serves as a reverse proxy and load balancer.
Speech platform
This is the application for solving various voice-related problems such as speaker identification, speech-to-text transcription and many more. The Speech Platform is accessible via a web browser or the API.
File Browser
File Browser is a web-based file browser/editor used to work with data on the data disk.
Prometheus
Prometheus is a tool that provides monitoring information about the Kubernetes components.
Grafana
Grafana is a tool for visualizing Prometheus metrics.
Disks
The virtual appliance comes with a system disk and a data disk.
System disk
The operating system is installed on the system disk. You should not modify the system disk unless you know what you are doing.
List of components stored on the system disk:
- Nvidia drivers
- Container images for microservices
- Packaged helm charts
Data disk
The data disk is used as persistent storage.
Unlike the system disk, the data disk is intended to contain files which can be viewed/modified by the user.
The data disk is created with the PHXDATADISK label and the system is instructed to mount the filesystem with this label to the /data directory.
List of components stored on the data disk:
- Logs (/data/logs) of the system, k3s and individual containers
- Configuration for the ingress controller (/data/ingress-nginx/ingress-nginx-values.yaml)
- Configuration for the Speech Platform (/data/speech-platform/speech-platform-values.yaml)
- Models for individual microservices (/data/models/)
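You can verify that the data disk was detected and mounted correctly, for example with:
findmnt /data
lsblk -o NAME,LABEL,SIZE,MOUNTPOINT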
Configuration
The following sections describe various configuration use cases.
Insert license keys
The virtual appliance is distributed without licenses. The Speech Platform does not work without a valid license. If you haven't received any license, please contact Phonexia support. A license must be inserted into each microservice.
Insert license into speech-to-text-whisper-enhanced microservice
- Get a license for the speech-to-text model. The license looks like eyJ2...In0=.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-whisper-enhanced.config.license.value
- Change the content of the value key from "<put your license for speech-to-text-by-whisper model here>" to the license key.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-whisper-enhanced:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        license:
          value: "eyJ2...In0="
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
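After the redeploy you can watch the previously failing pod recover, for example:
kubectl -n speech-platform get pods -w
The speech-to-text-whisper-enhanced pod should leave the CrashLoopBackOff state and eventually report 1/1 Running.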
Insert license into voiceprint-extraction microservice
- Get a license for the speaker-identification model. The license looks like eyJ2...In0=.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.voiceprint-extraction.config.license.value
- Change the content of the value key from "<put your license for speaker-identification model here>" to the license key.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    voiceprint-extraction:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        license:
          value: "eyJ2...In0="
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Insert license into voiceprint-comparison microservice
- Get a license for the speaker-identification model. The license looks like eyJ2...In0=.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.voiceprint-comparison.config.license.value
- Change the content of the value key from "<put your license for speaker-identification model here>" to the license key.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    voiceprint-comparison:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        license:
          value: "eyJ2...In0="
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Insert license into speech-engine-stt microservice
- Get a license for speech-engine STT. The license looks like:
SERVER license.phonexia.com/lic
USE_TIME
PRODUCT SPE_v3 ACB46...
2uJ...M9A==
PRODUCT STT-tech F23B6...
jXu...K7A=
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-engine.config.license.value
- Change the content of the value key from "<put your license for speech-engine here>" to the license key.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-engine:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        license:
          value: |
            SERVER license.phonexia.com/lic
            USE_TIME
            PRODUCT SPE_v3 ACB46...
            2uJ...M9A==
            PRODUCT STT-tech F23B6...
            jXu...K7A=
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Set DNS name for speech platform virtual appliance
The Speech Platform is accessible at http://<IP_address_of_virtual_appliance>.
We recommend creating a DNS record to make access more comfortable for users.
Consult your DNS provider for more information on how to add the corresponding DNS record.
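As a quick alternative for testing before the DNS record exists, you can add an entry to the hosts file on your workstation. The name speech-platform.example.com and the IP address below are placeholders:
192.168.1.50   speech-platform.example.com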
Use HTTPS certificate
The Speech Platform is also accessible via the HTTPS protocol at https://<IP_address_of_virtual_appliance>.
If you prefer secure communication, you might want to use your own TLS certificate to secure it.
To do so, follow this guide:
- Prepare the TLS certificate beforehand.
- Put the certificate private key in a file named cert.key.
- Put the certificate in a file named cert.crt.
- Create a Kubernetes secret manifest storing the certificate and private key:
kubectl create -n ingress-nginx secret tls default-ssl-certificate --key cert.key --cert cert.crt -o yaml --dry-run > /tmp/certificate-secret.yaml
- Copy the manifest (the resulting file) to /data/ingress-nginx/certificate-server.yaml.
- Open the text file /data/ingress-nginx/ingress-nginx-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.controller.extraArgs.default-ssl-certificate
- Uncomment the line.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      <Not significant lines omitted>
      extraArgs:
        <Not significant lines omitted>
        default-ssl-certificate: "ingress-nginx/default-ssl-certificate"
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
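You can then verify which certificate the ingress controller serves, for example with openssl (replace the address and server name with your own values):
openssl s_client -connect 192.168.1.50:443 -servername speech-platform.example.com </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates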
Disable unneeded microservices
The virtual appliance comes with all microservices enabled by default. You may decide to disable a microservice if you do not plan to use it. A disabled microservice does not consume any compute resources.
- Find out which microservices you want to disable - voiceprint-extraction, voiceprint-comparison, speech-to-text-whisper-enhanced or speech-engine-stt.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.<microservice>.enabled
- Change the value from true to false.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    <microservice>:
      <Not significant lines omitted>
      enabled: false
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Configure languages in speech-engine stt microservice
This microservice consists of multiple instances. Each instance corresponds to a single language. All instances are listed in the configuration file.
Note: Docker images for the languages are not included in the virtual appliance. This means that the virtual appliance needs internet access to download the Docker images when the speech-engine STT microservice is used!
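If a language instance stays in an image-pull state, you can check the pull-related events to confirm whether the image download is the problem (this assumes the instances run in the speech-platform namespace):
kubectl -n speech-platform get events --sort-by=.lastTimestamp | grep -i -E 'pull|image'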
By default all languages/instances are enabled. List of languages:
- ar_kw_6
- ar_xl_6
- bn_6
- cs_cz_6
- de_de_6
- en_us_6
- es_6
- fa_6
- fr_fr_6
- hr_hr_6
- hu_hu_6
- it_it_6
- ka_ge_6
- kk_kz_6
- nl_6
- pl_pl_6
- ps_6
- ru_ru_6
- sk_sk_6
- sr_rs_6
- sv_se_6
- tr_tr_6
- uk_ua_6
- vi_vn_6
- zh_cn_6
How to disable all language instances except cs_cz_6 and en_us_6:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-engine.config.instances.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-engine:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          - name: stt-ark
            imageTag: 3.59.0-stt-ar_kw_6
            onDemand:
              enabled: true
          - name: stt-arx
            imageTag: 3.59.0-stt-ar_xl_6
            onDemand:
              enabled: true
          - name: stt-bn
            imageTag: 3.59.0-stt-bn_6
            onDemand:
              enabled: true
          - name: stt-cs
            imageTag: 3.59.0-stt-cs_cz_6
            onDemand:
              enabled: true
          - name: stt-de
            imageTag: 3.59.0-stt-de_de_6
            onDemand:
              enabled: true
          - name: stt-en
            imageTag: 3.59.0-stt-en_us_6
            onDemand:
              enabled: true
          .
          .
          .
          - name: stt-vi
            imageTag: 3.59.0-stt-vi_vn_6
            onDemand:
              enabled: true
          - name: stt-zh
            imageTag: 3.59.0-stt-zh_cn_6
            onDemand:
              enabled: true
- Comment out all the instances except cs_cz_6 and en_us_6.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-engine:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          #- name: stt-ark
          #  imageTag: 3.59.0-stt-ar_kw_6
          #  onDemand:
          #    enabled: true
          #- name: stt-arx
          #  imageTag: 3.59.0-stt-ar_xl_6
          #  onDemand:
          #    enabled: true
          #- name: stt-bn
          #  imageTag: 3.59.0-stt-bn_6
          #  onDemand:
          #    enabled: true
          - name: stt-cs
            imageTag: 3.59.0-stt-cs_cz_6
            onDemand:
              enabled: true
          #- name: stt-de
          #  imageTag: 3.59.0-stt-de_de_6
          #  onDemand:
          #    enabled: true
          - name: stt-en
            imageTag: 3.59.0-stt-en_us_6
            onDemand:
              enabled: true
          .
          .
          .
          #- name: stt-vi
          #  imageTag: 3.59.0-stt-vi_vn_6
          #  onDemand:
          #    enabled: true
          #- name: stt-zh
          #  imageTag: 3.59.0-stt-zh_cn_6
          #  onDemand:
          #    enabled: true
- Or you can even delete the instances you are not interested in.
- Then updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-engine:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          - name: stt-cs
            imageTag: 3.59.0-stt-cs_cz_6
            onDemand:
              enabled: true
          - name: stt-en
            imageTag: 3.59.0-stt-en_us_6
            onDemand:
              enabled: true
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Permanent vs onDemand instances
A permanent instance is started and running (and consuming resources) all the time. An onDemand instance is started only when a corresponding task is queued, and it is stopped when all tasks have been processed.
All instances are onDemand by default. Any instance can be reconfigured to be permanent. Use the following guide to reconfigure an instance from onDemand to permanent:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-engine.config.instances.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-engine:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          .
          .
          .
          - name: stt-cs
            imageTag: 3.59.0-stt-cs_cz_6
            onDemand:
              enabled: true
          .
          .
          .
- Delete the onDemand key and its subkeys.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-engine:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          .
          .
          .
          - name: stt-cs
            imageTag: 3.59.0-stt-cs_cz_6
          .
          .
          .
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Modify microservice replicas
Each microservice has only one replica by default. This means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the number of replicas for the corresponding microservices.
Note: We do not recommend increasing replicas for any microservice when the virtual appliance is running with the default resources (4 CPUs, 16 GB memory)!
- Find out for which microservices you want to modify replicas - voiceprint-extraction, voiceprint-comparison and speech-to-text-whisper-enhanced.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.<microservice>.replicaCount
- Change the value to the desired number of replicas.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    <microservice>:
      <Not significant lines omitted>
      replicaCount: 2
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Run speech-to-text-whisper-enhanced microservice on GPU
First, make sure the virtual appliance can see the GPU device(s).
Use nvidia-smi to list all the devices. If a device is present and visible to the system, the output should look like:
[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)
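Optionally, you can also check that k3s advertises the GPU as an allocatable resource (this assumes the NVIDIA device plugin shipped with the appliance is running):
kubectl describe node | grep -i "nvidia.com/gpu"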
If the GPU is visible, you can reconfigure the speech-to-text-whisper-enhanced microservice to use the GPU for processing.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the speech-to-text-whisper-enhanced section .spec.valuesContent.speech-to-text-whisper-enhanced.
- Locate the key .spec.valuesContent.speech-to-text-whisper-enhanced.config.device.
- Uncomment the line so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-whisper-enhanced:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        # Uncomment this to force whisper to run on GPU
        device: cuda
- Locate the key .spec.valuesContent.speech-to-text-whisper-enhanced.resources.
- Request GPU resources for the processing so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-whisper-enhanced:
      <Not significant lines omitted>
      # Uncomment this to grant access to GPU on whisper pod
      resources:
        limits:
          nvidia.com/gpu: "1"
- Locate the key .spec.valuesContent.speech-to-text-whisper-enhanced.runtimeClassName.
- Set runtimeClassName so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-whisper-enhanced:
      <Not significant lines omitted>
      # Uncomment this to run whisper on GPU
      runtimeClassName: "nvidia"
- Locate the key .spec.valuesContent.speech-to-text-whisper-enhanced.updateStrategy.
- Set type to Recreate to allow seamless updates so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-whisper-enhanced:
      <Not significant lines omitted>
      # Uncomment this to allow seamless updates on a single GPU machine
      updateStrategy:
        type: Recreate
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-whisper-enhanced:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        device: cuda
      <Not significant lines omitted>
      resources:
        limits:
          nvidia.com/gpu: "1"
      <Not significant lines omitted>
      runtimeClassName: "nvidia"
      <Not significant lines omitted>
      updateStrategy:
        type: Recreate
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
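Once the pod is redeployed, you can confirm that it actually requested the GPU. A minimal sketch (substitute the real pod name from the first command):
kubectl -n speech-platform get pods | grep whisper
kubectl -n speech-platform describe pod <whisper-pod-name> | grep -i "nvidia.com/gpu"
While a transcription is running, nvidia-smi on the appliance should also show the whisper process.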
Change model used in a microservice
Each microservice needs a model to do its job properly. We provide multiple models for some microservices, for example speech-to-text-whisper-enhanced. We usually pre-configure microservices with the most accurate (and slowest) model. Users typically switch to a different model to speed up processing at the cost of less accurate results.
The license you received with the virtual appliance is valid only for the default model. If you change the model, you have to change the license as well.
Change model in speech-to-text-whisper-enhanced microservice
We offer the following models for the speech-to-text-whisper-enhanced microservice:
- large-v3 - next-gen most accurate multilingual model.
- large-v2 - most accurate multilingual model. This is the default model.
- medium - less accurate but faster than large-v2.
- base - less accurate but faster than medium.
- small - less accurate but faster than base.
- Ask Phonexia to provide you with the desired model and license. You will receive link(s) which result in a zip archive when downloaded.
- Unzip the archive.
$ unzip license-speech_to_text_whisper_enhanced-medium-1.0.0-rc5.zip
- Upload the new model to the virtual appliance data disk.
$ scp license-speech_to_text_whisper_enhanced-medium-1.0.0-rc5/speech_to_text_whisper_enhanced.model root@<virtual-appliance-ip>:/data/models/speech_to_text_whisper_enhanced-medium.model
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-whisper-enhanced.config.model
- Change the content of the file key from "speech_to_text_whisper_enhanced-large_v2-1.0.0-rc5.model" to the file you've just uploaded ("speech_to_text_whisper_enhanced-medium.model").
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-whisper-enhanced:
      <Not significant lines omitted>
      config:
        model:
          <Not significant lines omitted>
          file: "speech_to_text_whisper_enhanced-medium.model"
- Change the license as well, because you have changed the model. See above for how to do it.
- Save the file
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
Admin console
The admin console is a simple web page containing links to various admin-related tools.
The console is located at http://<IP_of_virtual_appliance>/admin.
It contains links to:
- filebrowser
- prometheus
- grafana
Grafana
Grafana is a tool for visualizing application and Kubernetes metrics. List of the most useful dashboards available in Grafana:
- Envoy Clusters - See envoy cluster statistics
- Kubernetes / Compute Resources / Pod - See resource consumption of individual pods
- NGINX Ingress controller - See ingress controller stats
- NVIDIA DCGM Exporter Dashboard - See GPU device stats
- Node Exporter / Nodes - See stats about virtual appliance
- Speech Platform API capacity - See metrics about speech platform itself
Troubleshooting
This section contains information about the individual components of the Speech Platform and the request flow.
Speech platform components
List of the components:
- frontend - simple webserver serving static html, css, javascript and image files
- docs - simple webserver serving documentation
- assets - simple webserver hosting examples
- api - python component providing REST API interface
- envoy - router and load balancer for gRPC messages
- media-conversion - python component used for:
  - converting audio files from various formats to simple wav format
  - splitting multi-channel audio into multiple single-channel files
- technology microservices:
  - speech-to-text-whisper-enhanced - transcribes speech to text
  - speech-to-text-phonexia - transcribes speech to text
  - voiceprint-extraction - extracts a voiceprint from an audio file
  - voiceprint-comparison - compares multiple voiceprints
Request flow
- The user POSTs a request (for example, transcribe speech to text) to the API.
- The API creates a task for processing and returns the task ID to the user.
- From this point, the user can poll the task to get the result.
- The API calls media-conversion via envoy.
- Media-conversion converts the audio file to wav format and possibly splits it into multiple mono-channel files.
- The API gets the converted audio file from media-conversion.
- The API calls speech-to-text-whisper-enhanced via envoy.
- Speech-to-text-whisper-enhanced transcribes the audio file.
- The API gets the transcription.
- The user can retrieve the task result.
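A client-side sketch of this flow using curl. The endpoint paths below are purely illustrative placeholders, not the actual Speech Platform API routes; consult the documentation served by the docs component for the real paths:
# Submit an audio file and receive a task ID (placeholder endpoint)
curl -X POST "http://<IP_of_virtual_appliance>/api/<technology-endpoint>" -F "file=@recording.wav"
# Poll the task until it finishes, then read the result (placeholder endpoint)
curl "http://<IP_of_virtual_appliance>/api/task/<task-id>"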
Upgrade guide
This section describes how to perform an upgrade of the virtual appliance.
- Import the new version of the virtual appliance into your virtualization platform
- Stop the current version of the virtual appliance
- Detach the data disk from the current version of the virtual appliance
- Attach the data disk to the new version of the virtual appliance
- Start the new version of the virtual appliance
- Delete the old version of the virtual appliance
How to modify OVF to Hyper-V compatible VM
- Both of the existing virtual HDDs (.vmdk) need to be converted to Hyper-V compatible HDDs (.vhdx). Do it through this program: Starwind V2V Converter (a command-line alternative is sketched after this list).
- Create new VM in Hyper-V.
- IMPORTANT: Use Generation 1 VM - Generation 2 doesn’t work.
- Enable networking/make sure it is enabled.
- OPTIONAL: Disable options like DVD drive or SCSI controller since they are not needed.
- Set Memory to at least 16GB and CPUs to at least 8 cores.
- Attach HDDs, preferably onto one IDE controller.
- Start the VM.
- After it starts, check the IP address printed on the login screen. Wait for the entire engine to start.
- Go to the IP from the previous step and verify that the entire VM works as it should.
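If you prefer a command-line tool for step 1, the VMDK-to-VHDX conversion can usually also be done with qemu-img (our suggestion, not part of the official procedure); replace the file names with your actual disk names:
qemu-img convert -p -f vmdk -O vhdx system-disk.vmdk system-disk.vhdx
qemu-img convert -p -f vmdk -O vhdx data-disk.vmdk data-disk.vhdx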