Version: 3.7.0

Deployment of Phonexia Virtual Appliance

The goal of this article is to guide you through the initial installation of the Phonexia Speech Platform (SP4) Virtual Appliance.

By the end of the article, you will be able to start processing your recordings with Phonexia Speech Technologies.

Prerequisites

We currently support only Oracle VirtualBox and VMware hypervisors. Hyper-V is supported for CPU-based technologies, but GPU passthrough for Hyper-V is untested: it may work in your environment, but it is not guaranteed.

The appliance may also run on other virtualization platforms, but we have not tested them.

Evaluation HW requirements

  • 60GB of disk space
  • 4 CPU cores
  • 32GB of memory

The evaluation HW requirements allow you to run all technologies for evaluation purposes. However, we recommend disabling any technologies you are not evaluating to save resources.

GPU

A GPU is not required for the virtual appliance to work, but without one, enhanced speech-to-text built on Whisper suffers serious performance degradation.

If you decide to use a GPU, make sure that:

  • Server HW (especially BIOS) supports IOMMU.
  • Host OS can pass the GPU device to the virtualization platform (i.e., the host OS can be configured not to use the GPU device).
  • Virtualization platform can pass the GPU device to the guest OS.
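On a Linux host, you can do a quick sanity check that the kernel has IOMMU enabled before attempting GPU passthrough. A minimal sketch; the exact boot messages vary by vendor:

```shell
# Count the IOMMU groups exposed by the kernel; 0 usually means IOMMU is
# disabled in BIOS or missing from the kernel command line.
ls /sys/kernel/iommu_groups/ 2>/dev/null | wc -l

# Look for IOMMU / VT-d / AMD-Vi initialization messages (may require root):
dmesg | grep -iE 'iommu|dmar|amd-vi'
```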

Deployment of Virtual Appliance

Step 1: Download Required Files

Download the files provided by Phonexia:

  • speech-platform-virtual-appliance.zip
  • licensed-models.zip
Step 2: Import Virtual Appliance

Unzip speech-platform-virtual-appliance.zip and import the unzipped file into your virtualization platform (e.g., VMware, VirtualBox).
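If you prefer the command line over the GUI, the import can be scripted with VBoxManage. This is a sketch for VirtualBox; the exact .ova filename inside the zip may differ, and the VM name is an arbitrary choice:

```shell
unzip speech-platform-virtual-appliance.zip

# Import the appliance and give the VM a recognizable name
VBoxManage import speech-platform-virtual-appliance.ova --vsys 0 --vmname speech-platform

# Start the VM without a GUI window
VBoxManage startvm speech-platform --type headless
```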

Once the Virtual Appliance is imported, it starts its deployment. You can watch tasks being completed in the console as it starts. It takes approximately 2 minutes for the Kubernetes pods to initialize. When Kubernetes is up and running, you will see:

Rocky Linux 9.5 (Blue Onyx)
Kernel 5.14.0-503.14.1.el9_5.x86_64 on an x86_64

Welcome to Phonexia Speech Platform 3.6.0
After first start you need to provide a license and upload technology models, see instructions at
the GUI is accessible at
Bundled documentation is accessible at
Online documentation is accessible at
Note: Make sure you use corresponding version (3.6.0) in online documentation
speech-platform login:
login: root
password: InVoiceWeTrust
Step 3: Verify SSH Access

An SSH server is deployed and enabled in the virtual appliance. Use the following credentials:

login: root
password: InVoiceWeTrust

We recommend changing the root password and disabling password authentication over SSH for the root user in favor of key-based authentication.

Instead of the root user, we recommend using the phonexia user, as we plan to disable root login in the future. Use the sudo command to switch to the root user after login.
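The hardening above can be sketched as follows, assuming the stock Rocky Linux sshd_config, which includes drop-in files from /etc/ssh/sshd_config.d (the drop-in filename is an arbitrary choice):

```shell
# Run inside the virtual appliance. First set a new root password:
passwd root

# From your workstation, install your public key for the phonexia user:
#   ssh-copy-id -p <virtual-appliance-port> phonexia@<virtual-appliance-ip>

# Then allow root login with keys only (no passwords) and reload sshd:
echo 'PermitRootLogin prohibit-password' > /etc/ssh/sshd_config.d/50-root.conf
systemctl reload sshd
```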

Step 4: Upload Licensed Models

The virtual appliance is distributed without licenses and models. To get them, contact Phonexia support, who will provide a bundle (.zip file) with models and licenses. The bundle then needs to be uploaded and unzipped inside the virtual appliance.

We provide File Browser inside the virtual appliance for uploading files; it is accessible at <IP_address_of_VA>/filebrowser. Once inside the File Browser app, select Upload in the top right corner and choose the bundle with models and licenses (licensed-models.zip). A pop-up window in the bottom right corner shows the upload progress.

File Browser automatically unzips the bundle once it is uploaded, and the upload shows as finished only after the bundle is extracted. This automatic extraction works only for a bundle named licensed-models.zip; if you rename the bundle, the extraction will not work and you will need to do it manually.

After the models are extracted, a speech platform configuration script enables and configures microservices based on the uploaded models and licenses.

Alternatively, you can upload the bundle to the virtual appliance from the command line:

  1. Upload the provided licensed-models.zip archive to the virtual appliance via scp:
    scp -P <virtual-appliance-port> licensed-models.zip root@<virtual-appliance-ip>:/data/
  2. Connect to the virtual appliance /data folder:
    ssh root@<virtual-appliance-ip> -p <virtual-appliance-port>
    cd /data
  3. Unzip archive. Models are extracted to directory per technology:
    unzip licensed-models.zip

The bundle content has a specific structure that ensures all models and licenses are placed in the correct locations after unzipping.
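To verify that the bundle landed where the platform expects it, you can preview the archive layout and list the extracted model directories (the /data/models path is the one used elsewhere in this guide):

```shell
# List the archive contents without extracting:
unzip -l licensed-models.zip

# After extraction there should be one directory per technology:
ls /data/models/
```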

Step 5: Verification of Functionality (optional)

Changes in configuration are not applied

Changes in the main configuration file /data/speech-platform/speech-platform-values.yaml are automatically picked up and applied by the helm controller. If the configuration is not valid (more precisely, if the configuration file is not a valid YAML file), the helm controller fails to apply it. The helm controller creates a one-time job to update the helm chart with the new configuration; if the configuration is incorrect, the job does not complete successfully and the underlying pod either restarts or ends up in an error state. The pod status reflects this issue:

[root@speech-platform disks]# kubectl get pods -n kube-system | grep -i helm-install
helm-install-filebrowser-2b7pn 0/1 Completed 0 51m
helm-install-ingress-nginx-m87d4 0/1 Completed 0 51m
helm-install-nginx-nrcvk 0/1 Completed 0 51m
helm-install-dcgm-exporter-fjqzz 0/1 Completed 0 51m
helm-install-kube-prometheus-stack-jn5bz 0/1 Completed 0 51m
helm-install-keda-vsn95 0/1 Completed 0 51m
helm-install-speech-platform-9l9vj 0/1 Error 4 (46s ago) 6m15s

View logs of failed helm-install pod:

[root@speech-platform disks]# kubectl logs -f helm-install-speech-platform-9l9vj -n kube-system
...
...
...
Upgrading speech-platform
+ helm_v3 upgrade --namespace speech-platform speech-platform https://10.43.0.1:443/static/phonexia-charts/speech-platform-0.0.0-36638f5-helm.tgz --values /config/values-10_HelmChartConfig.yaml
Error: failed to parse /config/values-10_HelmChartConfig.yaml: error converting YAML to JSON: yaml: line 494: could not find expected ':'

Check configuration file validity

This section describes how to check if your configuration is valid and how to identify which line in the configuration is incorrect.

Use the following command to check whether the configuration file is valid:

yq .spec.valuesContent /data/speech-platform/speech-platform-values.yaml | yq .

If the configuration file is valid, the content of the file will be printed. Otherwise, the line number with an error will be printed out as follows:

[root@speech-platform ~]# yq .spec.valuesContent /data/speech-platform/speech-platform-values.yaml | yq .
Error: bad file '-': yaml: line 253: could not find expected ':'

The content of the file, 10 lines before and after line 253:

[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml  | grep 253 -B 10 -A 10
243 # -- List of devices to use. GPU only.
244 # deviceIndices: [0,1]
245
246 # Uncomment this to force whisper to run on GPU
247 device: cuda
248
249 logLevel: debug
250
251 model:
252 volume:
253 hostPath:
254 path: /data/models/enhanced_speech_to_text_built_on_whisper
255
256 # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
257 file: "large_v2-1.0.1.model"
258 license:
259 value:
260 "eyJ2ZX...=="
261
262 # Uncomment this to grant access to GPU on whisper pod
263 resources:

There is nothing suspicious on line 253. In fact, the line number reported by yq is slightly off, because the configuration of the speech-platform helm chart itself is stored as the value of the spec.valuesContent key in the speech-platform-values.yaml file. Therefore, you need to add 7 (since spec.valuesContent is on the 7th line of the configuration file) to the reported line number to get the correct one (253 + 7 = 260):

[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml | grep 260 -B 10 -A 10
250
251 model:
252 volume:
253 hostPath:
254 path: /data/models/enhanced_speech_to_text_built_on_whisper
255
256 # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
257 file: "large_v2-1.0.1.model"
258 license:
259 value:
260 "eyJ2ZX...=="
261
262 # Uncomment this to grant access to GPU on whisper pod
263 resources:
264 limits:
265 nvidia.com/gpu: "1"
266
267 # Uncomment this to run whisper on GPU
268 runtimeClassName: "nvidia"
269
270 service:

There is only a license key on line 260. The error message could not find expected ':' is correct, because there is no : on this line. One line above (259) there is a key named value, which should contain the license; however, the license itself is on line 260, making the file invalid YAML. To fix it, simply merge lines 259 and 260. The resulting file should look like this:

[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml | grep 260 -B 10 -A 10
250
251 model:
252 volume:
253 hostPath:
254 path: /data/models/enhanced_speech_to_text_built_on_whisper
255
256 # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
257 file: "large_v2-1.0.1.model"
258 license:
259 value: "eyJ2ZX...=="
260
261 # Uncomment this to grant access to GPU on whisper pod
262 resources:
263 limits:
264 nvidia.com/gpu: "1"
265
266 # Uncomment this to run whisper on GPU
267 runtimeClassName: "nvidia"
268
269 service:
270 clusterIP: "None"
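The line-number offset described above can be computed instead of counted by hand. A hypothetical one-liner, assuming the key name valuesContent appears exactly once in the file:

```shell
# Line on which spec.valuesContent starts in the outer file:
offset=$(grep -n 'valuesContent' /data/speech-platform/speech-platform-values.yaml | cut -d: -f1)

# Absolute line number for an error that yq reported at line 253:
echo $((offset + 253))
```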
Final Step: Enable Technologies

The virtual appliance comes with all microservices disabled by default. You need to enable each microservice you plan to use. You can enable microservices manually by editing the configuration file, or automatically with a configuration script.

Enable microservices by a script

There is a script named configure-speech-platform.sh which automatically configures (enables/disables) all microservices you have a license and model for.

  1. Connect to the virtual appliance:
    $ ssh root@<virtual-appliance-ip> -p <virtual-appliance-port>
  2. Run the configure-speech-platform.sh script:
    $ /root/scripts/configure-speech-platform.sh --auto-configure
  3. All licensed microservices should be enabled now.
  4. The application automatically recognizes when microservices are enabled and redeploys itself with the updated configuration.
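To confirm the redeploy, you can watch the pods come up; the speech-platform namespace is the one used by the helm chart shown earlier in this guide:

```shell
# Pods of the platform itself should be Running after reconfiguration:
kubectl get pods -n speech-platform

# The one-time helm-install job should end in Completed state:
kubectl get pods -n kube-system | grep helm-install-speech-platform
```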