Deployment of Phonexia Virtual Appliance
The goal of this article is to guide you through the initial installation of the Phonexia Speech Platform (SP4) Virtual Appliance.
By the end of the article, you will be able to start processing your recordings with Phonexia Speech Technologies.
Prerequisites
We currently support only the Oracle VirtualBox and VMware hypervisors. Hyper-V is supported for CPU-based technologies, while GPU passthrough for Hyper-V is not tested - it might work on your end but is not guaranteed.
Other virtualization platforms will probably work, but we have not tested them yet.
Evaluation HW requirements
- 60GB of disk space
- 4 CPU cores
- 32GB of memory
The evaluation HW requirements allow you to run all technologies for evaluation purposes. However, we recommend disabling all technologies you are not evaluating to save resources.
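Before importing the appliance, you can quickly confirm that the host meets these numbers. A minimal sketch, assuming a Linux host with GNU coreutils; adjust the mount point checked for free space to wherever your hypervisor stores disk images:

```shell
# Quick host resource check against the evaluation requirements above.
cores=$(nproc)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)
disk_gb=$(df --output=avail -BG / | tail -1 | tr -dc '0-9')

echo "CPU cores: $cores (need 4)"
echo "Memory:    ${mem_gb}GB (need 32)"
echo "Free disk: ${disk_gb}GB (need 60)"
```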
GPU
A GPU is not required for the virtual appliance to work, but without one you will suffer serious performance degradation of the enhanced speech to text built on Whisper functionality.
If you decide to use a GPU, then make sure that
- The server HW (especially the BIOS) has support for IOMMU.
- The host OS can pass the GPU device to the virtualization platform (i.e., the host OS can be configured to NOT use the GPU device).
- The virtualization platform can pass the GPU device to the guest OS.
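The IOMMU prerequisite can be sanity-checked on the host before you attempt passthrough. A best-effort sketch, assuming a Linux host OS; the /sys path is standard, but consult your platform's passthrough documentation for the authoritative procedure:

```shell
# Best-effort check that the host kernel has IOMMU enabled.
if [ -d /sys/class/iommu ] && [ -n "$(ls -A /sys/class/iommu 2>/dev/null)" ]; then
    iommu_ok=yes
    echo "IOMMU groups present - GPU passthrough should be possible"
else
    iommu_ok=no
    echo "No IOMMU groups found - enable VT-d/AMD-Vi in the BIOS and add"
    echo "intel_iommu=on (or amd_iommu=on) to the host kernel command line"
fi
```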
Deployment of Virtual Appliance
Step 1: Download Required Files
Download the files provided by Phonexia:
- speech-platform-virtual-appliance.zip
- licensed-models.zip
Step 2: Import Virtual Appliance
Unzip speech-platform-virtual-appliance.zip and import the unzipped file into your virtualization platform (e.g., VMware, VirtualBox).
- For Hyper-V deployment: Refer to the section How to modify OVF to Hyper-V compatible VM
Once the Virtual Appliance is imported, it starts its deployment. You can watch tasks being completed in the console as it starts up. It takes approximately 2 minutes for the Kubernetes pods to initialize. When Kubernetes is up and running, you will see
Rocky Linux 9.5 (Blue Onyx)
Kernel 5.14.0-503.14.1.el9_5.x86_64 on an x86_64
Welcome to Phonexia Speech Platform 3.6.0
After the first start you need to provide a license and upload technology models, see instructions at
the GUI is accessible at
Bundled documentation is accessible at
Online documentation is accessible at
Note: Make sure you use the corresponding version (3.6.0) of the online documentation
speech-platform login:
login: root
password: InVoiceWeTrust
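Instead of watching the console, you can poll the appliance until the web GUI answers. A small illustrative sketch; the wait_for_platform helper and the placeholder URL are not part of the appliance:

```shell
# Poll a URL until it responds, up to a given number of attempts (5s apart).
wait_for_platform() {
    url=$1
    tries=${2:-60}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -fso /dev/null "$url"; then
            echo "Speech Platform is up at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 5
    done
    echo "Timed out waiting for $url" >&2
    return 1
}

# Usage (replace the placeholder with your appliance's address):
# wait_for_platform "http://<IP_address_of_VA>/"
```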
Step 3: Verify SSH Access
An SSH server is deployed and enabled in the virtual appliance. Use the following credentials:
login: root
password: InVoiceWeTrust
We recommend changing the root password and disabling SSH password authentication for the root user in favor of key-based authentication. Instead of the root user, we recommend using the phonexia user, as we plan to disable root login in the future. After logging in, use the sudo command to switch to the root user.
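The recommended hardening amounts to two edits in /etc/ssh/sshd_config. The sketch below applies them to a generated local copy so it is safe to run anywhere; on the appliance you would edit the real file and then run "systemctl restart sshd":

```shell
# Demonstrate the sshd_config edits on a throwaway copy.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
PermitRootLogin yes
PasswordAuthentication yes
PubkeyAuthentication yes
EOF

# Keep key-based root access for now, but refuse root passwords over SSH.
sed -i 's/^PermitRootLogin .*/PermitRootLogin prohibit-password/' "$cfg"
# Disable password authentication entirely in favor of keys.
sed -i 's/^PasswordAuthentication .*/PasswordAuthentication no/' "$cfg"

cat "$cfg"
```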
Step 4: Upload Licensed Models
The virtual appliance is distributed without licenses and models. To obtain them, contact Phonexia support, who will provide a bundle (a .zip file) with models and licenses. The bundle then needs to be uploaded and unzipped inside the virtual appliance.
We provide File Browser inside the virtual appliance for uploading files; it is accessible at <IP_address_of_VA>/filebrowser. Once inside the filebrowser app, select upload in the top right corner and choose the bundle with models and licenses (licensed-models.zip); a pop-up window with the upload progress will appear in the bottom right corner. Filebrowser automatically unzips the bundle once it is uploaded, and the upload will show as finished after the bundle is extracted. This automatic extraction works only for a bundle named licensed-models.zip; if you rename the bundle, the extraction will not work and you will need to do it manually. After the models are extracted, a speech platform configuration script will enable and configure microservices based on the uploaded models and licenses.
Alternatively, you can upload the bundle to the virtual appliance using the command line:
- Upload the provided licensed-models.zip archive to the virtual appliance via filebrowser or via scp:
scp -P <virtual-appliance-port> licensed-models.zip root@<virtual-appliance-ip>:/data/
- Connect to the virtual appliance and change to the /data folder:
ssh root@<virtual-appliance-ip> -p <virtual-appliance-port>
cd /data
- Unzip the archive. Models are extracted to a directory per technology:
unzip licensed-models.zip
The bundle content has a specific structure that ensures all models and licenses are placed in the correct locations after unzipping.
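After unzipping, you can sanity-check the result. The layout assumed below (one directory per technology under /data/models) is an assumption based on the excerpts in this article; compare it against the actual content of your bundle:

```shell
# List the technology directories extracted from the bundle.
check_bundle() {
    models_dir=$1
    if [ ! -d "$models_dir" ]; then
        echo "Missing $models_dir - was the bundle extracted?" >&2
        return 1
    fi
    found=0
    for d in "$models_dir"/*/; do
        [ -d "$d" ] || continue
        found=$((found + 1))
        echo "technology: $(basename "$d")"
    done
    echo "$found technology directories found"
}

# On the appliance (assumed path):
# check_bundle /data/models
```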
Step 5: Verification of Functionality (optional)
Changes in configuration are not applied
Changes in the main configuration file /data/speech-platform/speech-platform-values.yaml are automatically picked up and applied by the helm controller. If the configuration is not valid (or, to be more precise, if the configuration file is not a valid YAML file), the helm controller fails to apply it. The helm controller creates a one-time job to update the helm chart with the new configuration. If the configuration is incorrect, the job will not complete successfully, and the underlying pod will either restart or be in an error state. The pod status will reflect this issue:
[root@speech-platform disks]# kubectl get pods -n kube-system | grep -i helm-install
helm-install-filebrowser-2b7pn 0/1 Completed 0 51m
helm-install-ingress-nginx-m87d4 0/1 Completed 0 51m
helm-install-nginx-nrcvk 0/1 Completed 0 51m
helm-install-dcgm-exporter-fjqzz 0/1 Completed 0 51m
helm-install-kube-prometheus-stack-jn5bz 0/1 Completed 0 51m
helm-install-keda-vsn95 0/1 Completed 0 51m
helm-install-speech-platform-9l9vj 0/1 Error 4 (46s ago) 6m15s
View the logs of the failed helm-install pod:
[root@speech-platform disks]# kubectl logs -f helm-install-speech-platform-9l9vj -n kube-system
...
...
...
Upgrading speech-platform
+ helm_v3 upgrade --namespace speech-platform speech-platform https://10.43.0.1:443/static/phonexia-charts/speech-platform-0.0.0-36638f5-helm.tgz --values /config/values-10_HelmChartConfig.yaml
Error: failed to parse /config/values-10_HelmChartConfig.yaml: error converting YAML to JSON: yaml: line 494: could not find expected ':'
Check configuration file validity
This section describes how to check if your configuration is valid and how to identify which line in the configuration is incorrect.
Use the following command to check whether the configuration file is valid:
yq .spec.valuesContent /data/speech-platform/speech-platform-values.yaml | yq .
If the configuration file is valid, its content will be printed. Otherwise, the line number with the error is printed, as follows:
[root@speech-platform ~]# yq .spec.valuesContent /data/speech-platform/speech-platform-values.yaml | yq .
Error: bad file '-': yaml: line 253: could not find expected ':'
Content of the file 10 lines before and 10 lines after line 253:
[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml | grep 253 -B 10 -A 10
243 # -- List of devices to use. GPU only.
244 # deviceIndices: [0,1]
245
246 # Uncomment this to force whisper to run on GPU
247 device: cuda
248
249 logLevel: debug
250
251 model:
252 volume:
253 hostPath:
254 path: /data/models/enhanced_speech_to_text_built_on_whisper
255
256 # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
257 file: "large_v2-1.0.1.model"
258 license:
259 value:
260 "eyJ2ZX...=="
261
262 # Uncomment this to grant access to GPU on whisper pod
263 resources:
There is nothing suspicious on line 253. In fact, the line number reported by yq is slightly off, because the configuration of the speech-platform helm chart itself is stored as the value of the spec.valuesContent key in the speech-platform-values.yaml file. Therefore, you need to add 7 (since spec.valuesContent is on the 7th line of the configuration file) to the reported line number to get the correct line number (253 + 7 = 260):
[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml | grep 260 -B 10 -A 10
250
251 model:
252 volume:
253 hostPath:
254 path: /data/models/enhanced_speech_to_text_built_on_whisper
255
256 # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
257 file: "large_v2-1.0.1.model"
258 license:
259 value:
260 "eyJ2ZX...=="
261
262 # Uncomment this to grant access to GPU on whisper pod
263 resources:
264 limits:
265 nvidia.com/gpu: "1"
266
267 # Uncomment this to run whisper on GPU
268 runtimeClassName: "nvidia"
269
270 service:
There is only a license key on line 260. The error message could not find expected ':' is accurate, because there is indeed no ':' on this line. One line above (259) there is a key named value, which should contain the license. However, the license itself is on line 260, making the file invalid (i.e., it is not valid YAML). To fix it, simply merge lines 259 and 260. The resulting file should look like this:
[root@speech-platform ~]# cat -n /data/speech-platform/speech-platform-values.yaml | grep 260 -B 10 -A 10
250
251 model:
252 volume:
253 hostPath:
254 path: /data/models/enhanced_speech_to_text_built_on_whisper
255
256 # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
257 file: "large_v2-1.0.1.model"
258 license:
259 value: "eyJ2ZX...=="
260
261 # Uncomment this to grant access to GPU on whisper pod
262 resources:
263 limits:
264 nvidia.com/gpu: "1"
265
266 # Uncomment this to run whisper on GPU
267 runtimeClassName: "nvidia"
268
269 service:
270 clusterIP: "None"
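The +7 offset is specific to this file layout. Rather than hardcoding it, you can compute it from the position of spec.valuesContent. A sketch, demonstrated here on a generated sample file; on the appliance, point values_file at /data/speech-platform/speech-platform-values.yaml:

```shell
# Compute the line-number offset: it equals the line on which
# spec.valuesContent sits in the wrapper file.
values_file=$(mktemp)
cat > "$values_file" <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    imagePullSecrets: []
EOF

offset=$(grep -n 'valuesContent' "$values_file" | head -1 | cut -d: -f1)
yq_line=253   # line number reported by yq
echo "error is on line $((yq_line + offset)) of $values_file"
```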
Final Step: Enable Technologies
The virtual appliance comes with all microservices disabled by default. You need to enable a microservice before you can use it, either manually by editing the configuration file or automatically with a configuration script.
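A hypothetical sketch of the manual route: flip the microservice's enabled flag in the values file. The key names below mirror the whisper excerpts earlier in this article but are assumptions; verify the exact structure in your speech-platform-values.yaml before editing. Demonstrated here on a generated sample file:

```shell
# Toggle a microservice's "enabled" flag (key names are assumptions).
f=$(mktemp)
cat > "$f" <<'EOF'
    enhanced_speech_to_text_built_on_whisper:
      enabled: false
EOF

# Flip the flag; the helm controller picks up the saved change automatically.
sed -i '/enhanced_speech_to_text_built_on_whisper:/,/enabled:/ s/enabled: false/enabled: true/' "$f"

cat "$f"
```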
Enable microservices by a script
There is a script named configure-speech-platform.sh which automatically configures (enables/disables) all microservices for which you have a license and model.
- Connect to the virtual appliance:
$ ssh root@<virtual-appliance-ip> -p <virtual-appliance-port>
- Run the configure-speech-platform.sh script:
$ /root/scripts/configure-speech-platform.sh --auto-configure
- All licensed microservices should now be enabled.
- The application automatically recognizes when microservices are enabled and redeploys itself with the updated configuration.