Adjustments
The following sections describe various configuration use cases.
VirtualBox configuration
Linux deployment
If you use VirtualBox to run the Virtual Appliance on a Linux distribution, you can use our installation script to import and configure the Virtual Appliance. To obtain the script, contact Phonexia support. Before using it, you must have already downloaded the bundle with the Virtual Appliance files and the bundle with models and licenses. To run the script, follow these steps:
- Open a terminal and navigate to the script.
- Make the script executable:
chmod +x SpeechPlatformInstaller.sh
- Run the script using the following command:
./SpeechPlatformInstaller.sh -m /path/to/models_bundle -v /path/to/VA_bundle -n virtual_machine_name
- Wait until the script finishes. When it does, it displays a link to the Speech Platform application.
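For example, assuming both bundles were downloaded into ~/Downloads (the file names below are illustrative and may differ from the ones you received), the invocation could look like this:
./SpeechPlatformInstaller.sh -m ~/Downloads/licensed-models.zip -v ~/Downloads/speech-platform-virtual-appliance.zip -n speech-platform-va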
Windows deployment
If you use VirtualBox on Windows, you can use our installer, which imports and configures the Virtual Appliance for you. To use this application you will need an archive with the Virtual Appliance and an archive with licensed models. When you have these archives, run the app, fill in the name of the virtual machine, select VirtualBox as the hypervisor, select the paths to the archives and click Install. A new window will pop up showing the installation progress. After the installation is complete you can access the Speech Platform at http://localhost:1080/app/home.
Hyper-V configuration
Supported Hyper-V versions
If you use Hyper-V as your hypervisor, we provide configuration files for importing the Virtual Appliance. There are, however, a few prerequisites. First you need to check which Hyper-V configuration versions your system supports. You can do this by opening PowerShell as administrator and running the following command:
Get-VMHostSupportedVersion
The output lists the versions of Hyper-V supported by your system. We ship the configuration file for version 8.0.
Automatic configuration
If you use Hyper-V as your hypervisor and run Windows 11, you can use our installer to import and configure the Virtual Appliance for you. To use this installer, you need to have Microsoft Virtual Machine Converter installed on your machine; you can download it from here. It is used for converting the virtual hard disks from the vmdk format to Microsoft's VHDX format. You will also need the archives with the Virtual Appliance and with licensed models. After you obtain these, launch the installer, enter the VM name, select Hyper-V as the hypervisor and fill in the paths to the archives. Then click Install and a new window will pop up showing the progress of the installation. After it finishes you can access the Speech Platform at http://localhost:1080/app/home.
Manual configuration
If you want to configure the Virtual Appliance manually, follow the steps described below.
Disk conversion
First you need to convert the provided Virtual Appliance disks from the .vmdk to the .vhdx format. You can use a tool such as StarWind V2V Converter or Microsoft Virtual Machine Converter. With Microsoft Virtual Machine Converter, after installing it, open PowerShell as administrator and run the following commands:
- Import the PowerShell module
Import-Module 'C:\Program Files\Microsoft Virtual Machine Converter\MvmcCmdlet.psd1'
- Convert the disks
ConvertTo-MvmcVirtualHardDisk -SourceLiteralPath <path/to/vmdk> -VhdType DynamicHardDisk -VhdFormat vhdx -DestinationLiteralPath <path/to/target/folder>
Once the disks are converted, you need to recreate the folder structure Hyper-V expects. Do this by moving the converted virtual hard disks into their folder as shown below.
speech-platform-virtual-appliance
├── Virtual Hard Disks
| ├── speech-platform-disk0001.vhdx
| └── speech-platform-disk0002.vhdx
└── Virtual Machines
├── <MachineID>.vmcx
└── <MachineID>.vmrs
Networking configuration
The next step is configuring the networking for the Virtual Appliance. You need to create a Hyper-V virtual switch, a Network Address Translation (NAT), and port forwarding rules.
- Create Virtual Switch
New-VMSwitch -Name "SpeechPlatformSwitch" -SwitchType Internal
- Add Switch Address
New-NetIPAddress -IPAddress 192.168.100.1 -PrefixLength 24 -InterfaceAlias "vEthernet (SpeechPlatformSwitch)"
- Create NAT
New-NetNAT -Name "SpeechPlatformNAT" -InternalIPInterfaceAddressPrefix 192.168.100.0/24
- Set up NAT port forwarding
Add-NetNatStaticMapping -NatName "SpeechPlatformNAT" -Protocol TCP -ExternalIPAddress 0.0.0.0 -ExternalPort 1080 -InternalIPAddress 192.168.100.2 -InternalPort 80
Add-NetNatStaticMapping -NatName "SpeechPlatformNAT" -Protocol TCP -ExternalIPAddress 0.0.0.0 -ExternalPort 2222 -InternalIPAddress 192.168.100.2 -InternalPort 22
Importing Virtual Appliance
When you are done configuring the networking, you can use the Hyper-V Manager UI to import the Virtual Appliance. The virtual switch will be automatically detected and attached to the VA. As a last step you need to set a static IP address in the Virtual Appliance. There are two ways to configure this:
- The first and simplest way is using a cloud-init configuration. The directory that contains the configuration files for Hyper-V also includes the file seed.iso. Open the VM settings in Hyper-V, select IDE Controller 0 and add a DVD Drive. Select the provided image file and click Apply. Once the ISO image is attached, cloud-init automatically detects it and sets the IP address to the same values as the manual configuration below.
- The second way is starting the Virtual Appliance and connecting to it using the Hyper-V Virtual Machine Connection. After logging in, run the following commands:
nmcli con add type ethernet con-name eth0 ifname eth0 ipv4.addresses 192.168.100.2/24 ipv4.gateway 192.168.100.1 ipv4.dns "8.8.8.8 8.8.4.4" ipv4.method manual
nmcli con up eth0
After executing these commands, reboot the Virtual Appliance.
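After the reboot, a minimal sanity check from inside the Virtual Appliance (assuming the connection name eth0 configured above) is:
ip addr show eth0    # should list 192.168.100.2/24
ping -c 3 192.168.100.1    # the NAT gateway configured earlier should respond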
How to convert an OVF into a Hyper-V compatible VM
- Both existing virtual HDDs (.vmdk) need to be converted to Hyper-V compatible HDDs (.vhdx). You can do this with StarWind V2V Converter.
- Create new VM in Hyper-V.
- IMPORTANT: Use a Generation 1 VM - Generation 2 doesn't work.
- Enable networking/make sure it is enabled.
- OPTIONAL: Disable options like DVD drive or SCSI controller since they are not needed.
- Set memory to at least 32 GB and CPUs to at least 8 cores.
- Attach HDDs, preferably onto one IDE controller.
- Start the VM.
- After it starts, check the IP address printed on the login screen. Wait for the entire engine to start.
- Go to the IP address from the previous step and verify that the VM works as it should.
Configuration of microservice models and licenses
Changing microservice models
If you use models other than the default ones, you need to change the path values in the /data/speech-platform/speech-platform-values.yaml file: the <microservice>.config.model.file value, which points to the model, and the <microservice>.config.license.key value, which points to the license for the used model.
Example (change the model large_v2-1.0.1 to small-1.0.1 for the enhanced-speech-to-text-built-on-whisper microservice):
enhanced-speech-to-text-built-on-whisper:
config:
model:
volume:
hostPath:
path: /data/models/enhanced_speech_to_text_built_on_whisper
file: "large_v2-1.0.1.model"
license:
useSecret: true
secret: enhanced-speech-to-text-built-on-whisper-license
key: "large_v2-1.0.1"
needs to be changed to:
enhanced-speech-to-text-built-on-whisper:
config:
model:
volume:
hostPath:
path: /data/models/enhanced_speech_to_text_built_on_whisper
file: "small-1.0.1.model"
license:
useSecret: true
secret: enhanced-speech-to-text-built-on-whisper-license
key: "small-1.0.1"
These changes are required for all microservices with licensed models except speech-to-text-phonexia, time-analysis and audio-quality-estimation.
Inspect microservice models
Models are stored inside the /data/models folder, where the path to each model is constructed as:
/data/models/<technology_name>/<model_name>-<model_version>.model
Where:
- technology_name - the name of the technology, e.g. speaker_identification
- model_name - the name of the model, e.g. xl
- model_version - the version of the model, e.g. 5.0.0
Imported models can be inspected after uploading (Step 4 of the Installation Guide) with the following command:
- Content of the /data/models folder:
$ find /data/models
/data/models/
/data/models/speaker_identification
/data/models/speaker_identification/xl-5.0.0.model
/data/models/speaker_identification/xl-5.0.0-license.txt
/data/models/enhanced_speech_to_text_built_on_whisper
/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1.model
/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt
/data/models/speech_to_text_phonexia
/data/models/speech_to_text_phonexia/en_us_6-3.62.0-license.txt
/data/models/speech_to_text_phonexia/en_us_6-3.62.0.model
/data/models/time_analysis
/data/models/time_analysis/generic-3.62.0-license.txt
/data/models/time_analysis/generic-3.62.0.model
Inspect microservice licenses
Licenses are stored in the file /data/speech-platform/speech-platform-licenses.yaml. The file contains the Kubernetes secret definitions of the licenses, which allows simple loading of the licenses into the application.
Imported licenses can be inspected after uploading (Step 4 of the Installation Guide) with the following command:
- Content of the /data/speech-platform folder:
$ find /data/speech-platform/
/data/speech-platform/
/data/speech-platform/speech-platform-licenses.yaml
/data/speech-platform/speech-platform-values.yaml
The Kubernetes secret definitions in the file are separated by ---. Each secret contains, under the .stringData.license path, the contents of the license file for the technology the license is meant for. For example:
- For the speaker_identification technology model with name xl and version 5.0.0, the secret will look like this:
---
apiVersion: v1
kind: Secret
metadata:
name: speaker-identification-license
namespace: speech-platform
stringData:
license: |
<content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
type: Opaque
The content of the license file (/data/speech-platform/speech-platform-licenses.yaml) can be shown with the following command:
- Content of the license file:
$ cat /data/speech-platform/speech-platform-licenses.yaml
---
apiVersion: v1
kind: Secret
metadata:
name: speaker-identification-license
namespace: speech-platform
stringData:
license: |
<content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
type: Opaque
---
apiVersion: v1
kind: Secret
metadata:
name: enhanced-speech-to-text-built-on-whisper-license
namespace: speech-platform
stringData:
license: |
<content of "/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt" file>
type: Opaque
. . .
Set DNS name for speech platform virtual appliance
The Speech Platform is accessible at http://<IP_address_of_virtual_appliance>. We recommend creating a DNS record to make access more comfortable for users. Consult your DNS provider for more information on how to add the corresponding DNS record.
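For example, assuming you created an A record for the illustrative name speech-platform.example.com, you can check that the record resolves and the application responds before handing the URL to users:
nslookup speech-platform.example.com
curl -s -o /dev/null -w "%{http_code}\n" http://speech-platform.example.com/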
Use HTTPS certificate
The Speech Platform is also accessible via the HTTPS protocol at https://<IP_address_of_virtual_appliance>. If you prefer secure communication, you might want to use your own TLS certificate for securing it. To do so, follow this guide:
- Prepare the TLS certificate beforehand.
- Put the certificate private key in a file named cert.key.
- Put the certificate in a file named cert.crt.
- Create a Kubernetes secret manifest storing the certificate and private key:
kubectl create -n ingress-nginx secret tls default-ssl-certificate --key cert.key --cert cert.crt -o yaml --dry-run=client > /tmp/certificate-secret.yaml
- Copy the manifest (resulting file) to /data/ingress-nginx/certificate-server.yaml.
- Open the text file /data/ingress-nginx/ingress-nginx-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.controller.extraArgs.default-ssl-certificate.
- Uncomment the line.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: ingress-nginx
namespace: kube-system
spec:
valuesContent: |-
controller:
<Not significant lines omitted>
extraArgs:
<Not significant lines omitted>
default-ssl-certificate: "ingress-nginx/default-ssl-certificate"
- Save the file.
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
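You can verify that the ingress now serves your certificate with openssl, for example (replace the placeholder with the real address):
openssl s_client -connect <IP_address_of_virtual_appliance>:443 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates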
Extend disks
Disks are extended automatically on VM startup by the growfs systemd service when you extend the backing volume/disk in the hypervisor. You can also trigger the extension manually by running the script /root/grow-partition-and-filesystems.sh. It grows the partition and filesystem for both the system and data disks.
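A quick way to verify the extension, assuming the default mount points used throughout this guide, is to compare the reported sizes:
systemctl status growfs    # the service that performs the extension on startup
lsblk    # partition sizes
df -h / /data    # filesystem sizes of the system and data disks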
Phonexia Speech to Text microservice
This section describes configuration specific to the Phonexia Speech to Text microservice.
Permanent vs onDemand instances
A permanent instance is started and running (and consuming resources) all the time. An onDemand instance is started only when a corresponding task is queued, and stopped when all tasks have been processed.
All instances are onDemand by default. Any instance can be reconfigured to be permanent. Use the following guide to reconfigure an instance from onDemand to permanent:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: cs
imageTag: 3.62.0-stt-cs_cz_6
onDemand:
enabled: true
. . .
- Delete the onDemand key and its subkeys.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: cs
imageTag: 3.62.0-stt-cs_cz_6
. . .
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).
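A quick way to confirm the change from inside the virtual appliance is to list the pods; a permanent instance keeps its pod in the Running state even when no task is queued (exact pod names may differ):
kubectl -n speech-platform get pods | grep speech-to-text-phonexia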
Configure languages in speech-to-text-phonexia microservice
This microservice consists of multiple instances. Each instance corresponds to a single language. All instances are listed in the configuration file.
By default, all languages/instances are enabled in onDemand mode. List of languages:
- ar_kw_6
- ar_xl_6
- bn_6
- cs_cz_6
- de_de_6
- en_us_6
- es_6
- fa_6
- fr_fr_6
- hr_hr_6
- hu_hu_6
- it_it_6
- ka_ge_6
- kk_kz_6
- nl_6
- pl_pl_6
- ps_6
- ru_ru_6
- sk_sk_6
- sr_rs_6
- sv_se_6
- tr_tr_6
- uk_ua_6
- vi_vn_6
- zh_cn_6
How to disable all language instances except cs_cz_6 and en_us_6:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
- name: ark
imageTag: 3.62.0-stt-ar_kw_6
onDemand:
enabled: true
- name: arx
imageTag: 3.62.0-stt-ar_xl_6
onDemand:
enabled: true
- name: bn
imageTag: 3.62.0-stt-bn_6
onDemand:
enabled: true
- name: cs
imageTag: 3.62.0-stt-cs_cz_6
onDemand:
enabled: true
- name: de
imageTag: 3.62.0-stt-de_de_6
onDemand:
enabled: true
- name: en
imageTag: 3.62.0-stt-en_us_6
onDemand:
enabled: true
. . .
- name: vi
imageTag: 3.62.0-stt-vi_vn_6
onDemand:
enabled: true
- name: zh
imageTag: 3.62.0-stt-zh_cn_6
onDemand:
enabled: true
- Comment out all instances except cs_cz_6 and en_us_6.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
#- name: ark
# imageTag: 3.62.0-stt-ar_kw_6
# onDemand:
# enabled: true
#- name: arx
# imageTag: 3.62.0-stt-ar_xl_6
# onDemand:
# enabled: true
#- name: bn
# imageTag: 3.62.0-stt-bn_6
# onDemand:
# enabled: true
- name: cs
imageTag: 3.62.0-stt-cs_cz_6
onDemand:
enabled: true
#- name: de
# imageTag: 3.62.0-stt-de_de_6
# onDemand:
# enabled: true
- name: en
imageTag: 3.62.0-stt-en_us_6
onDemand:
enabled: true
. . .
#- name: vi
# imageTag: 3.62.0-stt-vi_vn_6
# onDemand:
# enabled: true
#- name: zh
# imageTag: 3.62.0-stt-zh_cn_6
# onDemand:
# enabled: true
- Alternatively, you can delete the instances you are not interested in.
- The updated file should then look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
- name: cs
imageTag: 3.62.0-stt-cs_cz_6
onDemand:
enabled: true
- name: en
imageTag: 3.62.0-stt-en_us_6
onDemand:
enabled: true
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).
Modify replicas for permanent language instances
Each language instance has only one replica by default. This means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the number of replicas for the corresponding language instance.
Keep in mind that the virtual appliance is running with default resources (4 CPU, 32 GB memory)! Note: An onDemand instance always has only one replica.
- Find out which language instance you want to configure replicas for.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.<language instance>.replicaCount.
- Change the value to the desired number of replicas.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
- name: cs
imageTag: 3.62.0-stt-cs_cz_6
replicaCount: 2
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).
Modify parallelism for instances
Each instance can process only one request at a time, unless parallelism is overridden. The parallelism value is the maximum number of requests processed by one instance. Parallelism is set globally for all instances of a technology; however, each instance can override the value. To override parallelism for speech-to-text-phonexia, time-analysis, or audio-quality-estimation, follow these steps:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Find the key .spec.valuesContent.<technology>.parallelism, where <technology> is the technology (speech-to-text-phonexia, time-analysis, or audio-quality-estimation) whose parallelism should be overridden.
- Change the value to the desired number of requests processed in parallel.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
# Global value of parallelism for all instances
parallelism: 2
config:
<Not significant lines omitted>
instances:
- name: cs
imageTag: 3.62.0
- name: en
imageTag: 3.62.0
# Override of parallelism for en instance
parallelism: 4
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).
Modify microservice replicas
Each microservice has only one replica by default. This means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the number of replicas for the corresponding microservices.
Keep in mind that the virtual appliance is running with default resources (4 CPU, 32 GB memory)!
- Find out which microservices you want to modify replicas for: audio-quality-estimation, deepfake-detection, emotion-recognition, enhanced-speech-to-text-built-on-whisper, language-identification, speaker-diarization, voice-activity-detection, voiceprint-comparison or voiceprint-extraction.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.<microservice>.replicaCount.
- Change the value to the desired number of replicas.
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
<microservice>:
<Not significant lines omitted>
replicaCount: 2
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).
Run microservice on GPU
Some of the microservices can run on a GPU, which increases the processing speed. Microservices that can run on a GPU are deepfake-detection, emotion-recognition, enhanced-speech-to-text-built-on-whisper, gender-identification, language-identification, speaker-diarization, voice-activity-detection, and voiceprint-extraction.
First, make sure the virtual appliance can see the GPU device(s). Use nvidia-smi to list all the devices. If a device is present and visible to the system, the output should look like:
[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)
If the GPU is visible, then you can reconfigure the microservice to use GPU for the processing.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the microservice section .spec.valuesContent.<microservice>.
- Locate the key .spec.valuesContent.<microservice>.config.device.
- Uncomment the line so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
<microservice>:
<Not significant lines omitted>
config:
<Not significant lines omitted>
# Uncomment this to force microservice to run on GPU
device: cuda
- Locate the key .spec.valuesContent.<microservice>.resources.
- Request GPU resources for the processing so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
<microservice>:
<Not significant lines omitted>
# Uncomment this to grant access to GPU on whisper pod
resources:
limits:
nvidia.com/gpu: "1"
- Locate the key .spec.valuesContent.<microservice>.runtimeClassName.
- Set runtimeClassName so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
<microservice>:
<Not significant lines omitted>
# Uncomment this to run whisper on GPU
runtimeClassName: "nvidia"
- Locate the key .spec.valuesContent.<microservice>.updateStrategy.
- Set type to Recreate to allow seamless updates so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
<microservice>:
<Not significant lines omitted>
# Uncomment this to allow seamless updates on single GPU machine
updateStrategy:
type: Recreate
- Example: The updated file for enhanced-speech-to-text-built-on-whisper should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
enhanced-speech-to-text-built-on-whisper:
<Not significant lines omitted>
config:
<Not significant lines omitted>
device: cuda
<Not significant lines omitted>
resources:
limits:
nvidia.com/gpu: "1"
<Not significant lines omitted>
runtimeClassName: "nvidia"
<Not significant lines omitted>
updateStrategy:
type: Recreate
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Step 5 of the Installation Guide).
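To verify that the microservice actually sees the GPU, you can run nvidia-smi inside its pod; the deployment name below is illustrative and may differ in your installation:
kubectl -n speech-platform exec deploy/enhanced-speech-to-text-built-on-whisper -- nvidia-smi -L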
GPU parallelism settings
This section describes how to control processing parallelism when a microservice is running on a GPU. The following configuration applies only to the microservices enhanced-speech-to-text-built-on-whisper, language-identification, voice-activity-detection, voiceprint-extraction, deepfake-detection and gender-identification:
<microservice>:
config:
# -- Parallel tasks per device. GPU only.
instancesPerDevice: 1
# -- Index of device to use. GPU only.
#deviceIndex: 0
There are two configuration options:
- instancesPerDevice - Controls how many tasks can be processed by a microservice on a single GPU in parallel. A higher value means higher GPU utilization (both processor- and memory-wise).
- deviceIndex - Controls which GPU card to use in case there are multiple GPU cards. We discourage using this option in most cases.
Change model used in a microservice
Each microservice needs a model to do its job properly. For some microservices, for example enhanced-speech-to-text-built-on-whisper, we provide multiple models. We usually pre-configure microservices with the most accurate (and slowest) model. Users typically switch to a different model to speed up processing at the cost of less accurate results.
The license you received with the virtual appliance is valid only for the default model. If you change the model, you have to change the license as well.
Change model in enhanced-speech-to-text-built-on-whisper microservice
We offer the following models for the enhanced-speech-to-text-built-on-whisper microservice:
- large-v3 - next-generation most accurate multilingual model.
- large-v2 - most accurate multilingual model. This is the default model.
- medium - less accurate but faster than large-v2.
- small - less accurate but faster than medium.
- base - less accurate but faster than small.
- Ask Phonexia to provide you with the desired model and license. You will receive link(s) which result in a zip archive when downloaded.
- Upload the archive to the virtual appliance:
scp licensed-models.zip root@<virtual-appliance-ip>:/data/
- Unzip the archive. Models are extracted into a directory per microservice:
unzip licensed-models.zip
- Content of the /data/models folder should look like:
$ find /data/models
/data/models/
/data/models/enhanced_speech_to_text_built_on_whisper
/data/models/enhanced_speech_to_text_built_on_whisper/small-1.0.0.model
/data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-base-1.0.0-license.key.txt
/data/models/enhanced_speech_to_text_built_on_whisper/base-1.0.0.model
/data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-small-1.0.0-license.key.txt
/data/models/speaker_identification
/data/models/speaker_identification/xl-5.0.0.model
/data/models/speaker_identification/speaker_identification-xl-5.0.0-license.key.txt
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.config.model.
- Change the content of the file key from "large_v2-1.0.0.model" to the file you've just uploaded ("small-1.0.0.model").
- Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
enhanced-speech-to-text-built-on-whisper:
<Not significant lines omitted>
config:
model:
<Not significant lines omitted>
file: "small-1.0.0.model" - Change the license because you have changed the model. Check (Step 4 of Installation Guide). to see how to do it.
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).
Load Speech to Text Phonexia, Time Analysis and Audio Quality Estimation model from data disk
To keep up with the latest version of the application, it is possible to load models from the virtual appliance volume. To use the image without a bundled model and load existing models from the data volume, the instances in the config file need to be set up as follows:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: en
imageTag: 3.62.0
. . .
<Not significant lines omitted>
time-analysis:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: tae
imageTag: 3.62.0
. . .
<Not significant lines omitted>
audio-quality-estimation:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: aqe
imageTag: 3.62.0
. . .
By default, we expect the model to be located at /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model. This folder structure is ensured by unzipping the provided licensed-models.zip archive in the /data/ path. If the path to the model is different, or the model version does not match the image, it can be specified in the instances config as follows:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
speech-to-text-phonexia:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: en
imageTag: 3.62.0
model:
hostPath: /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model
. . .
<Not significant lines omitted>
time-analysis:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: tae
imageTag: 3.62.0
model:
hostPath: /data/models/time_analysis/generic-3.62.0.model
. . .
<Not significant lines omitted>
audio-quality-estimation:
<Not significant lines omitted>
config:
<Not significant lines omitted>
instances:
. . .
- name: aqe
imageTag: 3.62.0
model:
hostPath: /data/models/audio_quality_estimation/generic-3.62.0.model
. . .
So far, model loading from the data disk is supported only by the Speech to Text Phonexia, Time Analysis and Audio Quality Estimation technologies.
Process patented audio codecs with media-conversion
By default, media conversion can work only with patent-free audio codecs.
We cannot include and distribute patented codecs with the virtual appliance. If you need to process audio files encoded with patented codecs, you have to use a different version of media-conversion. The media-conversion service image is located on Docker Hub.
Pull Media Conversion image directly from Virtual Appliance
This works only if the internet (Docker Hub) is accessible from the Virtual Appliance.
- [Virtual Appliance] Pull media-conversion image to Virtual Appliance:
k3s ctr image pull docker.io/phonexia/media-conversion:1.0.0
- [Virtual Appliance] Export image to data disk to load it automatically:
k3s ctr image export /data/images/media-conversion-1.0.0.tar docker.io/phonexia/media-conversion:1.0.0
- Reconfigure Media Conversion to use the locally downloaded image as described below.
Push Media Conversion image to Virtual Appliance from workstation
This approach is needed if your deployment is completely offline and access to the internet from the virtual appliance is forbidden.
- [PC] Pull media-conversion image locally to your workstation:
docker pull phonexia/media-conversion:1.0.0
- [PC] Save Media Conversion image to tar archive:
docker save --output media-conversion-1.0.0.tar phonexia/media-conversion:1.0.0
- [PC] Copy the media-conversion-1.0.0.tar file into the virtual appliance via ssh or File Browser to /data/images:
scp media-conversion-1.0.0.tar root@<IP of virtual appliance>:/data/images/
- [Virtual appliance] Restart the virtual appliance to load the image, or load it manually with:
k3s ctr image import /data/images/media-conversion-1.0.0.tar
- Reconfigure Media Conversion to use the locally downloaded image as described below.
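Whichever of the two approaches you used, you can confirm that the image is present in the local containerd store of the virtual appliance:
k3s ctr image ls | grep media-conversion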
Configure Media Conversion to use pre-downloaded image
The last step is to configure Media Conversion to use the image downloaded in the previous step.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.media-conversion.image.
- Change the content of the registry, repository, tag and tagSuffix keys to:
media-conversion:
image:
registry: docker.io
repository: phonexia/media-conversion
tag: 1.0.0
tagSuffix: "" - Updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: speech-platform
namespace: kube-system
spec:
valuesContent: |-
<Not significant lines omitted>
media-conversion:
<Not significant lines omitted>
image:
registry: docker.io
repository: phonexia/media-conversion
tag: 1.0.0
tagSuffix: "" - Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).
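To confirm that the running pod uses the new image, you can inspect the pods from inside the virtual appliance (exact pod names may differ):
kubectl -n speech-platform get pods | grep media-conversion
kubectl -n speech-platform describe pods | grep 'Image:'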
Disable DNS resolving for specific domains
The Kubernetes resolver tries to resolve non-FQDN names with all domains from /etc/resolv.conf. This might cause issues if access to the upstream DNS server (taken from /etc/resolv.conf as well) is denied. To avoid this issue, configure the Kubernetes resolver to skip lookup for specific domain(s).
- [Virtual appliance] Create the file /data/speech-platform/coredns-custom.yaml manually with the following content. Replace <domain1.com> and <domain2.com> with the domains you want to disable lookup for:
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns-custom
namespace: kube-system
data:
custom.server: |
<domain1.com>:53 {
log
}
<domain2.com>:53 {
log
}
- [Virtual appliance] The file should look like this:
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns-custom
namespace: kube-system
data:
custom.server: |
localdomain:53 {
log
}
example.com:53 {
log
}
- [Virtual appliance] Restart coreDNS to apply the change:
kubectl -n kube-system rollout restart deploy/coredns
- [Virtual appliance] Check that coreDNS pod is running:
kubectl -n kube-system get pods -l k8s-app=kube-dns
- [Virtual appliance] Restart all speech-platform pods:
kubectl -n speech-platform rollout restart deploy
kubectl -n speech-platform rollout restart sts
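Because the custom server blocks above enable the log plugin, you can confirm that queries for the listed domains are now handled (and logged) locally by watching the coreDNS logs:
kubectl -n kube-system logs deploy/coredns --tail=20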
Custom configuration with cloud-init
Cloud-init is a widely used tool for configuring cloud instances at boot time, and we support it in the Virtual Appliance.
It can be used for customizing the Virtual Appliance - to create a user with a specific SSH key, install extra packages and so on.
How to Pass Cloud-Init User Configuration to Virtual Appliance
This guide will walk you through the steps required to pass a cloud-init user configuration to a Virtual Appliance.
- The first step is to create a user-data file that contains the configuration information you want to pass to the VM. This file is typically written in YAML and may include various configurations, such as creating users, setting up SSH keys, or running commands. Here is an example of a basic user-data file:
#cloud-config
users:
  - name: phonexia
    ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr... your_public_key_here
packages:
  - htop
Save this file as user-data.yaml.
- Since non-cloud hypervisors like VirtualBox and VMware do not have a native method to pass cloud-init data, you need to create a "seed" ISO image that contains your user-data.yaml file. Cloud-init will read this data during the virtual machine boot process. You can create the ISO image using the cloud-localds command:
cloud-localds seed.iso user-data.yaml
This command generates an ISO file named seed.iso containing your user-data.yaml and a generated meta-data file.
- Attach the ISO image to the Virtual Appliance VM. Attach the seed.iso file to the VM as a CD-ROM/DVD-ROM. You can do this via the VirtualBox GUI, VMware vSphere, or the ESXi Host Client.
- Boot the VM. Cloud-init will automatically detect the attached ISO image and apply the configurations specified in your user-data.yaml file.
- Verify cloud-init execution. Once the VM has booted, you can verify that cloud-init has applied the configuration correctly. Connect to your VM via SSH or the console and check the following:
- Check the cloud-init status:
cloud-init status
- Check that the htop package is installed:
htop
This should open the htop application.
- Check that you can log in as the phonexia user with the SSH key:
ssh -i <path_to_ssh_private_key> phonexia@<ip of virtual appliance>
- Check the cloud-init logs: cloud-init logs its activities in /var/log/cloud-init.log and /var/log/cloud-init-output.log. You can inspect these logs to troubleshoot any issues:
less /var/log/cloud-init.log
- (Optional) Detach the ISO image. Usually you no longer need the seed.iso file attached to your VM; you can detach it in a similar way as you attached it.
Uninstall NVIDIA Drivers
The Virtual Appliance contains the NVIDIA drivers needed for GPU processing. In some cases it might be handy to use a different version of the drivers, or a different kind of drivers (vGPU) instead. As a first step, the current drivers must be uninstalled.
Run the following command to uninstall the bundled drivers:
dnf module remove nvidia-driver:550
Note that GPU processing won't work until new drivers are installed. Installation of the new drivers is out of the scope of this document.
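You can verify the removal afterwards; nvidia-smi should no longer be available and the package query below should return no driver packages:
nvidia-smi
dnf list installed | grep -i nvidia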