Version: 3.7.0

Adjustments

The following sections describe various configuration use cases.

VirtualBox configuration

Linux deployment

If you use VirtualBox to run the Virtual Appliance on a Linux distribution, you can use our installation script to import and configure the Virtual Appliance. To obtain this script, contact Phonexia support. To use it, you must have already downloaded the bundle with the Virtual Appliance files and the bundle with models and licenses. To run the script, follow these steps:

  1. Open terminal and locate the script.
  2. Make the script an executable:
    chmod +x SpeechPlatformInstaller.sh
  3. Run the script using the following command:
    ./SpeechPlatformInstaller.sh -m /path/to/models_bundle -v /path/to/VA_bundle -n virtual_machine_name
  4. Wait until the script finishes. When it does, it displays a link to the Speech Platform application.
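
For example, assuming both bundles were downloaded to ~/Downloads (the file names below are illustrative):

./SpeechPlatformInstaller.sh -m ~/Downloads/licensed-models.zip -v ~/Downloads/speech-platform-va.zip -n speech-platform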

Windows deployment

If you use VirtualBox on Windows, you can use our installer, which imports and configures the Virtual Appliance for you. To use this application, you need an archive with the Virtual Appliance and an archive with licensed models. When you have these archives, run the app, fill in the name of the virtual machine, select VirtualBox as the hypervisor, select the paths to the archives, and click Install. A new window will pop up and show the installation progress. After the installation is complete, you can access the Speech Platform at http://localhost:1080/app/home

Hyper-V configuration

Supported Hyper-V versions

If you use Hyper-V as your hypervisor, we provide configuration files for importing the Virtual Appliance. There are, however, a few prerequisites. First, you need to check which Hyper-V configuration versions your system supports. You can do this by opening PowerShell as administrator and running the following command:

Get-VMHostSupportedVersion

The command lists the Hyper-V configuration versions supported by your system. We ship a configuration file for version 8.0.
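
For illustration, the output is a table of configuration versions similar to the following; the exact rows depend on your Windows build:

Name                                          Version IsDefault
----                                          ------- ---------
Microsoft Windows Server 2016/Windows 10      8.0     False
Microsoft Windows Server 2019/Windows 10 1809 9.0     True

Make sure that version 8.0 appears in the list.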

Automatic configuration

If you use Hyper-V as your hypervisor and run Windows 11, you can use our installer to import and configure the Virtual Appliance for you. The installer requires Microsoft Virtual Machine Converter, which you can download here; it is used to convert the virtual hard disks from the VMDK format to Microsoft's VHDX format. You also need the archives with the Virtual Appliance and with licensed models. After you obtain these, launch the installer, enter the VM name, select Hyper-V as the hypervisor, and fill in the paths to the archives. Then click Install; a new window will pop up showing the installation progress. After it finishes, you can access the Speech Platform at http://localhost:1080/app/home.

Manual configuration

If you want to configure the Virtual Appliance manually, follow the steps described below.

Disk conversion

First, you need to convert the provided Virtual Appliance disks from the .vmdk to the .vhdx format. You can use a tool such as StarWind V2V Converter or Microsoft Virtual Machine Converter. With Microsoft Virtual Machine Converter, after installing it, open PowerShell as administrator and enter the following commands:

  1. Import the PowerShell module
    Import-Module 'C:\Program Files\Microsoft Virtual Machine Converter\MvmcCmdlet.psd1'
  2. Convert the disks
    ConvertTo-MvmcVirtualHardDisk -SourceLiteralPath <path/to/vmdk> -VhdType DynamicHardDisk -VhdFormat vhdx -DestinationLiteralPath <path/to/target/folder>

Once the disks are converted, you need to create the folder structure that Hyper-V expects. Move the converted virtual hard disks into their folder as shown below.

speech-platform-virtual-appliance
├── Virtual Hard Disks
│   ├── speech-platform-disk0001.vhdx
│   └── speech-platform-disk0002.vhdx
└── Virtual Machines
    ├── <MachineID>.vmcx
    └── <MachineID>.vmrs
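
For example, you can create this layout and move the converted disks with PowerShell (the destination path below is illustrative):

New-Item -ItemType Directory -Force -Path "C:\VMs\speech-platform-virtual-appliance\Virtual Hard Disks"
Move-Item -Path "C:\converted\*.vhdx" -Destination "C:\VMs\speech-platform-virtual-appliance\Virtual Hard Disks\"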

Networking configuration

The next step is configuring networking for the Virtual Appliance. To set this up, you need to create a Hyper-V virtual switch, a Network Address Translation (NAT), and port-forwarding rules.

  1. Create Virtual Switch
    New-VMSwitch -Name "SpeechPlatformSwitch" -SwitchType Internal
  2. Add Switch Address
    New-NetIPAddress -IPAddress 192.168.100.1 -PrefixLength 24 -InterfaceAlias "vEthernet (SpeechPlatformSwitch)"
  3. Create NAT
    New-NetNAT -Name "SpeechPlatformNAT" -InternalIPInterfaceAddressPrefix 192.168.100.0/24
  4. Set up NAT port forwarding
    Add-NetNatStaticMapping -NatName "SpeechPlatformNAT" -Protocol TCP -ExternalIPAddress 0.0.0.0 -ExternalPort 1080 -InternalIPAddress 192.168.100.2 -InternalPort 80
    Add-NetNatStaticMapping -NatName "SpeechPlatformNAT" -Protocol TCP -ExternalIPAddress 0.0.0.0 -ExternalPort 2222 -InternalIPAddress 192.168.100.2 -InternalPort 22
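
You can verify the NAT and its forwarding rules afterwards, for example:

Get-NetNat
Get-NetNatStaticMapping -NatName "SpeechPlatformNAT"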

Importing Virtual Appliance

When you are done configuring the networking, you can use the Hyper-V Manager UI to import the Virtual Appliance. The virtual switch will be automatically detected and attached to the VA. As a last step, you need to set a static IP address in the Virtual Appliance. There are two ways to configure this.

  1. The first and simplest way is using a cloud-init configuration. The directory that contains the configuration files for Hyper-V also includes the file seed.iso. Open the VM settings in Hyper-V, select IDE Controller 0, and add a DVD Drive. Select the provided image file and click Apply. Once the ISO image is attached, cloud-init automatically detects it and applies the same IP configuration that the manual method below sets up.

  2. The second way is starting the Virtual Appliance and connecting to it using the Hyper-V Virtual Machine Connection. After logging in, run the following commands:

    nmcli con add type ethernet con-name eth0 ifname eth0 ipv4.addresses 192.168.100.2/24 ipv4.gateway 192.168.100.1 ipv4.dns "8.8.8.8 8.8.4.4" ipv4.method manual
    nmcli con up eth0

    After executing these commands, reboot the Virtual Appliance.
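
    Before rebooting, you can verify the resulting configuration, for example:

    nmcli -g ipv4.addresses,ipv4.gateway con show eth0
    ip addr show eth0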

How to convert the OVF to a Hyper-V compatible VM

  1. Both existing virtual HDDs (.vmdk) need to be converted to Hyper-V compatible HDDs (.vhdx). You can do this with StarWind V2V Converter.
  2. Create new VM in Hyper-V.
  3. IMPORTANT: Use a Generation 1 VM; Generation 2 does not work.
  4. Enable networking, or make sure it is enabled.
  5. OPTIONAL: Disable options like the DVD drive or SCSI controller, since they are not needed.
  6. Set memory to at least 32 GB and CPUs to at least 8 cores.
  7. Attach the HDDs, preferably to one IDE controller.
  8. Start the VM.
  9. After it starts, check the IP address printed on the login screen. Wait for the entire engine to start.
  10. Go to the IP address from the previous step and verify that the VM works as it should.

Configuration of microservice models and licenses

Changing microservice models

If you use models other than the default ones, you need to change the path values in the /data/speech-platform/speech-platform-licenses.yaml file:

  • <microservice>.config.model.file value pointing to the model, and
  • <microservice>.config.license.key value pointing to the license for the used model.

Example (changing the model from large_v2-1.0.1 to small-1.0.1 for the enhanced-speech-to-text-built-on-whisper microservice):

enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      volume:
        hostPath:
          path: /data/models/enhanced_speech_to_text_built_on_whisper
      file: "large_v2-1.0.1.model"
    license:
      useSecret: true
      secret: enhanced-speech-to-text-built-on-whisper-license
      key: "large_v2-1.0.1"

needs to be changed to:

enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      volume:
        hostPath:
          path: /data/models/enhanced_speech_to_text_built_on_whisper
      file: "small-1.0.1.model"
    license:
      useSecret: true
      secret: enhanced-speech-to-text-built-on-whisper-license
      key: "small-1.0.1"

These changes are required for all microservices with licensed models except speech-to-text-phonexia, time-analysis and audio-quality-estimation.

Inspect microservice models

Models are stored inside the /data/models folder, where the path to each model is constructed as:

/data/models/<technology_name>/<model_name>-<model_version>.model

Where:

  • technology_name - the name of the technology, e.g. speaker_identification
  • model_name - the name of the model, e.g. xl
  • model_version - the version of the model, e.g. 5.0.0

Imported models can be inspected after uploading (Step 4 of the Installation Guide) with the following command:

  1. Content of the /data/models:
    $ find /data/models
    /data/models/
    /data/models/speaker_identification
    /data/models/speaker_identification/xl-5.0.0.model
    /data/models/speaker_identification/xl-5.0.0-license.txt
    /data/models/enhanced_speech_to_text_built_on_whisper
    /data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1.model
    /data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt
    /data/models/speech_to_text_phonexia
    /data/models/speech_to_text_phonexia/en_us_6-3.62.0-license.txt
    /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model
    /data/models/time_analysis
    /data/models/time_analysis/generic-3.62.0-license.txt
    /data/models/time_analysis/generic-3.62.0.model

Inspect microservice licenses

Licenses are stored in the file /data/speech-platform/speech-platform-licenses.yaml. The file contains Kubernetes secret definitions of the licenses, which allows simple loading of the licenses into the application.

Imported licenses can be inspected after uploading (Step 4 of the Installation Guide) with the following command:

  1. Content of the /data/speech-platform folder:
    $ find /data/speech-platform/
    /data/speech-platform/
    /data/speech-platform/speech-platform-licenses.yaml
    /data/speech-platform/speech-platform-values.yaml

Kubernetes secret definitions in the file are separated by ---. Each secret contains, under the .stringData.license path, the contents of the license file corresponding to the technology for which the license is meant. For example:

  • For a model of the speaker_identification technology with the name xl and version 5.0.0, the secret looks like this:
---
apiVersion: v1
kind: Secret
metadata:
  name: speaker-identification-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
type: Opaque

The content of the license file (/data/speech-platform/speech-platform-licenses.yaml) can be shown with the following command:

  1. Content of the license file:
    $ cat /data/speech-platform/speech-platform-licenses.yaml
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: speaker-identification-license
      namespace: speech-platform
    stringData:
      license: |
        <content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
    type: Opaque
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: enhanced-speech-to-text-built-on-whisper-license
      namespace: speech-platform
    stringData:
      license: |
        <content of "/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt" file>
    type: Opaque
    ...

Set DNS name for speech platform virtual appliance

The Speech Platform is accessible at http://<IP_address_of_virtual_appliance>. We recommend creating a DNS record to make access more convenient for users. Consult your DNS provider for more information on how to add the corresponding record.
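
For example, a corresponding zone-file entry could look like this (the hostname and address are illustrative):

speech-platform.example.com.    IN    A    192.0.2.10

After the record propagates, the application is reachable at http://speech-platform.example.com.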

Use HTTPS certificate

The Speech Platform is also accessible via HTTPS at https://<IP_address_of_virtual_appliance>. If you prefer secure communication, you may want to use your own TLS certificate to secure it.

To do so, follow this guide:

  1. Prepare the TLS certificate beforehand.
  2. Put the certificate private key in a file named cert.key.
  3. Put the certificate into a file named cert.crt.
  4. Create a Kubernetes secret manifest storing the certificate and private key:
    kubectl create -n ingress-nginx secret tls default-ssl-certificate --key cert.key --cert cert.crt -o yaml --dry-run=client > /tmp/certificate-secret.yaml
  5. Copy the resulting manifest file to /data/ingress-nginx/certificate-server.yaml.
  6. Open the text file /data/ingress-nginx/ingress-nginx-values.yaml either directly from inside the virtual appliance or via File Browser.
  7. Locate the key .spec.valuesContent.controller.extraArgs.default-ssl-certificate.
  8. Uncomment the line.
  9. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          <Not significant lines omitted>
          extraArgs:
            <Not significant lines omitted>
            default-ssl-certificate: "ingress-nginx/default-ssl-certificate"
  10. Save the file.
  11. The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
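
After the redeploy, you can check that your certificate is being served, for example (replace the address with your appliance's IP or DNS name):

openssl s_client -connect <IP_address_of_virtual_appliance>:443 </dev/null 2>/dev/null | openssl x509 -noout -subject -dates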

Extend disks

Disks are extended automatically on VM startup by the growfs systemd service when you extend the backing volume/disk in the hypervisor. You can trigger the extension manually by running the script /root/grow-partition-and-filesystems.sh. It grows the partition and filesystem for both the system and data disks.
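
You can confirm that the extension took effect by checking the block devices and filesystems, for example (device names may differ in your deployment):

lsblk
df -h /data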

Phonexia speech to text microservice

This section describes configuration specific to the Phonexia Speech to Text microservice.

Permanent vs onDemand instances

A permanent instance is started and running (and consuming resources) all the time. An onDemand instance is started only when a corresponding task is queued; the instance is stopped when all tasks have been processed.

All instances are onDemand by default. Any instance can be reconfigured to be permanent. Use the following guide to reconfigure an instance from onDemand to permanent:

  1. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
  2. Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.
  3. Corresponding section looks like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              ...
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
                onDemand:
                  enabled: true
              ...
  4. Delete the onDemand key and its subkeys.
  5. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              ...
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
              ...
  6. Save the file.
  7. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  8. Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).

Configure languages in speech-to-text-phonexia microservice

This microservice consists of multiple instances. Each instance corresponds to a single language. All instances are listed in the configuration file.

By default all languages/instances are enabled in on-demand mode. List of languages:

  • ar_kw_6
  • ar_xl_6
  • bn_6
  • cs_cz_6
  • de_de_6
  • en_us_6
  • es_6
  • fa_6
  • fr_fr_6
  • hr_hr_6
  • hu_hu_6
  • it_it_6
  • ka_ge_6
  • kk_kz_6
  • nl_6
  • pl_pl_6
  • ps_6
  • ru_ru_6
  • sk_sk_6
  • sr_rs_6
  • sv_se_6
  • tr_tr_6
  • uk_ua_6
  • vi_vn_6
  • zh_cn_6

How to disable all language instances except cs_cz_6 and en_us_6:

  1. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via File Browser.
  2. Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.
  3. Corresponding section looks like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              - name: ark
                imageTag: 3.62.0-stt-ar_kw_6
                onDemand:
                  enabled: true
              - name: arx
                imageTag: 3.62.0-stt-ar_xl_6
                onDemand:
                  enabled: true
              - name: bn
                imageTag: 3.62.0-stt-bn_6
                onDemand:
                  enabled: true
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
                onDemand:
                  enabled: true
              - name: de
                imageTag: 3.62.0-stt-de_de_6
                onDemand:
                  enabled: true
              - name: en
                imageTag: 3.62.0-stt-en_us_6
                onDemand:
                  enabled: true
              ...
              - name: vi
                imageTag: 3.62.0-stt-vi_vn_6
                onDemand:
                  enabled: true
              - name: zh
                imageTag: 3.62.0-stt-zh_cn_6
                onDemand:
                  enabled: true
  4. Comment out all the instances except cs_cz_6 and en_us_6.
  5. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              #- name: ark
              #  imageTag: 3.62.0-stt-ar_kw_6
              #  onDemand:
              #    enabled: true
              #- name: arx
              #  imageTag: 3.62.0-stt-ar_xl_6
              #  onDemand:
              #    enabled: true
              #- name: bn
              #  imageTag: 3.62.0-stt-bn_6
              #  onDemand:
              #    enabled: true
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
                onDemand:
                  enabled: true
              #- name: de
              #  imageTag: 3.62.0-stt-de_de_6
              #  onDemand:
              #    enabled: true
              - name: en
                imageTag: 3.62.0-stt-en_us_6
                onDemand:
                  enabled: true
              ...
              #- name: vi
              #  imageTag: 3.62.0-stt-vi_vn_6
              #  onDemand:
              #    enabled: true
              #- name: zh
              #  imageTag: 3.62.0-stt-zh_cn_6
              #  onDemand:
              #    enabled: true
  6. Alternatively, you can delete the instances you are not interested in.
  7. The updated file then should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
                onDemand:
                  enabled: true
              - name: en
                imageTag: 3.62.0-stt-en_us_6
                onDemand:
                  enabled: true
  8. Save the file.
  9. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  10. Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).

Modify replicas for permanent language instances

Each language instance has only one replica by default. This means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the replicas for the corresponding language instance.

We do not recommend increasing replicas for any microservice when the virtual appliance is running with the default resources (4 CPU, 32 GB memory)!

Note: An onDemand instance always has exactly one replica.

  1. Find out which language instance you want to configure replicas for.
  2. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via File Browser.
  3. Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.<language instance>.replicaCount.
  4. Change the value to the desired number of replicas.
  5. The updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
                replicaCount: 2
  6. Save the file.
  7. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  8. Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).

Modify parallelism for instances

Each instance can process only one request at a time, unless parallelism is overridden. The parallelism value is the maximum number of requests processed by one instance. Parallelism is set globally for all instances of a technology; however, each instance can override the value. To override parallelism for speech-to-text-phonexia, time-analysis, or audio-quality-estimation, follow these steps:

  1. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via File Browser.
  2. Find the key, depending on technology (speech-to-text-phonexia, time-analysis, audio-quality-estimation) for which parallelism should be overridden: .spec.valuesContent.<technology>.parallelism
  3. Change the value to the desired number of requests processed in parallel.
  4. Corresponding section looks like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          # Global value of parallelism for all instances
          parallelism: 2
          config:
            <Not significant lines omitted>
            instances:
              - name: cs
                imageTag: 3.62.0
              - name: en
                imageTag: 3.62.0
                # Override of parallelism for en instance
                parallelism: 4
  5. Save the file.
  6. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  7. Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).

Modify microservice replicas

Each microservice has only one replica by default. This means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the replicas for the corresponding microservices.

We do not recommend increasing replicas for any microservice when the virtual appliance is running with the default resources (4 CPU, 32 GB memory)!

  1. Find out for which microservice you want to modify replicas: audio-quality-estimation, deepfake-detection, emotion-recognition, enhanced-speech-to-text-built-on-whisper, language-identification, speaker-diarization, voice-activity-detection, voiceprint-comparison, or voiceprint-extraction.
  2. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via File Browser.
  3. Locate the key .spec.valuesContent.<microservice>.replicaCount.
  4. Change the value to the desired number of replicas.
  5. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <microservice>:
          <Not significant lines omitted>
          replicaCount: 2
  6. Save the file.
  7. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  8. Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).

Run microservice on GPU

Some of the microservices can run on a GPU, which increases the processing speed. Microservices that can run on a GPU are deepfake-detection, emotion-recognition, enhanced-speech-to-text-built-on-whisper, gender-identification, language-identification, speaker-diarization, voice-activity-detection, and voiceprint-extraction.

First, make sure the virtual appliance can see the GPU device(s). Use nvidia-smi to list the devices. If a device is present and visible to the system, the output should look like:

[root@speech-platform ~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)

If the GPU is visible, you can reconfigure the microservice to use the GPU for processing.

  1. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via File Browser.

  2. Locate microservice section .spec.valuesContent.<microservice>.

  3. Locate key .spec.valuesContent.<microservice>.config.device.

  4. Uncomment the line so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <microservice>:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            # Uncomment this to force microservice to run on GPU
            device: cuda
  5. Locate key .spec.valuesContent.<microservice>.resources.

  6. Request GPU resources for the processing so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <microservice>:
          <Not significant lines omitted>
          # Uncomment this to grant access to GPU on whisper pod
          resources:
            limits:
              nvidia.com/gpu: "1"
  7. Locate key .spec.valuesContent.<microservice>.runtimeClassName.

  8. Set runtimeClassName so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <microservice>:
          <Not significant lines omitted>
          # Uncomment this to run whisper on GPU
          runtimeClassName: "nvidia"
  9. Locate key .spec.valuesContent.<microservice>.updateStrategy.

  10. Set type to Recreate to allow seamless updates so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <microservice>:
          <Not significant lines omitted>
          # Uncomment this to allow seamless updates on single GPU machine
          updateStrategy:
            type: Recreate
  11. Example: Updated file for enhanced-speech-to-text-built-on-whisper should look like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        enhanced-speech-to-text-built-on-whisper:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            device: cuda

          <Not significant lines omitted>
          resources:
            limits:
              nvidia.com/gpu: "1"

          <Not significant lines omitted>
          runtimeClassName: "nvidia"

          <Not significant lines omitted>
          updateStrategy:
            type: Recreate
  12. Save the file.

  13. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.

  14. Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).

GPU parallelism settings

This section describes how to control processing parallelism when a microservice runs on a GPU. The following configuration applies only to the microservices enhanced-speech-to-text-built-on-whisper, language-identification, voice-activity-detection, voiceprint-extraction, deepfake-detection, and gender-identification:

<microservice>:
  config:
    # -- Parallel tasks per device. GPU only.
    instancesPerDevice: 1
    # -- Index of device to use. GPU only.
    #deviceIndex: 0

There are two configuration options:

  • instancesPerDevice - Controls how many tasks a microservice can process in parallel on a single GPU. A higher value means higher GPU utilization (both processor- and memory-wise).
  • deviceIndex - Controls which GPU card to use when there are multiple GPU cards. We discourage using this in most cases.
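
A minimal sketch of such an override in /data/speech-platform/speech-platform-values.yaml, assuming the microservice already runs on a GPU and has enough GPU memory (the value 2 is illustrative):

<microservice>:
  config:
    instancesPerDevice: 2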

Change model used in a microservice

Each microservice needs a model to do its job properly. For some microservices, for example enhanced-speech-to-text-built-on-whisper, we provide multiple models. We usually pre-configure microservices with the most accurate (and slowest) model. Users typically switch to a different model to speed up processing at the cost of less accurate results.

The license you have received with the virtual appliance is valid only for the default model. If you change the model, you have to change the license as well.

Change model in enhanced-speech-to-text-built-on-whisper microservice

We offer the following models for the enhanced-speech-to-text-built-on-whisper microservice:

  • large-v3 - next-gen most accurate multilingual model.
  • large-v2 - most accurate multilingual model. This is the default model.
  • medium - less accurate but faster than large-v2.
  • small - less accurate but faster than medium.
  • base - less accurate but faster than small.
  1. Ask Phonexia to provide you with the desired model and license. You will receive link(s) to a zip archive.
  2. Upload archive to virtual appliance.
    scp licensed-models.zip root@<virtual-appliance-ip>:/data/
  3. Unzip the archive. Models are extracted into a directory per microservice:
    unzip licensed-models.zip
  4. Content of the /data/models should look like:
    $ find /data/models
    /data/models/
    /data/models/enhanced_speech_to_text_built_on_whisper
    /data/models/enhanced_speech_to_text_built_on_whisper/small-1.0.0.model
    /data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-base-1.0.0-license.key.txt
    /data/models/enhanced_speech_to_text_built_on_whisper/base-1.0.0.model
    /data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-small-1.0.0-license.key.txt
    /data/models/speaker_identification
    /data/models/speaker_identification/xl-5.0.0.model
    /data/models/speaker_identification/speaker_identification-xl-5.0.0-license.key.txt
  5. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
  6. Locate the key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.config.model.
  7. Change the content of the file key from "large_v2-1.0.0.model" to the file you have just uploaded ("small-1.0.0.model").
  8. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        enhanced-speech-to-text-built-on-whisper:
          <Not significant lines omitted>
          config:
            model:
              <Not significant lines omitted>
              file: "small-1.0.0.model"
  9. Change the license, because you have changed the model. See Step 4 of the Installation Guide for how to do it.
  10. Save the file.
  11. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  12. Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).

Load Speech to Text Phonexia, Time Analysis and Audio Quality Estimation model from data disk

To keep up with the latest application version, it is possible to load models from the virtual appliance volume. To use an image without a bundled model and load existing models from the data volume, the instances in the config file need to be set up as follows:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: en
            imageTag: 3.62.0
          ...
    <Not significant lines omitted>
    time-analysis:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: tae
            imageTag: 3.62.0
          ...
    <Not significant lines omitted>
    audio-quality-estimation:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: aqe
            imageTag: 3.62.0
          ...

By default, we expect the model to be located at /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model. This folder structure is ensured by unzipping the provided licensed-models.zip archive in the /models/ path. If the path to the model is different, or the model version does not match the image, it can be specified in the instances config as follows:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: cs
            imageTag: 3.62.0
            model:
              hostPath: /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model
          ...
    <Not significant lines omitted>
    time-analysis:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: tae
            imageTag: 3.62.0
            model:
              hostPath: /data/models/time_analysis/generic-3.62.0.model
          ...
    <Not significant lines omitted>
    audio-quality-estimation:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: aqe
            imageTag: 3.62.0
            model:
              hostPath: /data/models/audio_quality_estimation/generic-3.62.0.model
          ...

So far, model loading from the data disk is supported only by the Speech to Text Phonexia and Time Analysis technologies.

Process patented audio codecs with media-conversion

By default, media conversion works only with patent-free audio codecs.

We cannot include and distribute patented codecs with the virtual appliance. If you need to process audio files encoded with patented codecs, you have to use a different version of media-conversion. The media-conversion service image is available on Docker Hub.

Pull Media Conversion image directly from Virtual Appliance

This works only if the internet (Docker Hub) is accessible from the Virtual Appliance.

  1. [Virtual Appliance] Pull media-conversion image to Virtual Appliance:
    k3s ctr image pull docker.io/phonexia/media-conversion:1.0.0
  2. [Virtual Appliance] Export the image to the data disk so that it is loaded automatically:
    k3s ctr image export /data/images/media-conversion-1.0.0.tar docker.io/phonexia/media-conversion:1.0.0
  3. Reconfigure Media Conversion to use the locally downloaded image as described below.

Push Media Conversion image to Virtual Appliance from workstation

This approach is needed if your deployment is completely offline and internet access from the virtual appliance is forbidden.

  1. [PC] Pull media-conversion image locally to your workstation:
    docker pull phonexia/media-conversion:1.0.0
  2. [PC] Save Media Conversion image to tar archive:
    docker save --output media-conversion-1.0.0.tar phonexia/media-conversion:1.0.0
  3. [PC] Copy the media-conversion-1.0.0.tar file to /data/images in the virtual appliance via ssh or File Browser:
    scp media-conversion-1.0.0.tar root@<IP of virtual appliance>:/data/images/
  4. [Virtual appliance] Restart the virtual appliance to load the image, or load it manually with:
    k3s ctr image import /data/images/media-conversion-1.0.0.tar
  5. Reconfigure Media Conversion to use the locally downloaded image as described below.
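
You can confirm that the image is present in the appliance, for example:

k3s ctr image ls | grep media-conversion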

Configure Media Conversion to use pre-downloaded image

The last step is to configure Media Conversion to use the image downloaded in the previous step.

  1. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
  2. Locate the key .spec.valuesContent.media-conversion.image.
  3. Change the content of registry, repository, tag, and tagSuffix to:
    media-conversion:
      image:
        registry: docker.io
        repository: phonexia/media-conversion
        tag: 1.0.0
        tagSuffix: ""
  4. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        media-conversion:
          <Not significant lines omitted>
          image:
            registry: docker.io
            repository: phonexia/media-conversion
            tag: 1.0.0
            tagSuffix: ""
  5. Save the file.
  6. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  7. Check that the configuration is valid and successfully applied (Step 5 of Installation Guide).

Disable DNS resolving for specific domains

The Kubernetes resolver tries to resolve non-FQDN names with all domains from /etc/resolv.conf. This might cause issues if access to the upstream DNS server (also taken from /etc/resolv.conf) is denied. To avoid this issue, configure the Kubernetes resolver to skip lookups for specific domain(s).

  1. [Virtual appliance] Create the file /data/speech-platform/coredns-custom.yaml manually with the following content. Replace <domain1.com> and <domain2.com> with the domains you want to disable lookups for:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns-custom
      namespace: kube-system
    data:
      custom.server: |
        <domain1.com>:53 {
            log
        }
        <domain2.com>:53 {
            log
        }
  2. [Virtual appliance] The file should look like this:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns-custom
      namespace: kube-system
    data:
      custom.server: |
        localdomain:53 {
            log
        }
        example.com:53 {
            log
        }
  3. [Virtual appliance] Restart CoreDNS to apply the change:
    kubectl -n kube-system rollout restart deploy/coredns
  4. [Virtual appliance] Check that the CoreDNS pod is running:
    kubectl -n kube-system get pods -l k8s-app=kube-dns
  5. [Virtual appliance] Restart all speech-platform pods:
    kubectl -n speech-platform rollout restart deploy
    kubectl -n speech-platform rollout restart sts
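
Because the custom server blocks include the log directive, you can watch the CoreDNS logs to confirm that queries for those domains are handled by them, for example:

kubectl -n kube-system logs -l k8s-app=kube-dns --tail=20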

Custom configuration with cloud-init

Cloud-init is a widely used tool for configuring cloud instances at boot time, and we support it in the Virtual Appliance.

It can be used to customize the Virtual Appliance: to create a user with a specific SSH key, install extra packages, and so on.

How to Pass Cloud-Init User Configuration to Virtual Appliance

This guide will walk you through the steps required to pass a cloud-init user configuration to a Virtual Appliance.

  1. The first step is to create a user-data file that contains the configuration you want to pass to the VM. This file is typically written in YAML and may include various configurations, such as creating users, setting up SSH keys, or running commands. Here is an example of a basic user-data file:

    #cloud-config
    users:
      - name: phonexia
        ssh_authorized_keys:
          - ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr... your_public_key_here

    packages:
      - htop
    Save this file as user-data.yaml.

  2. Since non-cloud hypervisors like VirtualBox and VMware do not have a native method to pass cloud-init data, you need to create a "seed" ISO image that contains your user-data.yaml file. Cloud-init reads this data during the virtual machine boot process.

    You can create an ISO image using the cloud-localds command:

    cloud-localds seed.iso user-data.yaml

    This command generates an ISO file named seed.iso containing your user-data.yaml and a generated meta-data file.
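
    If cloud-localds is not available, a commonly used alternative (an assumption, not part of the appliance tooling) is genisoimage; cloud-init's NoCloud datasource requires the volume label cidata and files named exactly user-data and meta-data:

    # copy the configuration to the required file name and create an empty meta-data file
    cp user-data.yaml user-data
    touch meta-data
    # build the seed image with the "cidata" volume label required by cloud-init
    genisoimage -output seed.iso -volid cidata -joliet -rock user-data meta-data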

  3. Attach the ISO Image to the Virtual Appliance VM

    Next, attach the seed.iso file to the VM as a CD-ROM/DVD-ROM. You can do this via the VirtualBox GUI, VMware vSphere, or the ESXi Host Client.

  4. Boot the VM

    Cloud-init will automatically detect the attached ISO image and apply the configurations specified in your user-data.yaml file.

  5. Verify Cloud-Init Execution

    Once the VM has booted, you can verify that cloud-init has applied the configuration correctly. Connect to your VM via SSH or the console and check the following:

    1. Check Cloud-Init Status:

      cloud-init status
    2. Check that the htop package is installed:

      htop

      This should open the htop application.

    3. Check that you can log in as the phonexia user with the SSH key:

      ssh -i <path_to_ssh_private_key> phonexia@<ip of virtual appliance>
    4. Check Cloud-Init Logs: Cloud-init logs its activities in /var/log/cloud-init.log and /var/log/cloud-init-output.log. You can inspect these logs to troubleshoot any issues:

      less /var/log/cloud-init.log
  6. (Optional) Detach the ISO Image

    Once cloud-init has applied the configuration, you usually no longer need the seed.iso file attached to your VM; you can detach it in a similar way as you attached it.

Uninstall NVIDIA Drivers

The Virtual Appliance contains the NVIDIA drivers needed for GPU processing. In some cases it might be handy to use a different version of the drivers or a different kind of drivers (vGPU) instead. As a first step, the current drivers must be uninstalled.

Run the following command to uninstall the bundled drivers:

dnf module remove nvidia-driver:550

Note that GPU processing will not work until new drivers are installed. Installing the new drivers is out of the scope of this document.
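
You can verify that the module was removed, for example:

dnf module list --installed nvidia-driver
nvidia-smi

The first command should no longer list the module as installed, and nvidia-smi should fail until new drivers are present.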