Version: 2026.03.0

Adjustments

This section describes various configuration use cases for the Speech Platform virtual appliance.

Configuration file

Unless stated otherwise, all configuration changes in this section are made in the file /data/speech-platform/speech-platform-values.yaml. You can edit it directly from inside the virtual appliance or via File Browser.

After saving changes, the application automatically recognizes the update and redeploys with the new configuration. Always validate the YAML after making changes.
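For instance, the file can be validated from the shell before the platform picks it up (a sketch; it assumes python3 with the PyYAML module is available on the appliance — yamllint would serve equally well):

```shell
# Validate a YAML file before saving changes (assumes python3 + PyYAML).
validate_yaml() {
  python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1])); print("OK")' "$1"
}
```

Running `validate_yaml /data/speech-platform/speech-platform-values.yaml` prints OK when the file parses, or a traceback pointing at the offending line otherwise.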

Technology overview

The Speech Platform consists of multiple technology components. Each technology runs as an independent service and can be individually enabled, configured, and scaled.

Technologies are divided into two categories:

Microservices

Each microservice runs a single model and is configured directly at the technology level:

  • age-estimation
  • audio-manipulation-detection
  • audio-quality-estimation
  • deepfake-detection
  • denoiser
  • emotion-recognition
  • enhanced-speech-to-text-built-on-whisper
  • gender-identification
  • language-identification
  • referential-deepfake-detection
  • replay-attack-detection
  • speaker-diarization
  • time-analysis
  • voice-activity-detection
  • voiceprint-comparison
  • voiceprint-extraction

Instance-based microservices

Instance-based microservices can run multiple instances, typically one per language or model variant. Each instance is configured separately under config.instances:

  • keyword-spotting
  • speech-to-text
  • text-translation

Enable or disable technologies

By default, all technologies are disabled (enabled: false). To enable a technology, set the enabled flag to true in the configuration file.

  1. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
  2. Locate the technology section .spec.valuesContent.<technology>.
  3. Set enabled to true:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          enabled: true
  4. Save the file.
  5. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  6. Check that the configuration is valid and successfully applied (Configuration file checks).

Configuration of technology models and licenses

Changing technology models

If you use a model other than the default, you need to change these values in the /data/speech-platform/speech-platform-values.yaml file:

  • <technology>.config.model.file - the file name of the model, and
  • <technology>.config.license.key - the license key for the model.

Example (change model large_v2-1.3.0 to small-1.3.0 for enhanced-speech-to-text-built-on-whisper technology):

enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      file: "large_v2-1.3.0.model"
    license:
      key: "large_v2-1"

needs to be changed to:

enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      file: "small-1.3.0.model"
    license:
      key: "small-1"

These changes are required for all technologies with licensed models.

Inspect technology models

Models are stored inside the /data/models folder, where the path to each model is constructed as:

/data/models/<technology_name>/<model_name>-<model_version>.model

Where:

  • technology_name - the name of the technology, e.g. speaker_identification
  • model_name - the name of the model, e.g. xl
  • model_version - the version of the model, e.g. 5.4.0
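The path construction above can be sketched as a small shell helper (illustrative only; model_path is not part of the appliance):

```shell
# Assemble a model path from its three components.
model_path() {
  # $1 = technology_name, $2 = model_name, $3 = model_version
  printf '/data/models/%s/%s-%s.model\n' "$1" "$2" "$3"
}
```

For example, `model_path speaker_identification xl 5.4.0` prints /data/models/speaker_identification/xl-5.4.0.model.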

After uploading, imported models can be inspected with the following command:

  1. List the content of /data/models:

    Terminal
    find /data/models

    Example output:

    /data/models/
    /data/models/enhanced_speech_to_text_built_on_whisper
    /data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.3.0.model
    /data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.3.0-license.txt
    /data/models/speaker_identification
    /data/models/speaker_identification/xl-5.4.0.model
    /data/models/speaker_identification/xl-5.4.0-license.txt
    /data/models/speech_to_text
    /data/models/speech_to_text/en_us-6.2.0.model
    /data/models/speech_to_text/en_us-6.2.0-license.txt
    /data/models/time_analysis
    /data/models/time_analysis/generic-1.2.0.model
    /data/models/time_analysis/generic-1.2.0-license.txt

Inspect technology licenses

Licenses are stored at /data/speech-platform/speech-platform-licenses.yaml. The file contains Kubernetes Secret definitions of the licenses, which makes loading the licenses into the application simple.

After uploading, imported licenses can be inspected with the following command:

  1. Content of the /data/speech-platform folder:

    Terminal
    find /data/speech-platform/

    Example output:

    /data/speech-platform/
    /data/speech-platform/speech-platform-licenses.yaml
    /data/speech-platform/speech-platform-values.yaml

Kubernetes Secret definitions in the file are separated by ---. Each secret contains, at the .stringData.license path, the contents of the license file for the technology the license is meant for. For example:

  • For the speaker_identification model with name xl and version 5.4.0, the secret will look like this:
---
apiVersion: v1
kind: Secret
metadata:
  name: speaker-identification-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/speaker_identification/xl-5.4.0-license.txt" file>
type: Opaque
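A secret of this shape can also be generated from a license file on disk. The sketch below builds the manifest with plain shell (make_license_secret is a hypothetical helper; `kubectl create secret generic <name> -n speech-platform --from-file=license=<file> --dry-run=client -o yaml` produces an equivalent manifest):

```shell
# Build a license Secret manifest from a license file (illustrative sketch).
make_license_secret() {
  # $1 = secret name, $2 = path to the license file
  printf -- '---\napiVersion: v1\nkind: Secret\nmetadata:\n  name: %s\n  namespace: speech-platform\nstringData:\n  license: |\n' "$1"
  sed 's/^/    /' "$2"   # indent the license content under the block scalar
  printf 'type: Opaque\n'
}
```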

The content of the license file (/data/speech-platform/speech-platform-licenses.yaml) can be shown with the following command:

  1. Content of the license file:

    Terminal
    cat /data/speech-platform/speech-platform-licenses.yaml

    Example output:

    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: speaker-identification-license
      namespace: speech-platform
    stringData:
      license: |
        <content of "/data/models/speaker_identification/xl-5.4.0-license.txt" file>
    type: Opaque
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: enhanced-speech-to-text-built-on-whisper-license
      namespace: speech-platform
    stringData:
      license: |
        <content of "/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.3.0-license.txt" file>
    type: Opaque
    .
    .
    .

Set DNS name for the Speech Platform virtual appliance

The Speech Platform is accessible at http://<IP_address_of_virtual_appliance>. We recommend creating a DNS record to make access more convenient for users. Consult your DNS provider for more information on how to add the corresponding DNS record.

Use HTTPS certificate

The Speech Platform is also accessible via HTTPS at https://<IP_address_of_virtual_appliance>. If you prefer secure communication, you may want to use your own TLS certificate to secure it.

To do so, follow this guide:

  1. Prepare the TLS certificate beforehand.

  2. Put the certificate private key in a file named cert.key.

  3. Put the certificate into a file named cert.crt.

  4. Create a Kubernetes secret manifest storing the certificate and private key:

    Terminal
    kubectl create -n ingress-nginx secret tls default-ssl-certificate --key cert.key --cert cert.crt -o yaml --dry-run=client > /tmp/certificate-secret.yaml
  5. Copy the resulting manifest file to /data/ingress-nginx/certificate-server.yaml.

  6. Open the text file /data/ingress-nginx/ingress-nginx-values.yaml either directly from inside the virtual appliance or via File Browser.

  7. Locate key .spec.valuesContent.controller.extraArgs.default-ssl-certificate.

  8. Uncomment the line.

  9. The updated file should look like:

    /data/ingress-nginx/ingress-nginx-values.yaml
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          <Not significant lines omitted>
          extraArgs:
            <Not significant lines omitted>
            default-ssl-certificate: "ingress-nginx/default-ssl-certificate"
  10. Save the file.

  11. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
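If you want to rehearse the steps above before a CA-signed certificate is available, a self-signed pair can be generated for testing (the CN below is a placeholder; browsers will warn about self-signed certificates):

```shell
# Testing aid only: create cert.key / cert.crt as a self-signed pair.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=speech-platform.example.com" \
  -keyout cert.key -out cert.crt
```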

Extend disks

Disks are extended automatically on VM startup by the growfs systemd service when you extend the backing volume/disk in the hypervisor.

Instance-based microservices configuration

This section describes configuration specific to instance-based microservices (speech-to-text, keyword-spotting, and text-translation). Each instance typically corresponds to a single language or model variant.

info

The examples in this section use speech-to-text technology, but the same principles apply to keyword-spotting and text-translation.

Permanent vs on-demand instances

A permanent instance is started and running (consuming resources) all the time. An on-demand instance is started only when a corresponding task is queued and stopped when all tasks have been processed.

Feature         | Permanent                 | On-demand
Resource usage  | Constant                  | Only when processing
Startup latency | None                      | Cold start delay
Best for        | High-throughput workloads | Occasional usage

All instances are on-demand by default. To reconfigure an instance from on-demand to permanent:

  1. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
  2. Locate key .spec.valuesContent.speech-to-text.config.instances.
  3. The corresponding section looks like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              .
              .
              .
              - name: cs
                imageTag: 6.6.0-cs_cz
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              .
              .
              .
  4. Delete the onDemand key and its subkeys for the desired instance.
  5. The updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              .
              .
              .
              - name: cs
                imageTag: 6.6.0-cs_cz
              .
              .
              .
  6. Save the file.
  7. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  8. Check that the configuration is valid and successfully applied (Configuration file checks).

Configure languages in Speech to Text technology

This technology consists of multiple instances. Each instance corresponds to a single language. All instances are listed in the configuration file.

By default, all languages/instances are enabled in on-demand mode. List of languages:

  • ar_kw
  • ar_xl
  • bn
  • cs_cz
  • de_de
  • en_us
  • es
  • fa
  • fr_fr
  • hr_hr
  • hu_hu
  • it_it
  • ka_ge
  • kk_kz
  • nl
  • pl_pl
  • ps
  • ru_ru
  • sk_sk
  • sr_rs
  • sv_se
  • tr_tr
  • uk_ua
  • vi_vn
  • zh_cn

How to disable all language instances except cs_cz and en_us:

  1. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.

  2. Locate key .spec.valuesContent.speech-to-text.config.instances.

  3. The corresponding section looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              - name: ar-kw
                imageTag: <version>-ar_kw
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              - name: ar-xl
                imageTag: <version>-ar_xl
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              - name: bn
                imageTag: 6.1.0-bn
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              - name: cs
                imageTag: 6.6.0-cs_cz
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              - name: de
                imageTag: 6.1.0-de_de
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              - name: en
                imageTag: 6.2.0-en_us
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              .
              .
              .
              - name: vi
                imageTag: 6.2.0-vi_vn
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              - name: zh
                imageTag: 6.1.0-zh_cn
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
  4. Comment out or delete all instances except the ones you need.

    Option A — Comment out unwanted instances:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              #- name: ar-kw
              #  imageTag: <version>-ar_kw
              #  onDemand:
              #    enabled: true
              #    cooldownPeriod: 600
              #- name: bn
              #  imageTag: 6.1.0-bn
              #  onDemand:
              #    enabled: true
              #    cooldownPeriod: 600
              - name: cs
                imageTag: 6.6.0-cs_cz
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              #- name: de
              #  imageTag: 6.1.0-de_de
              #  onDemand:
              #    enabled: true
              #    cooldownPeriod: 600
              - name: en
                imageTag: 6.2.0-en_us
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              .
              .
              .

    Option B — Delete unwanted instances entirely (cleaner):

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              - name: cs
                imageTag: 6.6.0-cs_cz
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
              - name: en
                imageTag: 6.2.0-en_us
                onDemand:
                  enabled: true
                  cooldownPeriod: 600
  5. Save the file.

  6. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.

  7. Check that the configuration is valid and successfully applied (Configuration file checks).

Modify replicas for permanent language instances

Each language instance has only one replica by default, which means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the replicas for the corresponding language instance.

caution

We do not recommend increasing replicas for any technology when the virtual appliance is running with default resources (4 CPU, 32 GB memory)! On-demand instances always have only one replica.

  1. Find out which language instance you want to configure replicas for.
  2. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
  3. Locate key .spec.valuesContent.speech-to-text.config.instances.<language instance>.replicaCount.
  4. Change the value to the desired number of replicas.
  5. The corresponding section looks like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              - name: cs
                imageTag: 6.6.0-cs_cz
                replicaCount: 2
  6. Save the file.
  7. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  8. Check that the configuration is valid and successfully applied (Configuration file checks).

Modify technology replicas

Each technology has only one replica by default, which means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the replicas for the corresponding technologies.

caution

We do not recommend increasing replicas for any technology when the virtual appliance is running with default resources (4 CPU, 32 GB memory)!

  1. Decide which technologies you want to modify replicas for: age-estimation, audio-manipulation-detection, audio-quality-estimation, deepfake-detection, denoiser, emotion-recognition, enhanced-speech-to-text-built-on-whisper, gender-identification, keyword-spotting, language-identification, referential-deepfake-detection, replay-attack-detection, speaker-diarization, speech-to-text, text-translation, time-analysis, voice-activity-detection, voiceprint-comparison, or voiceprint-extraction.
  2. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
  3. Locate key .spec.valuesContent.<technology>.replicaCount.
  4. Change the value to the desired number of replicas.
  5. The updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          <Not significant lines omitted>
          replicaCount: 2
  6. Save the file.
  7. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  8. Check that the configuration is valid and successfully applied (Configuration file checks).

Run technology on GPU

Some of the technologies can run on GPU which increases the processing speed. Technologies that support GPU acceleration are: age-estimation, audio-manipulation-detection, audio-quality-estimation, deepfake-detection, emotion-recognition, enhanced-speech-to-text-built-on-whisper, gender-identification, keyword-spotting, language-identification, referential-deepfake-detection, replay-attack-detection, speaker-diarization, speech-to-text, text-translation, voice-activity-detection, and voiceprint-extraction.

note

Technologies denoiser, time-analysis, and voiceprint-comparison do not support GPU acceleration.

First, make sure the virtual appliance can see the GPU device(s). Use nvidia-smi to list all the devices. If a device is present and visible to the system, the output should look like:

Terminal
nvidia-smi -L

Example output:

GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)

If the GPU is visible, you can reconfigure the technology to use GPU for processing.

tip

CPU resources are still used alongside GPU. We recommend allocating at least 1 CPU core per concurrent processing task.
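Combining this recommendation with the GPU resources requested in the steps below, a resources block could also carry an explicit CPU request (a sketch using standard Kubernetes resource fields; the values are illustrative, not recommendations):

```yaml
<technology>:
  resources:
    requests:
      cpu: "1"              # illustrative: one core per concurrent task
    limits:
      nvidia.com/gpu: "1"
```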

  1. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.

  2. Locate technology section .spec.valuesContent.<technology>.

  3. Locate key .spec.valuesContent.<technology>.config.device.

  4. Change the value from cpu to cuda:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            device: cuda
  5. Locate key .spec.valuesContent.<technology>.resources.

  6. Request GPU resources for the processing so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          <Not significant lines omitted>
          resources:
            limits:
              nvidia.com/gpu: "1"
  7. Locate key .spec.valuesContent.<technology>.runtimeClassName.

  8. Set runtimeClassName so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          <Not significant lines omitted>
          runtimeClassName: "nvidia"
  9. Locate key .spec.valuesContent.<technology>.updateStrategy.

  10. Set type to Recreate to allow seamless updates so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          <Not significant lines omitted>
          updateStrategy:
            type: Recreate
  11. Example: The updated file for enhanced-speech-to-text-built-on-whisper should look like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        enhanced-speech-to-text-built-on-whisper:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            device: cuda

          <Not significant lines omitted>
          resources:
            limits:
              nvidia.com/gpu: "1"

          <Not significant lines omitted>
          runtimeClassName: "nvidia"

          <Not significant lines omitted>
          updateStrategy:
            type: Recreate
  12. Save the file.

  13. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.

  14. Check that the configuration is valid and successfully applied (Configuration file checks).

Processing parallelism settings

This section describes how to control processing parallelism for technologies. These settings apply to both CPU and GPU modes.

instancesPerDevice

Controls how many tasks can be processed concurrently by a single technology replica. A higher value means higher utilization (both processor- and memory-wise). This setting is available for most technologies.

<technology>:
  config:
    instancesPerDevice: 1

threadsPerInstance

Controls the number of CPU threads used per instance for processing. This applies to CPU processing only. Set to 0 to automatically detect the optimal number of threads.

<technology>:
  config:
    threadsPerInstance: 1
note

Not all technologies support threadsPerInstance. Technologies denoiser, time-analysis, and voiceprint-comparison do not have this setting. Additionally, voiceprint-comparison does not support instancesPerDevice.

deviceIndex / deviceIndices

Controls which GPU card(s) to use when multiple GPU cards are present. We generally discourage changing this unless you have a specific multi-GPU setup.

<technology>:
  config:
    deviceIndex: 0
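Putting the settings together, a tuned technology section might look like this (a sketch; the values are illustrative, not recommendations, and only apply where the technology supports both keys):

```yaml
<technology>:
  config:
    instancesPerDevice: 2   # two concurrent tasks per replica
    threadsPerInstance: 0   # auto-detect thread count (CPU mode only)
```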

Change model used in a technology

Each technology needs a model to do its job. We provide multiple models for some technologies, for example enhanced-speech-to-text-built-on-whisper. Usually we pre-configure technologies with the most accurate (and slowest) model. Users typically switch to a different model to speed up processing at the cost of less accurate results.

The license you have received with the virtual appliance is valid only for the default model. If you change the model, you have to change the license as well.

Change model in Enhanced Speech to Text Built on Whisper technology

We offer the following models for the enhanced-speech-to-text-built-on-whisper technology:

  • large-v3 - next-gen most accurate multilingual model.
  • large-v2 - most accurate multilingual model. This is the default model.
  • medium - less accurate but faster than large-v2.
  • base - less accurate but faster than medium.
  • small - less accurate but faster than base.
  1. Ask Phonexia to provide the desired model and license. You will receive link(s) that download as a zip archive.

  2. Upload the archive to the virtual appliance.

    scp licensed-models.zip phonexia@<virtual-appliance-ip>:/data/
  3. Unzip the archive. Models are extracted into a directory per technology:

    unzip licensed-models.zip
  4. The content of /data/models should look like:

    Terminal
    find /data/models

    Example output:

    /data/models/
    /data/models/enhanced_speech_to_text_built_on_whisper
    /data/models/enhanced_speech_to_text_built_on_whisper/small-1.3.0.model
    /data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-small-1.3.0-license.key.txt
    /data/models/enhanced_speech_to_text_built_on_whisper/base-1.3.0.model
    /data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-base-1.3.0-license.key.txt
    /data/models/speaker_identification
    /data/models/speaker_identification/xl-5.4.0.model
    /data/models/speaker_identification/speaker_identification-xl-5.4.0-license.key.txt
  5. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.

  6. Locate key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.config.model.

  7. Change the content of the file key from "large_v2-1.3.0.model" to the file you've just uploaded ("small-1.3.0.model").

  8. The updated file should look like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        enhanced-speech-to-text-built-on-whisper:
          <Not significant lines omitted>
          config:
            model:
              <Not significant lines omitted>
              file: "small-1.3.0.model"
  9. Change the license as well, because you have changed the model. Refer to the licensed models upload and platform configuration step of the installation guide for more information.

  10. Save the file.

  11. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.

  12. Check that the configuration is valid and successfully applied (Configuration file checks).

Load Speech to Text, Time Analysis and Audio Quality Estimation model from data disk

To keep up with the latest version of the application, it is possible to load models from the virtual appliance data volume. To use the image without a bundled model and load existing models from the data volume, configure the instance as follows:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: en
            imageTag: 6.2.0-en_us
          . . .
    <Not significant lines omitted>
    time-analysis:
      <Not significant lines omitted>
      config:
        model:
          file: "generic-1.2.0.model"
    <Not significant lines omitted>
    audio-quality-estimation:
      <Not significant lines omitted>
      config:
        model:
          file: "generic-1.3.0.model"

By default, models are expected at the path /data/models/speech_to_text/<language>-<version>.model. This folder structure is ensured by unzipping the provided licensed-models.zip archive in the /data/models/ path. If the path to the model is different, or the version of the model does not match the image, it can be specified in the instance config:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: cs
            imageTag: 6.6.0-cs_cz
            model:
              hostPath: /data/models/speech_to_text/cs_cz-6.6.0.model
          . . .
    <Not significant lines omitted>
    time-analysis:
      <Not significant lines omitted>
      config:
        model:
          file: "generic-1.2.0.model"
    <Not significant lines omitted>
    audio-quality-estimation:
      <Not significant lines omitted>
      config:
        model:
          file: "generic-1.3.0.model"
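Before letting the platform redeploy, it is worth checking that the file referenced by hostPath actually exists and is non-empty. A small sketch (check_model is a hypothetical helper, not part of the appliance):

```shell
# Report whether a model file exists and has non-zero size.
check_model() {
  if [ -s "$1" ]; then echo "found: $1"; else echo "missing: $1"; fi
}
```

For example: `check_model /data/models/speech_to_text/cs_cz-6.6.0.model`.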

Process patented audio codecs with media-conversion

By default, media conversion works only with patent-free audio codecs.

We cannot include and distribute patented codecs with the virtual appliance. If you need to process audio files encoded with patented codecs, you have to use a different version of media-conversion. The media-conversion service image is located on Docker Hub.

Pull Media Conversion image directly from Virtual Appliance

This works only if the internet (Docker Hub) is accessible from the Virtual Appliance.

  1. [Virtual Appliance] Pull media-conversion image to Virtual Appliance:
    k3s ctr image pull docker.io/phonexia/media-conversion:1.0.0
  2. [Virtual Appliance] Export image to data disk to load it automatically:
    k3s ctr image export /data/images/media-conversion-1.0.0.tar docker.io/phonexia/media-conversion:1.0.0
  3. Reconfigure Media Conversion to use the locally downloaded image, as described below.

Push Media Conversion image to Virtual Appliance from workstation

This approach is needed if your deployment is completely offline and access to the internet from the virtual appliance is forbidden.

  1. [PC] Pull media-conversion image locally to your workstation:
    docker pull phonexia/media-conversion:1.0.0
  2. [PC] Save Media Conversion image to tar archive:
    docker save --output media-conversion-1.0.0.tar phonexia/media-conversion:1.0.0
  3. [PC] Copy the media-conversion-1.0.0.tar file to /data/images on the virtual appliance via SSH or File Browser:
    scp media-conversion-1.0.0.tar phonexia@<IP of virtual appliance>:/data/images/
  4. [Virtual appliance] Restart the virtual appliance to load the image, or load it manually with:
    k3s ctr image import /data/images/media-conversion-1.0.0.tar
  5. Reconfigure Media Conversion to use the locally downloaded image, as described below.

Configure Media Conversion to use pre-downloaded image

The last step is to configure Media Conversion to use the image downloaded in the previous step.

  1. Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
  2. Locate key .spec.valuesContent.media-conversion.image
  3. Change the content of registry, repository, tag, and tagSuffix to:
    media-conversion:
      image:
        registry: docker.io
        repository: phonexia/media-conversion
        tag: 1.0.0
        tagSuffix: ""
  4. The updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        media-conversion:
          <Not significant lines omitted>
          image:
            registry: docker.io
            repository: phonexia/media-conversion
            tag: 1.0.0
            tagSuffix: ""
  5. Save the file.
  6. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  7. Check that the configuration is valid and successfully applied (Configuration file checks).

Custom configuration with cloud-init

Cloud-init is a widely used tool for configuring cloud instances at boot time, and the Virtual Appliance supports it.

It can be used to customize the Virtual Appliance: create a user with a specific SSH key, install extra packages, and so on.

How to Pass Cloud-Init User Configuration to Virtual Appliance

This guide will walk you through the steps required to pass a cloud-init user configuration to a Virtual Appliance.

  1. The first step is to create a user-data file that contains the configuration information you want to pass to the VM. This file is typically written in YAML and may include various configurations, such as creating users, setting up SSH keys, or running commands. Here is an example of a basic user-data file:

    user-data.yaml
    #cloud-config
    users:
      - name: phonexia
        ssh_authorized_keys:
          - ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr... your_public_key_here

    packages:
      - htop

    Save this file as user-data.yaml.

  2. Since non-cloud hypervisors like VirtualBox and VMware do not have a native method to pass cloud-init data, you need to create a "seed" ISO image that contains your user-data.yaml file. Cloud-init will read this data during the virtual machine boot process.

    You can create an ISO image using the cloud-localds command:

    cloud-localds seed.iso user-data.yaml

    This command generates an ISO file named seed.iso containing your user-data.yaml and generated meta-data file.

  3. Attach the ISO Image to the Virtual Appliance VM

    Next, attach the seed.iso file to the VM as a CD-ROM/DVD-ROM. You can do this via the VirtualBox GUI, VMware vSphere, or the ESXi Host Client.

  4. Boot the VM

    Cloud-init will automatically detect the attached ISO image and apply the configurations specified in your user-data.yaml file.

  5. Verify Cloud-Init Execution

    Once the VM has booted, you can verify that cloud-init has applied the configuration correctly. Connect to your VM via SSH or the console and check the following:

    1. Check Cloud-Init Status:

      cloud-init status
    2. Check that htop package is installed:

      htop

      This should open the htop application.

    3. Check that you can log in as the phonexia user with the SSH key:

      ssh -i <path_to_ssh_private_key> phonexia@<ip of virtual appliance>
    4. Check Cloud-Init Logs: Cloud-init logs its activities in /var/log/cloud-init.log and /var/log/cloud-init-output.log. You can inspect these logs to troubleshoot any issues:

      less /var/log/cloud-init.log
  6. (Optional) Detach the ISO Image

    Usually you no longer need the seed.iso file attached to your VM; you can detach it in a similar way as you attached it.
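The user-data file from step 1 can be sanity-checked before building the seed ISO (a sketch; it assumes python3 with PyYAML is available, and cloud-init's own schema subcommand is a more thorough alternative):

```shell
# Check the "#cloud-config" header and that the rest parses as YAML.
check_user_data() {
  head -n 1 "$1" | grep -q '^#cloud-config' || { echo "missing #cloud-config header"; return 1; }
  python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' "$1" || return 1
  echo "user-data OK"
}
```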

Uninstall NVIDIA Drivers

The Virtual Appliance contains the NVIDIA drivers needed for GPU processing. In some cases it might be handy to use a different version of the drivers, or a different kind of drivers (vGPU) instead. As a first step, the current drivers must be uninstalled.

Run the following command to uninstall the bundled drivers:

Terminal
dnf module remove nvidia-driver:570-open

Note that GPU processing won't work until new drivers are installed. Installation of the new drivers is out of the scope of this document.