Version: 4.0.0

Adjustments

The following sections describe various configuration use cases.

Configuration of technology models and licenses

Changing technology models

In case you use models other than the default ones, you need to change the values of these keys in the /data/speech-platform/speech-platform-values.yaml file:

  • <technology>.config.model.file - value pointing to the model, and
  • <technology>.config.license.key - value pointing to the license for the used model.

Example (changing model large_v2-1.0.1 to small-1.0.1 for the enhanced-speech-to-text-built-on-whisper technology):

enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      volume:
        hostPath:
          path: /data/models/enhanced_speech_to_text_built_on_whisper
      file: "large_v2-1.0.1.model"
    license:
      useSecret: true
      secret: enhanced-speech-to-text-built-on-whisper-license
      key: "large_v2-1.0.1"

needs to be changed to:

enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      volume:
        hostPath:
          path: /data/models/enhanced_speech_to_text_built_on_whisper
      file: "small-1.0.1.model"
    license:
      useSecret: true
      secret: enhanced-speech-to-text-built-on-whisper-license
      key: "small-1.0.1"

These changes are required for all technologies with licensed models except speech-to-text-phonexia, time-analysis and audio-quality-estimation.
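Before redeploying, you can quickly verify that the files the new values refer to actually exist on the data disk (a quick sanity check; the expected layout is described in the next section):

    # Should list small-1.0.1.model and the matching license file
    ls /data/models/enhanced_speech_to_text_built_on_whisper/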

Inspect technology models

Models are stored inside the /data/models folder, where the path to each model is constructed as:

/data/models/<technology_name>/<model_name>-<model_version>.model

Where:

  • technology_name - the name of the technology, e.g. speaker_identification
  • model_name - the name of the model, e.g. xl
  • model_version - the version of the model, e.g. 5.0.0

After uploading, the imported models can be inspected with the following command:

  1. List the content of /data/models:

    find /data/models

    Example output:

    /data/models/
    /data/models/speaker_identification
    /data/models/speaker_identification/xl-5.0.0.model
    /data/models/speaker_identification/xl-5.0.0-license.txt
    /data/models/enhanced_speech_to_text_built_on_whisper
    /data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1.model
    /data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt
    /data/models/speech_to_text_phonexia
    /data/models/speech_to_text_phonexia/en_us_6-3.62.0-license.txt
    /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model
    /data/models/time_analysis
    /data/models/time_analysis/generic-3.62.0-license.txt
    /data/models/time_analysis/generic-3.62.0.model

Inspect technology licenses

Licenses are stored in /data/speech-platform/speech-platform-licenses.yaml. The file contains Kubernetes secret definitions of the licenses, which allows simple loading of the licenses into the application.

After uploading, the imported licenses can be inspected with the following command:

  1. List the content of the /data/speech-platform folder:

    find /data/speech-platform/

    Example output:

    /data/speech-platform/
    /data/speech-platform/speech-platform-licenses.yaml
    /data/speech-platform/speech-platform-values.yaml

The Kubernetes secret definitions in the file are separated by ---. Each secret contains, under the .stringData.license path, the contents of the license file for the technology the license is meant for. For example:

  • For the model of technology speaker_identification with name xl and version 5.0.0, the secret will look like this:
---
apiVersion: v1
kind: Secret
metadata:
  name: speaker-identification-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
type: Opaque
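Should you ever need to recreate such a secret manually, e.g. after receiving a new license file, a manifest of exactly this shape can be generated with kubectl (a sketch; during installation the appliance generates this file for you):

    kubectl create secret generic speaker-identification-license \
      --namespace speech-platform \
      --from-file=license=/data/models/speaker_identification/xl-5.0.0-license.txt \
      --dry-run=client -o yaml

Appending the output to /data/speech-platform/speech-platform-licenses.yaml (separated by ---) keeps the file in the format shown below.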

The content of the license file (/data/speech-platform/speech-platform-licenses.yaml) can be shown with the following command:

  1. Show the content of the license file:

    cat /data/speech-platform/speech-platform-licenses.yaml

    Example output:

    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: speaker-identification-license
      namespace: speech-platform
    stringData:
      license: |
        <content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
    type: Opaque
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: enhanced-speech-to-text-built-on-whisper-license
      namespace: speech-platform
    stringData:
      license: |
        <content of "/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt" file>
    type: Opaque
    ...

Set DNS name for speech platform virtual appliance

The speech platform is accessible at http://<IP_address_of_virtual_appliance>. We recommend creating a DNS record to make access more convenient for users. Consult your DNS provider for more information on how to add the corresponding DNS record.
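Once the record exists, you can verify it from any workstation (speech.example.com is a hypothetical name here; substitute your own):

    # Should print the IP address of the virtual appliance
    dig +short speech.example.com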

Use HTTPS certificate

The speech platform is also accessible via the HTTPS protocol at https://<IP_address_of_virtual_appliance>. If you prefer secure communication, you may want to use your own TLS certificate to secure it.

To do so, follow this guide:

  1. Prepare the TLS certificate beforehand.

  2. Put certificate private key in file named cert.key.

  3. Put certificate into file named cert.crt.

  4. Create a kubernetes secret manifest storing the certificate and private key:

    kubectl create -n ingress-nginx secret tls default-ssl-certificate --key cert.key --cert cert.crt -o yaml --dry-run > /tmp/certificate-secret.yaml

    On newer kubectl versions, use --dry-run=client instead of the deprecated bare --dry-run.
  5. Copy manifest (resulting file) to /data/ingress-nginx/certificate-server.yaml.

  6. Open text file /data/ingress-nginx/ingress-nginx-values.yaml either directly from inside virtual appliance or via File Browser.

  7. Locate key .spec.valuesContent.controller.extraArgs.default-ssl-certificate

  8. Uncomment the line.

  9. Updated file should look like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          <Not significant lines omitted>
          extraArgs:
            <Not significant lines omitted>
            default-ssl-certificate: "ingress-nginx/default-ssl-certificate"
  10. Save the file.

  11. The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
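To verify that ingress-nginx serves your certificate, you can inspect it with openssl from any workstation (a quick check; substitute your DNS name or IP address):

    # Prints the subject and validity dates of the served certificate
    openssl s_client -connect <IP_address_of_virtual_appliance>:443 </dev/null 2>/dev/null \
      | openssl x509 -noout -subject -dates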

Extend disks

Disks are extended automatically on VM startup by the growfs systemd service when you extend the backing volume/disk in the hypervisor. You can also trigger the extension manually by running the script /root/grow-partition-and-filesystems.sh. It grows the partitions and filesystems of both the system and data disks.
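You can verify the result with standard tools (nothing appliance-specific; the sizes should reflect the enlarged backing disks):

    # Block devices and partitions
    lsblk
    # Grown filesystems, including the data disk
    df -h / /data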

Phonexia Speech to Text technology

This section describes configuration specific to Phonexia Speech to Text technology.

Permanent vs onDemand instances

A permanent instance is started and running (and consuming resources) all the time. An onDemand instance is started only when a corresponding task is queued, and stopped when all tasks have been processed.

All instances are onDemand by default. Any instance can be reconfigured to be permanent. Use the following guide to reconfigure an instance from onDemand to permanent:

  1. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
  2. Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.
  3. Corresponding section looks like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              ...
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
                onDemand:
                  enabled: true
              ...
  4. Delete the onDemand key and its subkeys.
  5. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              ...
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
              ...
  6. Save the file.
  7. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  8. Check that the configuration is valid and successfully applied (Configuration file checks).
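One way to verify the change is to list the pods: a permanent instance keeps a pod running even when no task is queued. A sketch, assuming the platform runs in the speech-platform namespace (as the license secrets above suggest) and the pod names contain the technology name:

    kubectl get pods -n speech-platform | grep speech-to-text-phonexia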

Configure languages in Speech to Text Phonexia technology

This technology consists of multiple instances. Each instance corresponds to a single language. All instances are listed in the configuration file.

By default, all languages/instances are enabled in onDemand mode. List of languages:

  • ar_kw_6
  • ar_xl_6
  • bn_6
  • cs_cz_6
  • de_de_6
  • en_us_6
  • es_6
  • fa_6
  • fr_fr_6
  • hr_hr_6
  • hu_hu_6
  • it_it_6
  • ka_ge_6
  • kk_kz_6
  • nl_6
  • pl_pl_6
  • ps_6
  • ru_ru_6
  • sk_sk_6
  • sr_rs_6
  • sv_se_6
  • tr_tr_6
  • uk_ua_6
  • vi_vn_6
  • zh_cn_6

How to disable all language instances except cs_cz_6 and en_us_6:

  1. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via File Browser.
  2. Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.
  3. Corresponding section looks like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              - name: ark
                imageTag: 3.62.0-stt-ar_kw_6
                onDemand:
                  enabled: true
              - name: arx
                imageTag: 3.62.0-stt-ar_xl_6
                onDemand:
                  enabled: true
              - name: bn
                imageTag: 3.62.0-stt-bn_6
                onDemand:
                  enabled: true
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
                onDemand:
                  enabled: true
              - name: de
                imageTag: 3.62.0-stt-de_de_6
                onDemand:
                  enabled: true
              - name: en
                imageTag: 3.62.0-stt-en_us_6
                onDemand:
                  enabled: true
              ...
              - name: vi
                imageTag: 3.62.0-stt-vi_vn_6
                onDemand:
                  enabled: true
              - name: zh
                imageTag: 3.62.0-stt-zh_cn_6
                onDemand:
                  enabled: true
  4. Comment out all the instances except cs_cz_6 and en_us_6.
  5. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              #- name: ark
              #  imageTag: 3.62.0-stt-ar_kw_6
              #  onDemand:
              #    enabled: true
              #- name: arx
              #  imageTag: 3.62.0-stt-ar_xl_6
              #  onDemand:
              #    enabled: true
              #- name: bn
              #  imageTag: 3.62.0-stt-bn_6
              #  onDemand:
              #    enabled: true
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
                onDemand:
                  enabled: true
              #- name: de
              #  imageTag: 3.62.0-stt-de_de_6
              #  onDemand:
              #    enabled: true
              - name: en
                imageTag: 3.62.0-stt-en_us_6
                onDemand:
                  enabled: true
              ...
              #- name: vi
              #  imageTag: 3.62.0-stt-vi_vn_6
              #  onDemand:
              #    enabled: true
              #- name: zh
              #  imageTag: 3.62.0-stt-zh_cn_6
              #  onDemand:
              #    enabled: true
  6. Alternatively, you can delete the instances you are not interested in.
  7. In that case, the updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
                onDemand:
                  enabled: true
              - name: en
                imageTag: 3.62.0-stt-en_us_6
                onDemand:
                  enabled: true
  8. Save the file.
  9. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  10. Check that the configuration is valid and successfully applied (Configuration file checks).

Modify replicas for permanent language instances

Each language instance has only one replica by default. This means that only one request/audiofile can be processed at a time. To process multiple requests/audiofiles in parallel, you have to increase the replicas for the corresponding language instance.

caution

We do not recommend increasing replicas for any technology when the virtual appliance is running with the default resources (4 CPUs, 32 GB memory)!
Note: An onDemand instance always has exactly one replica.

  1. Find out which language instance you want to configure replicas for.
  2. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via File Browser.
  3. Locate key .spec.valuesContent.speech-to-text-phonexia.config.instances.<language instance>.replicaCount.
  4. Change the value to desired amount of replicas.
  5. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            instances:
              - name: cs
                imageTag: 3.62.0-stt-cs_cz_6
                replicaCount: 2
  6. Save the file.
  7. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  8. Check that the configuration is valid and successfully applied (Configuration file checks).

Modify parallelism for instances

Each instance can process only one request at a time, unless parallelism is overridden. The parallelism value is the maximum number of requests processed by one instance. Parallelism is set globally for all instances of a technology, but each instance can override the value. For example, with a global parallelism of 2 and an override of 4 for the en instance, en processes up to 4 requests concurrently while every other instance processes up to 2. To override parallelism for speech-to-text-phonexia, time-analysis, or audio-quality-estimation, follow these steps:

  1. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via File Browser.
  2. Find the key, depending on technology (speech-to-text-phonexia, time-analysis, audio-quality-estimation) for which parallelism should be overridden: .spec.valuesContent.<technology>.parallelism
  3. Change the value to the desired number of requests processed in parallel.
  4. Corresponding section looks like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        speech-to-text-phonexia:
          <Not significant lines omitted>
          # Global value of parallelism for all instances
          parallelism: 2
          config:
            <Not significant lines omitted>
            instances:
              - name: cs
                imageTag: 3.62.0
              - name: en
                imageTag: 3.62.0
                # Override of parallelism for en instance
                parallelism: 4
  5. Save the file.
  6. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  7. Check that the configuration is valid and successfully applied (Configuration file checks).

Modify technology replicas

Each technology has only one replica by default. This means that only one request/audiofile can be processed at a time. To process multiple requests/audiofiles in parallel, you have to increase the replicas for the corresponding technologies.

caution

We do not recommend increasing replicas for any technology when the virtual appliance is running with the default resources (4 CPUs, 32 GB memory)!

  1. Find out which technologies you want to modify replicas for - age-estimation, audio-quality-estimation, audio-manipulation-detection, deepfake-detection, emotion-recognition, enhanced-speech-to-text-built-on-whisper, keyword-spotting, language-identification, replay-attack-detection, speaker-diarization, voice-activity-detection, voiceprint-comparison, or voiceprint-extraction.
  2. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via File Browser.
  3. Locate key .spec.valuesContent.<technology>.replicaCount
  4. Change the value to desired amount of replicas.
  5. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          <Not significant lines omitted>
          replicaCount: 2
  6. Save the file.
  7. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  8. Check that the configuration is valid and successfully applied (Configuration file checks).
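After the redeploy, you can check the replica count on the corresponding deployment (a sketch; it assumes the deployment is named after the technology and runs in the speech-platform namespace):

    # READY should eventually show 2/2
    kubectl get deployment -n speech-platform <technology>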

Run technology on GPU

Some of the technologies can run on a GPU, which increases processing speed. Technologies that can run on GPU are age-estimation, audio-manipulation-detection, deepfake-detection, emotion-recognition, enhanced-speech-to-text-built-on-whisper, gender-identification, keyword-spotting, language-identification, replay-attack-detection, speaker-diarization, voice-activity-detection, and voiceprint-extraction.

First, make sure the virtual appliance can see the GPU device(s). Use nvidia-smi to list all the devices. If a device is present and visible to the system, the output should look like:

nvidia-smi -L

Example output:

GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)

If the GPU is visible, then you can reconfigure the technology to use GPU for the processing.

  1. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via File Browser.

  2. Locate technology section .spec.valuesContent.<technology>.

  3. Locate key .spec.valuesContent.<technology>.config.device.

  4. Uncomment the line so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            # Uncomment this to force technology to run on GPU
            device: cuda
  5. Locate key .spec.valuesContent.<technology>.resources.

  6. Request GPU resources for the processing so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          <Not significant lines omitted>
          # Uncomment this to grant access to GPU on whisper pod
          resources:
            limits:
              nvidia.com/gpu: "1"
  7. Locate key .spec.valuesContent.<technology>.runtimeClassName.

  8. Set runtimeClassName so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          <Not significant lines omitted>
          # Uncomment this to run whisper on GPU
          runtimeClassName: "nvidia"
  9. Locate key .spec.valuesContent.<technology>.updateStrategy.

  10. Set type to Recreate to allow seamless updates so that it looks like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        <technology>:
          <Not significant lines omitted>
          # Uncomment this to allow seamless updates on single GPU machine
          updateStrategy:
            type: Recreate
  11. Example: Updated file for enhanced-speech-to-text-built-on-whisper should look like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        enhanced-speech-to-text-built-on-whisper:
          <Not significant lines omitted>
          config:
            <Not significant lines omitted>
            device: cuda

          <Not significant lines omitted>
          resources:
            limits:
              nvidia.com/gpu: "1"

          <Not significant lines omitted>
          runtimeClassName: "nvidia"

          <Not significant lines omitted>
          updateStrategy:
            type: Recreate
  12. Save the file.

  13. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.

  14. Check that the configuration is valid and successfully applied (Configuration file checks).
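To confirm that the container actually sees the GPU, you can run nvidia-smi inside the technology pod (a sketch; it assumes the deployment is named after the technology, runs in the speech-platform namespace, and ships nvidia-smi in its image):

    kubectl exec -n speech-platform deploy/enhanced-speech-to-text-built-on-whisper -- nvidia-smi -L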

GPU parallelism settings

This section describes how to control processing parallelism when a technology is running on GPU. The following configuration applies only to the technologies age-estimation, audio-manipulation-detection, deepfake-detection, enhanced-speech-to-text-built-on-whisper, gender-identification, keyword-spotting, language-identification, replay-attack-detection, voice-activity-detection, and voiceprint-extraction:

<technology>:
  config:
    # -- Parallel tasks per device. GPU only.
    instancesPerDevice: 1
    # -- Index of device to use. GPU only.
    #deviceIndex: 0

There are two configuration options:

  • instancesPerDevice - Controls how many tasks a technology can process in parallel on a single GPU. A higher value means higher GPU utilization (both processor- and memory-wise).
  • deviceIndex - Controls which GPU card to use when there are multiple GPU cards. We discourage using this option in most cases.
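For example, to process two tasks in parallel on the second GPU (index 1), the section could look like this (a sketch following the structure above; whether two tasks fit depends on the GPU memory footprint of the model):

<technology>:
  config:
    # -- Parallel tasks per device. GPU only.
    instancesPerDevice: 2
    # -- Index of device to use. GPU only.
    deviceIndex: 1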

Change model used in a technology

Each technology needs a model to do its job. For some technologies, e.g. enhanced-speech-to-text-built-on-whisper, we provide multiple models. We usually pre-configure technologies with the most accurate (and slowest) model. Users typically switch to a different model to speed up processing at the cost of less accurate results.

The license you received with the virtual appliance is valid only for the default model. If you change the model, you have to change the license as well.

Change model in Enhanced Speech to Text Built on Whisper technology

We offer the following models for the enhanced-speech-to-text-built-on-whisper technology:

  • large-v3 - next-gen most accurate multilingual model.
  • large-v2 - most accurate multilingual model. This is the default model.
  • medium - less accurate but faster than large-v2.
  • small - less accurate but faster than medium.
  • base - less accurate but faster than small.
  1. Ask Phonexia to provide you with the desired model and license. You will receive link(s) that download as a zip archive.

  2. Upload the archive to the virtual appliance:

    scp licensed-models.zip root@<virtual-appliance-ip>:/data/
  3. Unzip the archive. Models are extracted into a directory per technology:

    unzip licensed-models.zip
  4. The content of /data/models should look like:

    find /data/models

    Example output:

    /data/models/
    /data/models/enhanced_speech_to_text_built_on_whisper
    /data/models/enhanced_speech_to_text_built_on_whisper/small-1.0.0.model
    /data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-base-1.0.0-license.key.txt
    /data/models/enhanced_speech_to_text_built_on_whisper/base-1.0.0.model
    /data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-small-1.0.0-license.key.txt
    /data/models/speaker_identification
    /data/models/speaker_identification/xl-5.0.0.model
    /data/models/speaker_identification/speaker_identification-xl-5.0.0-license.key.txt
  5. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.

  6. Locate key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.config.model

  7. Change the content of the file key from "large_v2-1.0.0.model" to the file you've just uploaded ("small-1.0.0.model").

  8. Updated file should look like:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        enhanced-speech-to-text-built-on-whisper:
          <Not significant lines omitted>
          config:
            model:
              <Not significant lines omitted>
              file: "small-1.0.0.model"
  9. Change the license because you have changed the model. Refer to the licensed models upload and platform configuration step of the installation guide for more information.

  10. Save the file.

  11. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.

  12. Check that the configuration is valid and successfully applied (Configuration file checks).

Load Speech to Text Phonexia, Time Analysis and Audio Quality Estimation models from data disk

To keep up with the latest version of the application, it is possible to load models from the virtual appliance data volume. To use an image without a bundled model and load an existing model from the data volume, the instances in the config file need to be set up as follows:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: en
            imageTag: 3.62.0
          ...
    <Not significant lines omitted>
    time-analysis:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: tae
            imageTag: 3.62.0
          ...
    <Not significant lines omitted>
    audio-quality-estimation:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: aqe
            imageTag: 3.62.0
          ...

By default, we expect the model to be located at /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model. This folder structure is created by unzipping the provided licensed-models.zip archive in the /data/ path. If the path to the model is different, or the model version does not match the image, it can be specified in the instance config as follows:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: en
            imageTag: 3.62.0
            model:
              hostPath: /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model
          ...
    <Not significant lines omitted>
    time-analysis:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: tae
            imageTag: 3.62.0
            model:
              hostPath: /data/models/time_analysis/generic-3.62.0.model
          ...
    <Not significant lines omitted>
    audio-quality-estimation:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          ...
          - name: aqe
            imageTag: 3.62.0
            model:
              hostPath: /data/models/audio_quality_estimation/generic-3.62.0.model
          ...

So far, model loading from the data disk is supported only by the Speech to Text Phonexia, Time Analysis, and Audio Quality Estimation technologies.
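Before the redeploy, it is also worth checking that the model files actually exist at the configured hostPath values (a quick sanity check):

    ls -l /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model \
          /data/models/time_analysis/generic-3.62.0.model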

Process patented audio codecs with media-conversion

By default, media conversion can work only with patent-free audio codecs.

We cannot include and distribute patented codecs with the virtual appliance. If you need to process audiofiles encoded with patented codecs, you have to use a different version of media-conversion. The media-conversion service image is available on Docker Hub.

Pull Media Conversion image directly from Virtual Appliance

This works only if the internet (Docker Hub) is accessible from the Virtual Appliance.

  1. [Virtual Appliance] Pull media-conversion image to Virtual Appliance:
    k3s ctr image pull docker.io/phonexia/media-conversion:1.0.0
  2. [Virtual Appliance] Export image to data disk to load it automatically:
    k3s ctr image export /data/images/media-conversion-1.0.0.tar docker.io/phonexia/media-conversion:1.0.0
  3. Reconfigure the Media Conversion to use locally downloaded image as mentioned below.

Push Media Conversion image to Virtual Appliance from workstation

This approach is needed if your deployment is completely offline and access to internet from virtual appliance is forbidden.

  1. [PC] Pull media-conversion image locally to your workstation:
    docker pull phonexia/media-conversion:1.0.0
  2. [PC] Save Media Conversion image to tar archive:
    docker save --output media-conversion-1.0.0.tar phonexia/media-conversion:1.0.0
  3. [PC] Copy media-conversion-1.0.0.tar file into virtual appliance via ssh or filebrowser to /data/images.
    scp media-conversion-1.0.0.tar root@<IP of virtual appliance>:/data/images/
  4. [Virtual appliance] Restart virtual appliance to load the image or load it manually with:
    k3s ctr image import /data/images/media-conversion-1.0.0.tar
  5. Reconfigure the Media Conversion to use locally downloaded image as mentioned below.
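Whichever method you used, you can verify that the image is present in the cluster image store (using the same k3s ctr interface as above):

    k3s ctr image ls | grep media-conversion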

Configure Media Conversion to use pre-downloaded image

The last step is to configure Media Conversion to use the image downloaded in the previous step.

  1. Open text file /data/speech-platform/speech-platform-values.yaml either directly from inside virtual appliance or via file browser.
  2. Locate key .spec.valuesContent.media-conversion.image
  3. Change the content of registry, repository, tag, and tagSuffix to:
    media-conversion:
      image:
        registry: docker.io
        repository: phonexia/media-conversion
        tag: 1.0.0
        tagSuffix: ""
  4. Updated file should look like:
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: speech-platform
      namespace: kube-system
    spec:
      valuesContent: |-
        <Not significant lines omitted>
        media-conversion:
          <Not significant lines omitted>
          image:
            registry: docker.io
            repository: phonexia/media-conversion
            tag: 1.0.0
            tagSuffix: ""
  5. Save the file.
  6. The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
  7. Check that the configuration is valid and successfully applied (Configuration file checks).

Custom configuration with cloud-init

Cloud-init is a widely used tool for configuring cloud instances at boot time, and we support it in the Virtual Appliance.

It can be used for customizing the Virtual Appliance - to create a user with a specific SSH key, install extra packages, and so on.

How to Pass Cloud-Init User Configuration to Virtual Appliance

This guide will walk you through the steps required to pass a cloud-init user configuration to a Virtual Appliance.

  1. The first step is to create a user-data file that contains the configuration information you want to pass to the VM. This file is typically written in YAML and may include various configurations, such as creating users, setting up SSH keys, or running commands. Here is an example of a basic user-data file:

    #cloud-config
    users:
      - name: phonexia
        ssh_authorized_keys:
          - ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr... your_public_key_here

    packages:
      - htop

    Save this file as user-data.yaml.

  2. Since non-cloud hypervisors like VirtualBox and VMWare do not have a native method to pass cloud-init data, you need to create a "seed" ISO image that contains your user-data.yaml file. Cloud-init reads this data during the virtual machine boot process.

    You can create an ISO image using the cloud-localds command:

    cloud-localds seed.iso user-data.yaml

    This command generates an ISO file named seed.iso containing your user-data.yaml and a generated meta-data file.

  3. Attach the ISO Image to the Virtual Appliance VM

    Next, attach the seed.iso file to the VM as a CD-ROM/DVD-ROM. You can do this via the VirtualBox GUI, VMWare vSphere, or the ESXi Host Client.

  4. Boot the VM

    Cloud-init will automatically detect the attached ISO image and apply the configurations specified in your user-data.yaml file.

  5. Verify Cloud-Init Execution

    Once the VM has booted, you can verify that cloud-init has applied the configuration correctly. Connect to your VM via SSH or the console and check the following:

    1. Check Cloud-Init Status:

      cloud-init status
    2. Check that the htop package is installed:

      htop

      This should open the htop application.

    3. Check that you can log in as the phonexia user with the SSH key:

      ssh -i <path_to_ssh_private_key> phonexia@<ip of virtual appliance>
    4. Check Cloud-Init Logs: Cloud-init logs its activities in /var/log/cloud-init.log and /var/log/cloud-init-output.log. You can inspect these logs to troubleshoot any issues:

      less /var/log/cloud-init.log
  6. (Optional) Detach the ISO Image

    Usually you no longer need the seed.iso file attached to your VM; you can detach it in a similar way as you attached it.

Uninstall NVIDIA Drivers

The Virtual Appliance contains the NVIDIA drivers needed for GPU processing. In some cases it might be handy to use a different version of the drivers, or a different kind of driver (vGPU) instead. As a first step, the current drivers must be uninstalled.

Run the following command to uninstall the bundled drivers:

dnf module remove nvidia-driver:550
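You can confirm the removal by checking that no NVIDIA kernel modules remain loaded (after a reboot, this should print nothing):

    lsmod | grep nvidia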

Note that GPU processing won't work until new drivers are installed. Installation of new drivers is beyond the scope of this document.