Adjustments
The following sections describe various configuration use cases.
Configuration of technology models and licenses
Changing technology models
If you use models other than the default ones, you need to change the path values in the
/data/speech-platform/speech-platform-licenses.yaml
file: the <technology>.config.model.file
value pointing to the model, and the <technology>.config.license.key
value pointing to the license for the used model.
Example (changing the model large_v2-1.0.1 to small-1.0.1 for the enhanced-speech-to-text-built-on-whisper technology):
enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      volume:
        hostPath:
          path: /data/models/enhanced_speech_to_text_built_on_whisper
      file: "large_v2-1.0.1.model"
    license:
      useSecret: true
      secret: enhanced-speech-to-text-built-on-whisper-license
      key: "large_v2-1.0.1"
needs to be changed to:
enhanced-speech-to-text-built-on-whisper:
  config:
    model:
      volume:
        hostPath:
          path: /data/models/enhanced_speech_to_text_built_on_whisper
      file: "small-1.0.1.model"
    license:
      useSecret: true
      secret: enhanced-speech-to-text-built-on-whisper-license
      key: "small-1.0.1"
These changes are required for all technologies with licensed models except speech-to-text-phonexia, time-analysis, and audio-quality-estimation.
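The model/license swap above can also be applied non-interactively with sed. Below is a sketch demonstrated on an inline snippet; on the appliance you would run the same expressions against /data/speech-platform/speech-platform-licenses.yaml and review the result afterwards:

```shell
# Sketch: swap the whisper model file and license key in a values snippet.
# Review the real file manually after running sed against it.
snippet='      file: "large_v2-1.0.1.model"
      key: "large_v2-1.0.1"'
printf '%s\n' "$snippet" | sed \
    -e 's/large_v2-1\.0\.1\.model/small-1.0.1.model/' \
    -e 's/large_v2-1\.0\.1/small-1.0.1/'
```

The first expression rewrites the model file name; the second catches the remaining license key reference.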
Inspect technology models
Models are stored inside the /data/models folder, where the path to each model is constructed as:
/data/models/<technology_name>/<model_name>-<model_version>.model
Where:
- technology_name - the name of the technology, e.g. speaker_identification
- model_name - the name of the model, e.g. xl
- model_version - the version of the model, e.g. 5.0.0
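Composed in shell, the naming scheme looks like this (using the example values above):

```shell
# Compose a model path following the
# /data/models/<technology_name>/<model_name>-<model_version>.model scheme.
technology_name="speaker_identification"
model_name="xl"
model_version="5.0.0"
model_path="/data/models/${technology_name}/${model_name}-${model_version}.model"
echo "${model_path}"   # prints /data/models/speaker_identification/xl-5.0.0.model
```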
Imported models can be inspected after uploading with the following command:
- Content of the /data/models folder:
find /data/models
Example output:
/data/models/
/data/models/speaker_identification
/data/models/speaker_identification/xl-5.0.0.model
/data/models/speaker_identification/xl-5.0.0-license.txt
/data/models/enhanced_speech_to_text_built_on_whisper
/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1.model
/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt
/data/models/speech_to_text_phonexia
/data/models/speech_to_text_phonexia/en_us_6-3.62.0-license.txt
/data/models/speech_to_text_phonexia/en_us_6-3.62.0.model
/data/models/time_analysis
/data/models/time_analysis/generic-3.62.0-license.txt
/data/models/time_analysis/generic-3.62.0.model
Inspect technology licenses
Licenses are stored in the file /data/speech-platform/speech-platform-licenses.yaml. The file contains Kubernetes secret definitions of the licenses, which allows simple loading of the licenses into the application.
Imported licenses can be inspected after uploading with the following command:
- Content of the /data/speech-platform folder:
find /data/speech-platform/
Example output:
/data/speech-platform/
/data/speech-platform/speech-platform-licenses.yaml
/data/speech-platform/speech-platform-values.yaml
Kubernetes secret definitions in the file are separated by ---. Each secret contains, at the .stringData.license path, the contents of the license file corresponding to the technology for which the license is meant. For example:
- For the speaker_identification technology model with name xl and version 5.0.0, the secret will look like this:
---
apiVersion: v1
kind: Secret
metadata:
  name: speaker-identification-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
type: Opaque
The content of the license file (/data/speech-platform/speech-platform-licenses.yaml) can be shown with the following command:
- Content of the license file:
cat /data/speech-platform/speech-platform-licenses.yaml
Example output:
---
apiVersion: v1
kind: Secret
metadata:
  name: speaker-identification-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/speaker_identification/xl-5.0.0-license.txt" file>
type: Opaque
---
apiVersion: v1
kind: Secret
metadata:
  name: enhanced-speech-to-text-built-on-whisper-license
  namespace: speech-platform
stringData:
  license: |
    <content of "/data/models/enhanced_speech_to_text_built_on_whisper/large_v2-1.0.1-license.txt" file>
type: Opaque
.
.
.
Set DNS name for speech platform virtual appliance
The Speech Platform is accessible at http://<IP_address_of_virtual_appliance>. We recommend creating a DNS record to make access more comfortable for users. Consult your DNS provider for more information on how to add the corresponding DNS record.
Use HTTPS certificate
The Speech Platform is also accessible via the HTTPS protocol at https://<IP_address_of_virtual_appliance>. If you prefer secure communication, you may want to use your own TLS certificate to secure the communication. To do so, follow this guide:
- Prepare the TLS certificate beforehand.
- Put the certificate private key in a file named cert.key.
- Put the certificate into a file named cert.crt.
- Create a Kubernetes secret manifest storing the certificate and private key:
kubectl create -n ingress-nginx secret tls default-ssl-certificate --key cert.key --cert cert.crt -o yaml --dry-run > /tmp/certificate-secret.yaml
- Copy the manifest (resulting file) to /data/ingress-nginx/certificate-server.yaml.
- Open the text file /data/ingress-nginx/ingress-nginx-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.controller.extraArgs.default-ssl-certificate.
- Uncomment the line.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      <Not significant lines omitted>
      extraArgs:
        <Not significant lines omitted>
        default-ssl-certificate: "ingress-nginx/default-ssl-certificate"
- Save the file.
- The application automatically recognizes that the file was updated and redeploys itself with the updated configuration.
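If you want to try the HTTPS flow before obtaining a real certificate, you can generate a temporary self-signed pair with openssl. The CN below is a placeholder; browsers will warn about self-signed certificates, so use a CA-issued certificate in production:

```shell
# Generate a throwaway self-signed certificate and key for testing.
# Replace speech.example.com with your actual DNS name.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
    -keyout cert.key -out cert.crt \
    -subj "/CN=speech.example.com"

# Sanity-check the certificate before creating the Kubernetes secret from it.
openssl x509 -noout -subject -in cert.crt
```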
Extend disks
Disks are extended automatically on VM startup by the growfs systemd service when you extend the backing volume/disk in the hypervisor. You can trigger the extension manually by running the script /root/grow-partition-and-filesystems.sh. It grows the partition and filesystem for both the system and data disks.
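To confirm that the grow succeeded, compare the filesystem size before and after extending the backing disk. A quick check (on the appliance the data filesystem is mounted at /data; the fallback to / lets the snippet run elsewhere too):

```shell
# Print the mount point, total size, and free space of the data filesystem.
target=/data
[ -d "$target" ] || target=/
df -P -k "$target" | tail -1 | awk '{print $6, $2 " KiB total,", $4 " KiB free"}'
```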
Phonexia Speech to Text technology
This section describes configuration specific to Phonexia Speech to Text technology.
Permanent vs onDemand instances
A permanent instance is started and running (and consuming resources) all the time. An onDemand instance is started only when a corresponding task is queued; the instance is stopped when all tasks have been processed.
All instances are onDemand by default. Any instance can be reconfigured to be permanent. Use the following guide to reconfigure an instance from onDemand to permanent:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          .
          .
          .
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
            onDemand:
              enabled: true
          .
          .
          .
- Delete the onDemand key and its subkeys.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          .
          .
          .
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
          .
          .
          .
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Configuration file checks).
Configure languages in Speech to Text Phonexia technology
This technology consists of multiple instances. Each instance corresponds to a single language. All instances are listed in the configuration file.
By default, all languages/instances are enabled in onDemand mode. List of languages:
- ar_kw_6
- ar_xl_6
- bn_6
- cs_cz_6
- de_de_6
- en_us_6
- es_6
- fa_6
- fr_fr_6
- hr_hr_6
- hu_hu_6
- it_it_6
- ka_ge_6
- kk_kz_6
- nl_6
- pl_pl_6
- ps_6
- ru_ru_6
- sk_sk_6
- sr_rs_6
- sv_se_6
- tr_tr_6
- uk_ua_6
- vi_vn_6
- zh_cn_6
How to disable all language instances except cs_cz_6 and en_us_6:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          - name: ark
            imageTag: 3.62.0-stt-ar_kw_6
            onDemand:
              enabled: true
          - name: arx
            imageTag: 3.62.0-stt-ar_xl_6
            onDemand:
              enabled: true
          - name: bn
            imageTag: 3.62.0-stt-bn_6
            onDemand:
              enabled: true
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
            onDemand:
              enabled: true
          - name: de
            imageTag: 3.62.0-stt-de_de_6
            onDemand:
              enabled: true
          - name: en
            imageTag: 3.62.0-stt-en_us_6
            onDemand:
              enabled: true
          .
          .
          .
          - name: vi
            imageTag: 3.62.0-stt-vi_vn_6
            onDemand:
              enabled: true
          - name: zh
            imageTag: 3.62.0-stt-zh_cn_6
            onDemand:
              enabled: true
- Comment out all the instances except cs_cz_6 and en_us_6.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          #- name: ark
          #  imageTag: 3.62.0-stt-ar_kw_6
          #  onDemand:
          #    enabled: true
          #- name: arx
          #  imageTag: 3.62.0-stt-ar_xl_6
          #  onDemand:
          #    enabled: true
          #- name: bn
          #  imageTag: 3.62.0-stt-bn_6
          #  onDemand:
          #    enabled: true
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
            onDemand:
              enabled: true
          #- name: de
          #  imageTag: 3.62.0-stt-de_de_6
          #  onDemand:
          #    enabled: true
          - name: en
            imageTag: 3.62.0-stt-en_us_6
            onDemand:
              enabled: true
          .
          .
          .
          #- name: vi
          #  imageTag: 3.62.0-stt-vi_vn_6
          #  onDemand:
          #    enabled: true
          #- name: zh
          #  imageTag: 3.62.0-stt-zh_cn_6
          #  onDemand:
          #    enabled: true
- Alternatively, you can delete the instances you are not interested in.
- Then the updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
            onDemand:
              enabled: true
          - name: en
            imageTag: 3.62.0-stt-en_us_6
            onDemand:
              enabled: true
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Configuration file checks).
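After commenting out or deleting instances, you can quickly count how many remain active. A grep sketch, demonstrated on an inline snippet; on the appliance, point the same pattern at /data/speech-platform/speech-platform-values.yaml (where it will also count instances of other technologies):

```shell
# Count non-commented "- name:" entries; commented-out instances start
# with '#' after the indentation and therefore do not match.
cat <<'EOF' | grep -c '^[[:space:]]*- name:'
        instances:
          #- name: ark
          #  imageTag: 3.62.0-stt-ar_kw_6
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
          - name: en
            imageTag: 3.62.0-stt-en_us_6
EOF
```

With the snippet above, grep reports 2 active instances.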
Modify replicas for permanent language instances
Each language instance has only one replica by default. This means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the number of replicas for the corresponding language instance.
We do not recommend increasing replicas for any technology when the virtual appliance is running with the default resources (4 CPUs, 32 GB memory)!
Note: An onDemand instance always has only one replica.
- Find out which language instance you want to configure replicas for.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.speech-to-text-phonexia.config.instances.<language instance>.replicaCount.
- Change the value to the desired number of replicas.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.62.0-stt-cs_cz_6
            replicaCount: 2
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Configuration file checks).
Modify parallelism for instances
Each instance is able to process only one request at a time, unless the parallelism is overridden. The parallelism value is the maximum number of requests processed by one instance in parallel. Parallelism is set globally for all instances of a technology; however, each instance can override the value. To override parallelism for speech-to-text-phonexia, time-analysis, or audio-quality-estimation, follow these steps:
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Find the key, depending on the technology (speech-to-text-phonexia, time-analysis, audio-quality-estimation) for which parallelism should be overridden: .spec.valuesContent.<technology>.parallelism
- Change the value to the desired number of requests processed in parallel.
- The corresponding section looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      # Global value of parallelism for all instances
      parallelism: 2
      config:
        <Not significant lines omitted>
        instances:
          - name: cs
            imageTag: 3.62.0
          - name: en
            imageTag: 3.62.0
            # Override of parallelism for the en instance
            parallelism: 4
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Configuration file checks).
Modify technology replicas
Each technology has only one replica by default. This means that only one request/audio file can be processed at a time. To process multiple requests/audio files in parallel, you have to increase the number of replicas for the corresponding technologies.
We do not recommend increasing replicas for any technology when the virtual appliance is running with the default resources (4 CPUs, 32 GB memory)!
- Find out which technologies you want to modify replicas for: age-estimation, audio-quality-estimation, audio-manipulation-detection, deepfake-detection, emotion-recognition, enhanced-speech-to-text-built-on-whisper, keyword-spotting, language-identification, replay-attack-detection, speaker-diarization, voice-activity-detection, voiceprint-comparison, or voiceprint-extraction.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.<technology>.replicaCount.
- Change the value to the desired number of replicas.
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    <technology>:
      <Not significant lines omitted>
      replicaCount: 2
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Configuration file checks).
Run technology on GPU
Some of the technologies can run on a GPU, which increases the processing speed. The technologies that can run on a GPU are age-estimation, audio-manipulation-detection, deepfake-detection, emotion-recognition, enhanced-speech-to-text-built-on-whisper, gender-identification, keyword-spotting, language-identification, replay-attack-detection, speaker-diarization, voice-activity-detection, and voiceprint-extraction.
First, make sure the virtual appliance can see the GPU device(s). Use nvidia-smi to list all the devices. If a device is present and visible to the system, the output should look like:
nvidia-smi -L
Example output:
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-1fb957fa-a6fc-55db-e76c-394e9a67b7f5)
If the GPU is visible, you can reconfigure the technology to use the GPU for processing.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the technology section .spec.valuesContent.<technology>.
- Locate the key .spec.valuesContent.<technology>.config.device.
- Uncomment the line so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    <technology>:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        # Uncomment this to force technology to run on GPU
        device: cuda
- Locate the key .spec.valuesContent.<technology>.resources.
- Request GPU resources for the processing so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    <technology>:
      <Not significant lines omitted>
      # Uncomment this to grant access to GPU on whisper pod
      resources:
        limits:
          nvidia.com/gpu: "1"
- Locate the key .spec.valuesContent.<technology>.runtimeClassName.
- Set runtimeClassName so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    <technology>:
      <Not significant lines omitted>
      # Uncomment this to run whisper on GPU
      runtimeClassName: "nvidia"
- Locate the key .spec.valuesContent.<technology>.updateStrategy.
- Set type to Recreate to allow seamless updates so that it looks like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    <technology>:
      <Not significant lines omitted>
      # Uncomment this to allow seamless updates on single GPU machine
      updateStrategy:
        type: Recreate
- Example: The updated file for enhanced-speech-to-text-built-on-whisper should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        device: cuda
      <Not significant lines omitted>
      resources:
        limits:
          nvidia.com/gpu: "1"
      <Not significant lines omitted>
      runtimeClassName: "nvidia"
      <Not significant lines omitted>
      updateStrategy:
        type: Recreate
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Configuration file checks).
GPU parallelism settings
This section describes how to control processing parallelism when a technology is running on a GPU. The following configuration applies only to the technologies age-estimation, audio-manipulation-detection, deepfake-detection, enhanced-speech-to-text-built-on-whisper, gender-identification, keyword-spotting, language-identification, replay-attack-detection, voice-activity-detection, and voiceprint-extraction:
<technology>:
  config:
    # -- Parallel tasks per device. GPU only.
    instancesPerDevice: 1
    # -- Index of device to use. GPU only.
    #deviceIndex: 0
There are two configuration options:
- instancesPerDevice - Controls how many tasks can be processed by a technology on a single GPU in parallel. A higher value means higher GPU utilization (both processor- and memory-wise).
- deviceIndex - Controls which GPU card to use in case there are multiple GPU cards. We discourage using this in most cases.
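As an illustration, overriding both options might look like this (the values are examples for a multi-GPU host, not recommendations; a higher instancesPerDevice requires enough GPU memory for the extra tasks):

```yaml
enhanced-speech-to-text-built-on-whisper:
  config:
    # Process two tasks in parallel on one GPU (watch memory with nvidia-smi).
    instancesPerDevice: 2
    # Use the second GPU (index 1); omit to use the default device.
    deviceIndex: 1
```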
Change model used in a technology
Each technology needs a model to do its job properly. We provide multiple models for some technologies, for example enhanced-speech-to-text-built-on-whisper. Usually we pre-configure technologies with the most accurate (and slowest) model. Typically, users switch to a different model to speed up processing at the cost of less accurate results.
The license you have received with the virtual appliance is valid only for the default model. If you change the model, you have to change the license as well.
Change model in Enhanced Speech to Text Built on Whisper technology
We offer the following models for the enhanced-speech-to-text-built-on-whisper technology:
- large-v3 - next-gen most accurate multilingual model.
- large-v2 - most accurate multilingual model. This is the default model.
- medium - less accurate but faster than large-v2.
- base - less accurate but faster than medium.
- small - less accurate but faster than base.
- Ask Phonexia to provide you with the desired model and license. You will receive link(s) that result in a zip archive (zip file) when downloaded.
- Upload the archive to the virtual appliance:
scp licensed-models.zip root@<virtual-appliance-ip>:/data/
- Unzip the archive. Models are extracted to a directory per technology:
unzip licensed-models.zip
- The content of /data/models should look like:
find /data/models
Example output:
/data/models/
/data/models/enhanced_speech_to_text_built_on_whisper
/data/models/enhanced_speech_to_text_built_on_whisper/small-1.0.0.model
/data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-base-1.0.0-license.key.txt
/data/models/enhanced_speech_to_text_built_on_whisper/base-1.0.0.model
/data/models/enhanced_speech_to_text_built_on_whisper/enhanced_speech_to_text_built_on_whisper-small-1.0.0-license.key.txt
/data/models/speaker_identification
/data/models/speaker_identification/xl-5.0.0.model
/data/models/speaker_identification/speaker_identification-xl-5.0.0-license.key.txt
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.enhanced-speech-to-text-built-on-whisper.config.model.
- Change the content of the file key from "large_v2-1.0.0.model" to the file you've just uploaded ("small-1.0.0.model").
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    enhanced-speech-to-text-built-on-whisper:
      <Not significant lines omitted>
      config:
        model:
          <Not significant lines omitted>
          file: "small-1.0.0.model"
- Change the license, because you have changed the model. Refer to the licensed models upload and platform configuration step of the installation guide for more information.
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Configuration file checks).
Load Speech to Text Phonexia, Time Analysis and Audio Quality Estimation model from data disk
To keep up with the latest version of the application, it is possible to load models from the virtual appliance volume. To use an image without the model and load existing models from the data volume, the instances in the config file need to be set up as follows:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: en
            imageTag: 3.62.0
          . . .
    <Not significant lines omitted>
    time-analysis:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: tae
            imageTag: 3.62.0
          . . .
    <Not significant lines omitted>
    audio-quality-estimation:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: aqe
            imageTag: 3.62.0
          . . .
By default, we expect the model to be located at the path /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model. This folder structure is ensured by unzipping the provided licensed-models.zip archive in the /models/ path. If the path to the model is different, or the version of the model does not match the image, it can be specified in the instances config as follows:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    speech-to-text-phonexia:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: cs
            imageTag: 3.62.0
            model:
              hostPath: /data/models/speech_to_text_phonexia/en_us_6-3.62.0.model
          . . .
    <Not significant lines omitted>
    time-analysis:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: tae
            imageTag: 3.62.0
            model:
              hostPath: /data/models/time_analysis/generic-3.62.0.model
          . . .
    <Not significant lines omitted>
    audio-quality-estimation:
      <Not significant lines omitted>
      config:
        <Not significant lines omitted>
        instances:
          . . .
          - name: aqe
            imageTag: 3.62.0
            model:
              hostPath: /data/models/audio_quality_estimation/generic-3.62.0.model
          . . .
So far, model loading from the data disk is supported only by the Speech to Text Phonexia and Time Analysis technologies.
Process patented audio codecs with media-conversion
By default, media conversion can work only with patent-free audio codecs.
We cannot include and distribute patented codecs with the virtual appliance. If you need to process audio files encoded with patented codecs, you have to use a different version of media-conversion. The media-conversion service image is located on Docker Hub.
Pull Media Conversion image directly from Virtual Appliance
This works only if the internet (Docker Hub) is accessible from the Virtual Appliance.
- [Virtual Appliance] Pull the media-conversion image to the Virtual Appliance:
k3s ctr image pull docker.io/phonexia/media-conversion:1.0.0
- [Virtual Appliance] Export the image to the data disk so it is loaded automatically:
k3s ctr image export /data/images/media-conversion-1.0.0.tar docker.io/phonexia/media-conversion:1.0.0
- Reconfigure Media Conversion to use the locally downloaded image as described below.
Push Media Conversion image to Virtual Appliance from workstation
This approach is needed if your deployment is completely offline and access to the internet from the virtual appliance is forbidden.
- [PC] Pull the media-conversion image locally to your workstation:
docker pull phonexia/media-conversion:1.0.0
- [PC] Save the Media Conversion image to a tar archive:
docker save --output media-conversion-1.0.0.tar phonexia/media-conversion:1.0.0
- [PC] Copy the media-conversion-1.0.0.tar file into the virtual appliance via SSH or File Browser to /data/images:
scp media-conversion-1.0.0.tar root@<IP of virtual appliance>:/data/images/
- [Virtual appliance] Restart the virtual appliance to load the image, or load it manually with:
k3s ctr image import /data/images/media-conversion-1.0.0.tar
- Reconfigure Media Conversion to use the locally downloaded image as described below.
Configure Media Conversion to use pre-downloaded image
The last step is to configure Media Conversion to use the image downloaded in the previous step.
- Open the text file /data/speech-platform/speech-platform-values.yaml either directly from inside the virtual appliance or via File Browser.
- Locate the key .spec.valuesContent.media-conversion.image.
- Change the content of repository, registry, tag and tagSuffix to:
media-conversion:
  image:
    registry: docker.io
    repository: phonexia/media-conversion
    tag: 1.0.0
    tagSuffix: ""
- The updated file should look like:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: speech-platform
  namespace: kube-system
spec:
  valuesContent: |-
    <Not significant lines omitted>
    media-conversion:
      <Not significant lines omitted>
      image:
        registry: docker.io
        repository: phonexia/media-conversion
        tag: 1.0.0
        tagSuffix: ""
- Save the file.
- The application automatically recognizes when the file is updated and redeploys itself with the updated configuration.
- Check that the configuration is valid and successfully applied (Configuration file checks).
Custom configuration with cloud-init
Cloud-init is a widely used tool for configuring cloud instances at boot time, and we support it in the Virtual Appliance.
It can be used for customizing the Virtual Appliance, for example to create a user with a specific SSH key, install extra packages, and so on.
How to Pass Cloud-Init User Configuration to Virtual Appliance
This guide will walk you through the steps required to pass a cloud-init user configuration to a Virtual Appliance.
- The first step is to create a user-data file that contains the configuration information you want to pass to the VM. This file is typically written in YAML and may include various configurations, such as creating users, setting up SSH keys, or running commands. Here is an example of a basic user-data file:
#cloud-config
users:
  - name: phonexia
    ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr... your_public_key_here
packages:
  - htop
Save this file as user-data.yaml.
- Since non-cloud hypervisors like VirtualBox and VMware do not have a native method to pass cloud-init data, you need to create a "seed" ISO image that contains your user-data.yaml file. Cloud-init will read this data during the virtual machine boot process. You can create the ISO image using the cloud-localds command:
cloud-localds seed.iso user-data.yaml
This command generates an ISO file named seed.iso containing your user-data.yaml and a generated meta-data file.
- Attach the ISO Image to the Virtual Appliance VM
Next, attach the seed.iso file to the VM as a CD-ROM/DVD-ROM. You can do this via the VirtualBox GUI or the VMware vSphere or ESXi Host Client.
- Boot the VM
Cloud-init will automatically detect the attached ISO image and apply the configurations specified in your user-data.yaml file.
- Verify Cloud-Init Execution
Once the VM has booted, you can verify that cloud-init has applied the configuration correctly. Connect to your VM via SSH or the console and check the following:
- Check the cloud-init status:
cloud-init status
- Check that the htop package is installed:
htop
This should open the htop application.
- Check that you can log in as the phonexia user with the SSH key:
ssh -i <path_to_ssh_private_key> phonexia@<ip of virtual appliance>
- Check the cloud-init logs: Cloud-init logs its activities in /var/log/cloud-init.log and /var/log/cloud-init-output.log. You can inspect these logs to troubleshoot any issues:
less /var/log/cloud-init.log
- (Optional) Detach the ISO Image
Usually you no longer need the seed.iso file attached to your VM; you can detach it in a similar way as you attached it.
Uninstall NVIDIA Drivers
The Virtual Appliance contains the NVIDIA drivers needed for GPU processing. In some cases it might be handy to use a different version of the drivers, or a different kind of drivers (vGPU) instead. As a first step, the current drivers must be uninstalled.
Run the following command to uninstall the bundled drivers:
dnf module remove nvidia-driver:550
Note that GPU processing won't work until new drivers are installed. Installation of the new drivers is out of the scope of this document.
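After the removal, you can verify that the driver tooling is gone. A small check (it assumes nvidia-smi was installed only as part of the driver package):

```shell
# Report whether the NVIDIA driver tooling is still present on the system.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "NVIDIA drivers still present"
else
    echo "NVIDIA drivers removed"
fi
```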