Version: 4.0.2

Virtual Appliance Changelog

4.0.0 (2025-06-25)

Added

Deploy Age Estimation
Deploy Replay Attack Detection
Deploy Audio Manipulation Detection
Deploy Keyword Spotting
Allow setting UI limits for Age Estimation
Allow setting UI limits for Authenticity Verification technologies
Allow setting UI limits for Keyword Spotting
Add image watcher loading images from /data/images

Changed

Add microservices which do not support old licensing
Refactoring and logging improvements for configure-speech-platform.sh
Increase default file size limit to 100MB on api and frontend
Set the local registry to be reachable only via localhost

Fixed

Fix issue with the virtual appliance clock synchronization
Fix k3s logs
Fix filebrowser running the speech-platform configuration script multiple times

Removed

Remove redundant images check from the speech-platform configuration script

3.7.0 (2025-04-07)

Added

Automatically unzip and configure when a licensed-models*.zip file is placed in the /data/ folder
Deploy Gender Identification
Deploy Emotion Recognition
Allow specifying a profile for voice-activity-detection
Output more information from diag-report

Removed

Remove enable-technologies.sh. Replaced by configure-speech-platform.sh.

3.6.0 (2025-02-17)

Added

Deploy Deep Fake Detection
Add systemd service for checking Speech Platform status on boot
Add output of networking to diagnostics report
Add output of capacities and consumptions endpoints to diagnostics report
Add output about time and helm charts into diagnostics report
Define GPU parameters for Language Identification and Voice Activity Detection
Create configuration script
Add values backup directory to the diagnostic report archive
Deploy and use local container registry
Store all GPU images by default
Unzip models package after upload to Filebrowser
Use Phonexia branding in Filebrowser

Changed

Upgrade to Rocky Linux 9.5
Upgrade nvidia driver to 550.127.08
Change the way custom images are uploaded
test-kubernetes.sh script reconciles state faster
Shorten values file

3.5.0 (2024-12-19)

Added

Rotate k3s logs on a weekly basis
Allow setting UI limits for Language Identification
Allow setting UI limits for Speech Translation
Allow setting UI limits for Voice Activity Detection
Allow setting UI limits for Speaker Diarization
Allow setting UI limits for Audio Quality Estimation

Changed

Set components debug level to info
Use dedicated error log for k3s
Bump expected model for Enhanced Speech to Text Built on Whisper to 1.1.0
Bump expected model for Speaker Identification to 5.2.0
Bump expected model for Language Identification to 5.3.0
Bump expected model for Speaker Diarization to 5.1.0
Bump expected model for Voice Activity Detection to 3.1.0

3.4.1 (2024-12-17)

Changed

Disable all microservices by default

3.4.0 (2024-11-19)

Added

Add new default phonexia user
Add Audio Quality Estimation technology microservice
Add Voice Activity Detection technology microservice
Print more information to system console

Changed

Discourage disabling mandatory components
Disable rate-limit on api, billing and rest-api-gateway components
Use xl-5.1.0 model for speaker-identification
Use debug log level for speaker-identification, language-identification, speaker-diarization and media-conversion by default
Improve run-diag-report.sh script output and result archive name
Upgrade ingress-nginx helm chart to version 4.11.2
Upgrade keda helm chart to version 2.15.1
Upgrade dcgm-exporter helm chart to version 3.6.0
Upgrade kube-prometheus-stack helm chart to version 65.0.0
Upgrade nginx helm chart to version 18.2.2
Upgrade reloader helm chart to version 1.1.0
Upgrade k3s to 1.30.5
Make diag report script user-friendly

3.3.0 (2024-09-16)

Added

Add diagnostics script
Deploy configuration service
Automatically resize both system and data disks
Deploy speaker diarization
Support for Hyper-V hypervisor

Changed

Cloud-init for customer use is no longer disabled
Speech engine images bumped from 3.61.0 to 3.62.0
Enable ingress metrics for ingress without hostname

3.2.0 (2024-08-20)

Added

Add language-identification technology
Enable GPU sharing

Changed

Deploy nvidia-device-plugin from helm chart
Deploy nvidia-device-plugin to nvidia-device-plugin namespace instead of gpu namespace

Removed

Do not include models for speaker-identification and enhanced-speech-to-text-built-on-whisper by default on data disk

3.1.0 (2024-07-30)

Added

Deploy billing-related components
Load licenses from secrets instead of the values file
Add Reloader, which reloads a deployment every time its secret is edited
Install yq
Support running voiceprint-extraction on GPU

Changed

Change the way secrets are loaded in Speech-to-Text-Phonexia and Time-Analysis values
Add comment explaining why disabling media-conversion is usually not a good idea
Update the Nvidia drivers to version 550
Build in AWS instead of local infrastructure
Use cloud-init for configuration
Root (/) partition is merged with var partition (/var)

3.0.0 (2024-05-15)

Added

Allow configuring parallelism in speech-to-text-phonexia and time-analysis

Changed

Rename speech-to-text-whisper-enhanced to enhanced-speech-to-text-built-on-whisper
Upgrade Rocky Linux version to 9.4

Fixed

Fix language code for Levantine Arabic

2.1.0 (2024-04-18)

Added

Introduce extra environment variables for speech-to-text-whisper-enhanced
Introduce max upload file size specification for filebrowser

Changed

Increase inotify limits
Move prometheus storage to data disk
Admin-related backends are extracted to a separate ingressAdmin configuration
Time-analysis reconfigured as onDemand instance by default
Speech engine images bumped from 3.60.1 to 3.61.0
Load models for speech-to-text-phonexia and time-analysis from data disk by default
Default value for maxFileSize UI limits set to 5MB instead of 5MiB
Time-analysis is started as onDemand instance by default

Fixed

Blacklist nouveau driver

2.0.0 (2024-03-29)

Added

Allow changing API log level
Allow configuring UI limits
Add time-analysis subchart
Add options to change tmpdir volume for speech-platform API and media-conversion

Changed

(Breaking change) Rename speech-engine subchart to speech-to-text-phonexia subchart
Store models on data disk for each microservice separately
Use free version of media-conversion by default

Fixed

Start nvidia-persistenced only when nvidia driver is loaded

1.1.0 (2024-02-19)

Added

Enable startup probe for speech-to-text-whisper-enhanced
Automatically start nvidia-persistenced daemon
Add comments about api storage resources

Changed

Use app frontend as landing page
Expose admin console at the /admin URI
Increase max data disk size to 20GB
Disable GSP firmware in nvidia drivers

Fixed

Calculate speech-to-text-whisper-enhanced capacity on GPU properly

Removed

Remove swap partition

1.0.0

Initial release