Version 3.6.0

February 17, 2025

Added Authenticity Verification technology with Deepfake Detection subtechnology (high-level description) in REST API and GUI.
Added Gender Identification technology (high-level description) in REST API (still only a preview in GUI). The endpoints differ by input type, which can be either a media file or a list of voiceprints from Voiceprint Extraction.
Added Denoiser technology (high-level description) preview in GUI.
Added Emotion Recognition technology (high-level description) preview in GUI.
Updated Enhanced Speech to Text Built on Whisper model (high-level description) with a query parameter for word-level segmentation that also improves the overall accuracy of the timestamps. Because this behaviour is resource-heavy, it is turned off by default.
Added settings for parallel threads and multiple instances for GPU support for Language Identification and Voice Activity Detection.
Configuration and administration changes:
- The Virtual Appliance startup process now displays system messages again for more clarity.
- Improved detection of the "system is ready" state during the startup process.
- When the licensed-models.zip package is uploaded via the Filebrowser GUI, it is automatically unpacked after upload.
- Added a new script, configure-speech-platform.sh, for configuring the Speech Platform, with more functionality. Use configure-speech-platform.sh --auto-configure to automatically configure the system according to the models and licenses uploaded to the /data/ folder. The enable-technologies.sh script is now obsolete and will be removed in the next release.
- Configuration YAML file is now much shorter, simpler and more comprehensive.
- Turning on GPU support in the configuration file is now easier: all GPU images are now included, so they no longer need to be downloaded or configured separately.
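
The auto-configuration step mentioned above could be wrapped as in the following minimal sketch. Only the script name and the --auto-configure option come from these notes. The run_auto_configure function name is hypothetical, and the sketch assumes the script is reachable on PATH, which may not match your Virtual Appliance; adjust the location as needed.

```shell
#!/bin/sh
# Sketch only: run the new configuration script if it is available.
# Assumption: configure-speech-platform.sh is on PATH (not stated in the notes).
SCRIPT="configure-speech-platform.sh"

# Hypothetical helper: runs auto-configuration and passes through the
# script's exit status, or returns 1 if the script cannot be found.
run_auto_configure() {
    if command -v "$SCRIPT" >/dev/null 2>&1; then
        "$SCRIPT" --auto-configure
    else
        echo "$SCRIPT not found; it is introduced in version 3.6.0" >&2
        return 1
    fi
}
```

Calling run_auto_configure after the models and licenses have been uploaded to the /data/ folder would then trigger the configuration, or report that the script is missing on older releases.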

Included Components