Version 3.6.0
- Added Authenticity Verification technology with Deepfake Detection subtechnology (high-level description) in REST API and GUI.
- Added Gender Identification technology (high-level description) in REST API (still only preview in GUI). The endpoints are differentiated based on the input type, which can be either a media file or a list of voiceprints from Voiceprint Extraction.
- Added Denoiser technology (high-level description) preview in GUI.
- Added Emotion Recognition technology (high-level description) preview in GUI.
- Updated Enhanced Speech to Text Built on Whisper model (high-level description) with a query parameter for word-level segmentation that also improves the overall accuracy of the timestamps. Because this behaviour is resource-heavy, it is turned off by default.
- Added settings for parallel threads and multiple instances for GPU support for Language Identification and Voice Activity Detection.
- Configuration and administration changes:
- The Virtual Appliance startup process now displays system messages again for more clarity.
- Improved detection of the "system is ready" state during the startup process.
- When the `licensed-models.zip` package is uploaded via the Filebrowser GUI, it is automatically unpacked after upload.
- New script `configure-speech-platform.sh` for the Speech Platform configuration, with more functionality. Use `configure-speech-platform.sh --auto-configure` to automatically configure the system according to the models and licenses uploaded to the `/data/` folder. The `enable-technologies.sh` script is now obsolete and will be removed in the next release.
- Configuration YAML file is now much shorter, simpler and more comprehensive.
- Turning on GPU support in the configuration file is now easier: all GPU images are now included, so there is no need to download or configure them separately.
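
The auto-configuration step mentioned above could be wrapped roughly as follows. This is a minimal sketch, not the vendor's procedure: the script name and the `--auto-configure` flag come from these notes, while the helper name `run_auto_configure` and the default script location (`./configure-speech-platform.sh`) are assumptions made for illustration, since the notes do not state where the script is installed.

```shell
#!/bin/sh
# Sketch only: the script location is an assumption; these release notes do
# not say where configure-speech-platform.sh lives on the Virtual Appliance.

# run_auto_configure is a hypothetical helper name, not part of the product.
run_auto_configure() {
    script="${1:-./configure-speech-platform.sh}"

    # Fail early with a readable message if the script is missing
    # or not executable.
    if [ ! -x "$script" ]; then
        echo "cannot execute: $script" >&2
        return 1
    fi

    # --auto-configure is the flag named in the release notes; it configures
    # the system according to the models and licenses uploaded to /data/.
    "$script" --auto-configure
}
```

Calling `run_auto_configure /path/to/configure-speech-platform.sh` after the models and licenses have been uploaded would then run the configuration in one step; the path shown is a placeholder.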
Included Components
- Audio Quality Estimation 3.62.0
- Deepfake Detection 1.1.0
- Enhanced Speech to Text Built on Whisper 1.8.1
- Language Identification 1.6.1
- Speaker Diarization 1.5.1
- Speech to Text Phonexia 6th Generation 3.62.0
- Time Analysis of Speech 3.62.0
- Voice Activity Detection 1.0.2
- Voiceprint Comparison 1.3.0
- Voiceprint Extraction 1.5.3