Version 3.6.0

  • Added Authenticity Verification technology with Deepfake Detection subtechnology (high-level description) in REST API and GUI.
  • Added Gender Identification technology (high-level description) in REST API (still only preview in GUI). The endpoints are differentiated based on the input type, which can be either a media file or a list of voiceprints from Voiceprint Extraction.
  • Added Denoiser technology (high-level description) preview in GUI.
  • Added Emotion Recognition technology (high-level description) preview in GUI.
  • Updated Enhanced Speech to Text Built on Whisper model (high-level description) with a query parameter for word-level segmentation that also improves the overall accuracy of the timestamps. Because this behaviour is resource-heavy, it is turned off by default.
  • Added settings for parallel threads and multiple instances for GPU support for Language Identification and Voice Activity Detection.
  • Configuration and administration changes:
    • The Virtual Appliance startup process now displays system messages again for more clarity.
    • Improved detection of the "system is ready" state during the startup process.
    • When the licensed-models.zip package is uploaded via the Filebrowser GUI, it is now unpacked automatically.
    • Added a new script, configure-speech-platform.sh, for configuring the Speech Platform, with more functionality. Use configure-speech-platform.sh --auto-configure to automatically configure the system according to the models and licenses uploaded to the /data/ folder. The enable-technologies.sh script is now obsolete and will be removed in the next release.
    • The configuration YAML file is now much shorter, simpler and easier to understand.
    • Turning on GPU support in the configuration file is now easier: all GPU images are now included, so there is no need to download or configure them separately.
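
The opt-in nature of the word-level segmentation parameter mentioned above can be illustrated with a minimal client-side sketch. These release notes do not name the endpoint path or the parameter, so the base URL, the path and the parameter name below are all hypothetical placeholders. Check the REST API reference for the real ones.

```python
from urllib.parse import urlencode

# All three values are hypothetical placeholders, not taken from the release
# notes or from the actual REST API.
BASE_URL = "http://localhost:8000"
ENDPOINT = "/hypothetical/enhanced-speech-to-text"
WORD_LEVEL_PARAM = "word_level_segmentation"


def build_request_url(word_level: bool = False) -> str:
    """Build the request URL, adding the option only when explicitly enabled."""
    url = f"{BASE_URL}{ENDPOINT}"
    if word_level:
        # Word-level segmentation is resource-heavy, so it is requested
        # only when the caller asks for it.
        url += "?" + urlencode({WORD_LEVEL_PARAM: "true"})
    return url
```

Calling `build_request_url()` leaves the parameter out, which matches the default (off) behaviour. Calling `build_request_url(word_level=True)` appends it to the query string.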
Included Components
  • Audio Quality Estimation 3.62.0
  • Deepfake Detection 1.1.0
  • Enhanced Speech to Text Built on Whisper 1.8.1
  • Language Identification 1.6.1
  • Speaker Diarization 1.5.1
  • Speech to Text Phonexia 6th Generation 3.62.0
  • Time Analysis of Speech 3.62.0
  • Voice Activity Detection 1.0.2
  • Voiceprint Comparison 1.3.0
  • Voiceprint Extraction 1.5.3