Version 3.6.0

  • Added Authenticity Verification technology with Deepfake Detection subtechnology (high-level description) in REST API and GUI.
  • Added Gender Identification technology (high-level description) in REST API (still only preview in GUI). The endpoints are differentiated based on the input type, which can be either a media file or a list of voiceprints from Voiceprint Extraction.
  • Added Denoiser technology (high-level description) preview in GUI.
  • Added Emotion Recognition technology (high-level description) preview in GUI.
  • Updated the Enhanced Speech to Text Built on Whisper model (high-level description) with a query parameter for word-level segmentation, which also improves the overall accuracy of the timestamps. Because this behaviour is resource-heavy, it is turned off by default.
  • Added settings for parallel threads and multiple instances for GPU support for Language Identification and Voice Activity Detection.
  • Configuration and administration changes:
    • The Virtual Appliance startup process now displays system messages again for more clarity.
    • Improved detection of the "system is ready" state during the startup process.
    • When the licensed-models.zip package is uploaded via the Filebrowser GUI, it is automatically unpacked after upload.
    • Added a new script, configure-speech-platform.sh, for Speech Platform configuration, with more functionality. Use configure-speech-platform.sh --auto-configure to automatically configure the system according to the models and licenses uploaded to the /data/ folder. The enable-technologies.sh script is now obsolete and will be removed in the next release.
    • The configuration YAML file is now much shorter, simpler and more comprehensive.
    • Turning on GPU support in the configuration file is now easier: all GPU images are now included, so they no longer need to be downloaded or configured separately.
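The automatic configuration step mentioned above could be driven from a provisioning script. The following is a minimal sketch, not an official tool: it assumes that configure-speech-platform.sh is available on PATH inside the Virtual Appliance, and the helper names are hypothetical. The meaning of the script's exit code is not documented here, so the sketch only passes it through.

```python
import shutil
import subprocess

# Script name and flag are taken from the release notes above.
SCRIPT = "configure-speech-platform.sh"


def build_auto_configure_command(script: str = SCRIPT) -> list[str]:
    """Return the argv list for an automatic configuration run."""
    return [script, "--auto-configure"]


def run_auto_configure(script: str = SCRIPT) -> int:
    """Run the script and return its exit code.

    Assumption: the script is on PATH. If it is not, fail early with a
    clear error instead of letting subprocess raise a less specific one.
    """
    if shutil.which(script) is None:
        raise FileNotFoundError(f"{script} not found on PATH")
    completed = subprocess.run(build_auto_configure_command(script), check=False)
    return completed.returncode
```

Calling run_auto_configure() after uploading models and licenses would then mirror running the command by hand.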
Included Components
  • Audio Quality Estimation 3.62.0
  • Deepfake Detection 1.1.0
  • Enhanced Speech to Text Built on Whisper 1.8.1
  • Language Identification 1.6.1
  • Speaker Diarization 1.5.1
  • Speech to Text Phonexia 6th Generation 3.62.0
  • Time Analysis of Speech 3.62.0
  • Voice Activity Detection 1.0.2
  • Voiceprint Comparison 1.3.0
  • Voiceprint Extraction 1.5.3

Version 3.5.0

  • Added consumption counting and GUI capacities indicator.
  • Configuration and administration changes:
    • The Virtual Appliance system console now displays the prompt only after the internal system has started. The console stays blank while the internal system is starting. It may take some additional time for the GUI to become fully ready.
    • All technologies are now disabled by default after the first start.
    • Added enable-technologies.sh script for enabling technologies according to uploaded models and licenses.
    • It's now possible to change UI limits for all technologies in the configuration file.

Newly available models with improved Voice Activity Detection configuration:

  • Enhanced Speech to Text Built on Whisper: model 1.1.0
  • Speaker Identification: model 5.2.0
  • Language Identification: model 5.3.0
  • Speaker Diarization: model 5.1.0
  • Voice Activity Detection: model 5.3.0
Included Components
  • Audio Quality Estimation 3.62.0
  • Enhanced Speech to Text Built on Whisper 1.7.1
  • Language Identification 1.6.1
  • Speaker Diarization 1.5.1
  • Speech to Text Phonexia 6th Generation 3.62.0
  • Time Analysis of Speech 3.62.0
  • Voice Activity Detection 1.0.2
  • Voiceprint Comparison 1.3.0
  • Voiceprint Extraction 1.5.3

Version 3.4.0

  • Added Audio Quality Estimation technology (high-level description) in REST API.
  • Voice Activity Detection technology (high-level description) is available in both REST API and GUI.
  • Speech Translation technology is now fully working in GUI.
  • Speaker Diarization technology (high-level description) is now fully working in GUI.
  • Updated Speech to Text Phonexia with the ability to use Preferred phrases.
  • Updated the Speaker Identification model to xl-5.1.0, which is capable of carrying out automatic adaptation to various input audio sources (YouTube, Skype, WhatsApp, VoLTE, AMBE).
Included Components
  • Audio Quality Estimation 3.62.0
  • Enhanced Speech to Text Built on Whisper 1.7.0
  • Language Identification 1.5.0
  • Speaker Diarization 1.4.1
  • Speech to Text Phonexia 6th Generation 3.62.0
  • Time Analysis of Speech 3.62.0
  • Voice Activity Detection 1.0.1
  • Voiceprint Comparison 1.3.0
  • Voiceprint Extraction 1.5.2

Version 3.3.0

  • Added Speech Translation technology preview in GUI.
  • Added Speaker Diarization technology (high-level description) in REST API (still only preview in GUI).
  • Added option to speed up Enhanced Speech to Text Built on Whisper via beamSize parameter in Virtual Appliance configuration file - smaller beamSize means faster processing (up to ~30% with large_v2 model and beamSize=1) at the expense of slightly lower accuracy.
  • Speech to Text Phonexia and Time Analysis of Speech technologies updated to version 3.62.0.
  • Configuration and administration changes:
    • Added support for importing Virtual Appliance to Microsoft Hyper-V.
    • Both the system disk and the data disk now automatically resize according to the size set in the virtualization software.
    • Customers can now use cloud-init with Virtual Appliance.
    • Added diagnostic script for collecting logs for troubleshooting.
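To illustrate the beamSize trade-off noted above, here is a toy beam search sketch. It is not the decoder used by the product: the next-token model is a hypothetical stand-in, and the count of scored prefixes is only a rough proxy for decoding cost. It shows why a smaller beam does less work but can miss a more probable result.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical next-token model: maps a prefix to {token: probability}.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(next_tokens: NextTokenFn, steps: int, beam_size: int):
    """Return (best_sequence, log_prob, expansions) after a fixed number of steps."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    expansions = 0  # number of prefixes scored; a rough proxy for cost
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            expansions += 1
            for token, prob in next_tokens(prefix).items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the beam_size most probable hypotheses.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    best, best_score = beams[0]
    return best, best_score, expansions


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Toy distribution in which the greedy first choice is not the best overall."""
    if prefix == ():
        return {"a": 0.6, "b": 0.4}
    if prefix == ("a",):
        return {"x": 0.5, "y": 0.5}
    return {"x": 0.9, "y": 0.1}


greedy = beam_search(toy_model, steps=2, beam_size=1)
wide = beam_search(toy_model, steps=2, beam_size=2)
print(greedy[0], greedy[2])  # ('a', 'x') 2
print(wide[0], wide[2])      # ('b', 'x') 3
```

With a beam of 1 the search scores fewer prefixes but settles on a sequence with probability 0.30, while a beam of 2 finds the sequence with probability 0.36.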
Included Components
  • Enhanced Speech to Text Built on Whisper 1.5.0
  • Language Identification 1.3.1
  • Speaker Diarization 1.3.0
  • Speech to Text Phonexia 6th Generation 3.62.0
  • Time Analysis of Speech 3.62.0
  • Voiceprint Comparison 1.1.0
  • Voiceprint Extraction 1.4.0

Version 3.2.0

  • Language Identification technology is now fully working in GUI.
  • Added Gender Identification technology (high-level description) preview in GUI.
  • Added the possibility to share a GPU among multiple technologies to better utilize hardware resources.
Included Components
  • Enhanced Speech to Text Built on Whisper 1.4.0
  • Language Identification 1.2.0
  • Speech to Text Phonexia 6th Generation 3.61.0
  • Time Analysis of Speech 3.61.0
  • Voiceprint Comparison 1.1.0
  • Voiceprint Extraction 1.4.0

Version 3.1.0

  • New and significantly easier way to install or update technology licenses - they are now loaded from a separate YAML file. Just unzip the licensed models package to the data disk and the system loads the licenses automatically.

    ⚠️ NOTE: The new license format is not compatible with previous versions, i.e. licenses used in earlier Virtual Appliance versions cannot be used in this version and a new license needs to be obtained from Phonexia.

  • Added Language Identification technology preview in GUI.
  • Added Speaker Diarization technology ([high-level description](/products/speech-platform-4/technologies/speaker-diarization)) preview in GUI.
  • Added possibility to run Voiceprint Extraction on GPU to significantly boost the extraction performance.
  • Virtual Appliance configuration changes are now automatically applied to corresponding components (no restarts needed anymore).
  • Lower memory consumption during processing due to internal component optimizations.
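The license installation step described above (unzipping the licensed models package to the data disk) could be scripted roughly as follows. This is a sketch under assumptions: the package path and the data disk mount point are passed in by the caller because they are not specified here, and the function name is hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_licensed_models(package: Path, data_dir: Path) -> list[str]:
    """Extract the licensed models package into data_dir.

    Returns the archive member names so the caller can check what was
    unpacked. Both paths are supplied by the caller; nothing here assumes
    a particular layout inside the archive.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(package) as archive:
        archive.extractall(data_dir)
        return archive.namelist()
```

After extraction, the system is expected to load the licenses automatically, as stated in the note above.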
Included Components
  • Enhanced Speech to Text Built on Whisper 1.4.0
  • Speech to Text Phonexia 6th Generation 3.61.0
  • Time Analysis of Speech 3.61.0
  • Voiceprint Comparison 1.1.0
  • Voiceprint Extraction 1.4.0

Version 3.0.0

  • The Speech to Text Whisper Enhanced technology has been renamed to Enhanced Speech to Text Built on Whisper and a new Language switching feature was added. This feature identifies the predominant language spoken within each thirty-second interval of audio and the identified language is then utilized for transcribing that particular section.

  • For Speech to Text Phonexia and Time Analysis of Speech technologies, it is now possible to configure the number of tasks processed in parallel. This is done using the parallelism parameter in the corresponding sections of the Virtual Appliance configuration file.
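The Language switching behaviour described above can be pictured with a short sketch. It is an illustration only, not the product's implementation: identify_language and transcribe are hypothetical callables supplied by the caller, and the only detail taken from the release note is the thirty-second interval.

```python
from typing import Callable, List, Tuple

WINDOW_SECONDS = 30.0  # interval length stated in the release note


def split_into_windows(duration: float, window: float = WINDOW_SECONDS) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive windows; the last one may be shorter."""
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        windows.append((start, end))
        start = end
    return windows


def transcribe_with_language_switching(
    duration: float,
    identify_language: Callable[[float, float], str],
    transcribe: Callable[[float, float, str], str],
) -> List[Tuple[float, float, str, str]]:
    """For each window, detect the predominant language, then transcribe with it."""
    results = []
    for start, end in split_into_windows(duration):
        language = identify_language(start, end)  # hypothetical detector
        text = transcribe(start, end, language)   # hypothetical transcriber
        results.append((start, end, language, text))
    return results
```

For a 70-second recording, split_into_windows(70.0) yields three windows: 0 to 30, 30 to 60, and 60 to 70 seconds, each of which may be transcribed in a different language.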

Included Components
  • Enhanced Speech to Text Built on Whisper 1.2.2
  • Speech to Text Phonexia 6th Generation 3.61.0
  • Time Analysis of Speech 3.61.0
  • Voiceprint Comparison 1.0.0
  • Voiceprint Extraction 1.2.0

Version 2.1.0

  • Maintenance release with only configuration and administration related changes:
    • Models for Speech to Text Phonexia and Time Analysis of Speech technologies are now loaded from the data disk, not from the image.
    • Speech to Text Phonexia and Time Analysis of Speech technologies updated to version 3.61.0.
    • Added extra environment variables for Speech to Text Whisper Enhanced.
    • Added maximum upload file size specification for filebrowser.
    • Moved Prometheus storage to data disk.
Included Components
  • Voiceprint Extraction 1.2.0
  • Voiceprint Comparison 1.0.0
  • Speech to Text Whisper Enhanced 1.1.0
  • Speech to Text Phonexia 6th Generation 3.61.0
  • Time Analysis of Speech 3.61.0

Version 2.0.0

  • Added Time Analysis of Speech technology (high-level description), available via REST API only (no GUI).
  • Configuration and administration changes:
    • Added options to change tmpdir volume for speech-platform API and media-conversion.
    • Added options to configure UI limits.
    • Added option to change API log level.
    • Models are now stored on data disk separately for each microservice.
Included Components
  • Speech to Text Phonexia 6th Generation 3.60.1
  • Speech to Text Whisper Enhanced 1.1.0
  • Time Analysis of Speech 3.60.1
  • Voiceprint Comparison 1.0.0
  • Voiceprint Extraction 1.2.0

Version 1.1.0

  • Initial release with Speaker Identification ([high-level description](/products/speech-platform-4/technologies/speaker-identification)) and Speech to Text (high-level description) technologies available via REST API and in GUI. The Speech to Text Whisper Enhanced technology supports auto-detection of the language.
Included Components
  • Speech to Text Phonexia 6th Generation 3.60.1
  • Speech to Text Whisper Enhanced 1.0.1
  • Voiceprint Comparison 1.0.0
  • Voiceprint Extraction 1.0.0