Version 4.1.0
This release focuses on security fixes, further improvements in the Speech
Platform startup process, and some useful features and improvements in the
GUI.
It also adds Denoiser technology, which removes noise and background sounds
from recordings to make speech easier for a human listener to understand
(using the denoised audio for further processing by speech technologies is NOT
recommended, as it usually makes the results worse).
- Added Denoiser technology (high-level description) in REST API and GUI
Configuration and administration changes
- Updated the internal ingress-nginx controller to resolve security issue CVE-2025-1974
- Improved startup checks to detect some common network issues and give users resolution hints on the console welcome screen
- All technologies now support starting on-demand, allowing users more flexible configuration with regard to RAM consumption
- Frontend limits configuration is now common for all technologies
- Documentation of the internal limits settings was significantly expanded for clarity and now includes a schema and a description of the media processing flow
REST API changes
- REST API endpoint /api/system/status now exposes capacities and current usage of system storages
- Minor improvements in OpenAPI schema
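
As a rough illustration of how the storage information from /api/system/status might be consumed, the following Python sketch fetches the status and reports storages that are nearly full. The response shape used here (a `storages` list whose items have `name`, `capacity` and `used` keys) is an assumption made for this example, not the documented schema, and the request is sent without authentication; check the OpenAPI schema of your installation and adjust the keys and headers accordingly. The helper names are hypothetical as well.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/api/system/status and return the decoded JSON body."""
    url = f"{base_url.rstrip('/')}/api/system/status"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def storage_usage(status: dict) -> dict:
    """Map storage name -> used fraction in the range 0.0-1.0.

    ASSUMPTION: the payload looks like
    {"storages": [{"name": ..., "capacity": ..., "used": ...}]}.
    These keys are hypothetical; adapt them to the real schema.
    """
    usage = {}
    for storage in status.get("storages", []):
        capacity = storage.get("capacity") or 0
        if capacity > 0:  # skip storages that report no capacity
            usage[storage["name"]] = storage["used"] / capacity
    return usage


def storages_over(status: dict, threshold: float = 0.9) -> list:
    """Return sorted names of storages whose usage is at or above threshold."""
    return sorted(
        name for name, fraction in storage_usage(status).items()
        if fraction >= threshold
    )
```

A monitoring script could call `storages_over(fetch_status("https://speech-platform.example"))` periodically (the host name is a placeholder) and raise an alert when the returned list is not empty.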
GUI (web application) changes
- Tiles on the home page can now be reordered
- Added "Send to" feature, which allows sending the result of processing (e.g. by Voice Activity Detection) to another technology (only Speech To Text is supported as a target for now; other technologies can be added if there is demand)
- Minor Keyword Spotting improvements:
  - Added a column with the number of found keywords
  - When a keyword is played, the corresponding table row is highlighted (note that since keywords tend to be rather short, the row only flashes briefly)
- Adjusted Deepfake Detection score scale range to reflect the latest model results
- Adjusted terminology for "score", "confidence" and "probability" in GUI and export headers (CSV, Excel) in Deepfake Detection, Emotion Detection, Gender Identification, Language Identification, Keyword Spotting, Speech To Text and Speech Translation technologies
Included Components
- Age Estimation 1.1.0
- Audio Manipulation Detection 1.0.0
- Audio Quality Estimation 3.62.0
- Deepfake Detection 2.2.0
- Denoiser 1.1.0
- Emotion Recognition 1.2.0
- Enhanced Speech to Text Built on Whisper 1.10.0
- Gender Identification 1.4.0
- Keyword Spotting 1.1.0
- Language Identification 1.7.0
- Replay Attack Detection 1.0.0
- Speaker Diarization 1.6.0
- Speech to Text Phonexia 6th Generation 3.62.0
- Time Analysis of Speech 3.62.0
- Voice Activity Detection 1.2.0
- Voiceprint Comparison 1.5.0
- Voiceprint Extraction 1.6.0