Frequently Asked Questions
What hypervisors are supported by the Speech Platform 4 Virtual Appliance (VA)?
Hypervisors for which we provide installation guides are:
With GPU passthrough support:
- VMware ESXi
- Proxmox/QEMU/libvirt
- Microsoft Hyper-V (only server editions)
Without GPU passthrough support:
- VMware Workstation Pro
- VirtualBox (suitable for testing and small scale deployment, not for large scale production)
Cloud platforms where our Partners have successfully deployed the VA (no installation guides available):
- Google Cloud Platform, AWS, and Microsoft Azure work and support GPU passthrough. We do not provide support for deployment on these platforms, only for the internal operation of the Virtual Appliance.
In general, please be aware that GPU support requires a Type 1 hypervisor or, in the case of cloud platforms, a GPU instance.
How do I deploy the Virtual Appliance on any supported hypervisor?
Please refer to our installation guides for various hypervisors here: Installation guides.
What are the best GPUs for running GPU-supported technologies?
We support only NVIDIA GPUs, as they are CUDA capable. For a complete list, please refer to CUDA GPU Compute Capability. Every GPU in this list is suitable for use with our GPU-powered technologies; the higher the Compute Capability, the better. Please also refer to our guide on System Requirements.
I want to transcribe 200 hours per day through Enhanced Speech to Text Built on Whisper, what GPU should I use?
We are unable to recommend an exact GPU model because the achievable throughput depends heavily on various factors, such as audio quality, audio codec, percentage of net speech in the audio, the language of the recording, etc. Please refer to our performance measurement page here: Enhanced Speech to Text Built on Whisper performance measurements.
Our rough estimate is that NVIDIA RTX 4000 Ada/NVIDIA T4 is capable of handling 20 FTRT = 20 hours of audio per processing hour for the English language. As English is the fastest language, other languages will have worse performance. We strongly recommend testing on the target data before switching the environment to production.
For reference purposes, we have also tested that NVIDIA RTX 4060 is capable of handling 10 FTRT for the English language. Please note that consumer-grade GPUs are not meant for continuous processing, and using them as such might void the warranty on such GPUs.
For more information about the FTRT metric, please refer to our explanation page.
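As a rough illustration of how the FTRT estimates above translate into daily capacity, the following sketch works through the arithmetic for 200 hours of audio per day. The function names are hypothetical, and the calculation assumes that throughput stays constant over the day and scales linearly with the number of GPUs, which may not hold on your data.

```python
import math


def processing_hours(audio_hours, ftrt):
    """Hours of processing needed for `audio_hours` of audio at a given FTRT."""
    if ftrt <= 0:
        raise ValueError("ftrt must be positive")
    return audio_hours / ftrt


def gpus_needed(audio_hours_per_day, ftrt, available_hours_per_day=24.0):
    """Smallest number of identical GPUs that clears the daily volume.

    Assumption: throughput scales linearly with the number of GPUs.
    """
    hours = processing_hours(audio_hours_per_day, ftrt)
    return max(1, math.ceil(hours / available_hours_per_day))


if __name__ == "__main__":
    # 20 FTRT and 10 FTRT are the rough English-language estimates quoted above.
    for ftrt in (20, 10):
        hours = processing_hours(200, ftrt)
        print(f"{ftrt} FTRT: {hours:.1f} processing hours, "
              f"{gpus_needed(200, ftrt)} GPU(s) if running around the clock")
```

At 20 FTRT, 200 hours of English audio take about 10 processing hours; at 10 FTRT, about 20 hours. Other languages will be slower, so please treat these figures as a starting point for testing rather than a sizing guarantee.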
How do I enable GPU passthrough on my hypervisor?
This is out of scope for Phonexia and is your responsibility. We have guides on how to enable GPU passthrough on VMware ESXi and Proxmox, but your mileage may vary. Should you be interested, please find those guides on the installation guides page. Please note that these guides on GPU passthrough are provided as is, without any guarantee that they will work in the target environment.
Something is not working as it should - what should I do?
Before contacting Phonexia Consulting and Support Team, please refer to the Troubleshooting guide in our Documentation - there's a host of information on various scenarios which can happen during operation of the Virtual Appliance.
Should the issue persist even after undertaking the steps mentioned in the troubleshooting guide, please generate diagnostic data following the Getting Diagnostics Data from Virtual Appliance guide and provide them to the Phonexia Consulting and Support Team together with an exact description of the issue.
How do I find out if the GPU is enabled and the Virtual Appliance can see it?
Use this guide: Adjustments, section Run Technology on GPU, specifically the part with the nvidia-smi command.
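If you prefer to script this check, a minimal sketch is shown below. It is not part of the guide; it assumes that Python 3 is available on the machine where you run it and that nvidia-smi is on the PATH. The function names are hypothetical. An empty result means that nvidia-smi is missing or could not report any GPU.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text):
    """Split 'name, memory' CSV lines from nvidia-smi into (name, memory) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or an empty list if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # nvidia-smi ran but could not talk to a GPU or driver
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        for name, memory in gpus:
            print(f"GPU visible: {name} ({memory})")
    else:
        print("No GPU visible to nvidia-smi")
```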
If the quality of output of any technology is bad, how do I fix it?
The most common reason for bad output is bad input quality: overly compressed audio, too low a bitrate, too much noise, non-speech segments, reverberation, one silent channel, etc. Please refer to the article Input Audio Quality to find out what our technologies expect in terms of audio quality.
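One of the causes listed above, a silent channel, is easy to check before submitting a recording. The following is a minimal sketch, assuming a 16-bit PCM WAV file. The function names and the zero threshold are illustrative choices, not values taken from the Input Audio Quality article.

```python
import array
import sys
import wave


def channel_peaks(path):
    """Return the peak absolute sample value per channel of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved, so channel `ch` is every n_channels-th sample.
    return [
        max((abs(s) for s in samples[ch::n_channels]), default=0)
        for ch in range(n_channels)
    ]


def silent_channels(path, threshold=0):
    """Return indices of channels whose peak never exceeds `threshold` (assumed value)."""
    return [ch for ch, peak in enumerate(channel_peaks(path)) if peak <= threshold]
```

For example, `silent_channels("recording.wav")` returns `[1]` for a stereo file whose second channel contains only zeros. A small non-zero threshold may be more useful for recordings with low-level noise on the unused channel.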