Workers' Configuration
A worker is a working thread that performs the actual processing of files or real-time streams in the Speech Engine. This article helps you understand Speech Engine workers and provides information on how to configure them for optimal performance and server utilization.
Starting from SPE 3.51, new defaults in settings/phxspe.properties automatically configure workers according to local conditions (physical CPU cores, configured technologies) to ensure optimal performance and server utilization. These new defaults make the content of the article below obsolete; however, we are keeping it here for those who still want to fine-tune the configuration manually.
The default worker configuration in settings/phxspe.properties is shown below: 8 workers for file processing and 8 workers for real-time stream processing. These numbers represent the maximum number of simultaneously running tasks.
# Multithread settings
server.n_workers = 8
server.n_realtime_workers = 8
Requests for additional file-processing tasks are put in a queue and processed according to their order and priorities. Requests for additional stream-processing tasks are refused with HTTP status 403 because the real-time nature of stream processing does not allow for queuing.
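The difference between the two admission rules can be illustrated with a small toy model. This is only a sketch of the behavior described above, not SPE code: the class and method names are hypothetical, and it assumes that a higher priority number is served first and that equal priorities are served in arrival order.

```python
import heapq
import itertools


class WorkerPoolModel:
    """Toy model of the admission rules described above (not SPE code)."""

    def __init__(self, n_workers=8, n_realtime_workers=8):
        self.n_workers = n_workers                    # mirrors server.n_workers
        self.n_realtime_workers = n_realtime_workers  # mirrors server.n_realtime_workers
        self.running_files = 0
        self.running_streams = 0
        self._queue = []               # heap of (-priority, arrival_no, name)
        self._seq = itertools.count()  # keeps arrival order among equal priorities

    def submit_file(self, name, priority=0):
        """File tasks are never refused: they run now or wait in the queue."""
        if self.running_files < self.n_workers:
            self.running_files += 1
            return "running"
        # Assumption: a higher priority value is dequeued first.
        heapq.heappush(self._queue, (-priority, next(self._seq), name))
        return "queued"

    def submit_stream(self):
        """Stream tasks cannot wait; SPE answers HTTP 403 in the refused case."""
        if self.running_streams < self.n_realtime_workers:
            self.running_streams += 1
            return "running"
        return "refused"

    def finish_file(self):
        """A file worker finished; it takes the next queued task, if any."""
        if self._queue:
            _, _, name = heapq.heappop(self._queue)
            return name  # the worker stays busy with the dequeued task
        self.running_files -= 1
        return None
```

With a single file worker, a second and third file task are queued and later picked up in priority order, whereas a second stream task on a single real-time worker is refused immediately.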
File processing can process data faster than real-time, which allows it to utilize 100% of a physical CPU core. This means that for file processing technologies, the number of workers should be set to the number of physical CPU cores in the server, and there is no point in configuring more workers.
Stream processing, by contrast, cannot process data faster than real-time, because the audio arrives only as fast as it is spoken. A single physical CPU core can therefore handle multiple real-time tasks simultaneously; how many depends on how much faster than real-time the particular technology runs and on how much speech the audio contains. This means that for stream processing technologies, it makes sense to configure more workers than there are physical CPU cores in the server.
For example, Czech STT on stream is approximately 4 times faster than real-time, meaning 1 CPU core can process 4 real-time streams simultaneously. A server with 8 CPU cores running only STT stream processing can therefore be configured as follows:
- Keep 1 core dedicated to the operating system and SPE.
- The remaining 7 cores can handle 28 real-time workers (7 cores × 4 streams per core).
Therefore, the real-time workers setting should be server.n_realtime_workers = 28.
The number of initialized STT_STREAM technology instances (configured via phxadmin --configure-tech) should also be set to 28. There is no point in initializing more than that since there won't be more workers available, meaning no more than 28 tasks can run simultaneously.
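The sizing arithmetic above can be written down as a short Python sketch. The function names and the reserved_cores parameter are illustrative assumptions, and the speed factor must be measured for the technology and audio actually in use; only the property names come from settings/phxspe.properties.

```python
import math


def file_workers(physical_cores):
    """File processing can saturate a core, so use one worker per physical core."""
    return physical_cores


def realtime_workers(physical_cores, speed_factor, reserved_cores=1):
    """Streams per core equals how many times faster than real-time the technology runs.

    reserved_cores is the number of cores left to the operating system and SPE.
    """
    usable_cores = max(physical_cores - reserved_cores, 0)
    return math.floor(usable_cores * speed_factor)


if __name__ == "__main__":
    cores = 8          # physical cores, not Hyper-Threading logical cores
    stt_speed = 4      # Czech STT on stream: about 4x faster than real-time
    print(f"server.n_realtime_workers = {realtime_workers(cores, stt_speed)}")
```

For the 8-core example, (8 - 1) × 4 gives 28 real-time workers, which is also the number of STT_STREAM instances to initialize.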
The final limit on top of these numbers is the number of slots for a particular technology in the license.
For example, a license with 16 slots for STT will not allow the initialization of more than 16 instances of STT, regardless of the configured number of workers or technology instances.
To ensure optimal and full utilization of your license and server, make sure that all these numbers and settings align with and reflect your actual server hardware configuration.
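One way to check that the numbers align is to compute the smallest of the three limits, since that is what actually bounds concurrency. The helper below is a hypothetical sketch, not part of SPE or phxadmin; the input values have to be read from your own configuration and license.

```python
def effective_stream_capacity(n_realtime_workers, tech_instances, license_slots):
    """Return (capacity, limiting_factor) for one stream technology.

    The license caps how many instances can be initialized, and a task
    needs both a free worker and a free instance in order to run.
    """
    limits = {
        "license slots": license_slots,
        "technology instances": tech_instances,
        "real-time workers": n_realtime_workers,
    }
    limiting_factor = min(limits, key=limits.get)
    return limits[limiting_factor], limiting_factor


if __name__ == "__main__":
    capacity, reason = effective_stream_capacity(28, 28, 16)
    print(f"{capacity} simultaneous streams, limited by {reason}")
```

With 28 workers, 28 configured instances, and a 16-slot license, only 16 streams can run at once, so either the license or the worker and instance counts should be adjusted to match.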
Throughout this article, the term "CPU core" means a physical CPU core. Hyper-Threading does not bring any benefit here, because our processing is highly optimized and can fully utilize a physical core.