Sizing and Performance
Sizing of the system
The selection of speech technologies, and the number of instances of each technology to instantiate when the SPE starts, are configured by the phxadmin utility. The specification is saved by default to the file {spe_root}/settings/technologies.xml; the details of the XML structure are described in another article. Please note that the technology configuration can be stored elsewhere. For example, it can be shared between multiple SPE instances using network storage, virtual storage, etc. This is useful for orchestrator-driven architectures using virtualization, dockerization, or other types of managed instantiation.
# Path to technologies configuration
technologies.configuration = ${application.dir}settings/technologies.xml
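For illustration, a hypothetical deployment could point this directive at a shared network mount so that all SPE instances read the same technology configuration (the /mnt/shared path below is an example, not a product default):
# Hypothetical: technology configuration shared between instances
technologies.configuration = /mnt/shared/spe/settings/technologies.xml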
Performance tuning
The server.n_workers directive determines how many instances of any initiated technology will run in parallel. Each instance of a Phonexia technology can fully utilize one physical CPU core, which has the following consequences:
- Hyperthreading has no significant effect on CPU utilization; count physical cores, not logical ones.
- Virtual machines should be configured carefully with performance considerations in mind.
- RAM speed is more important than CPU clock frequency.
- Because large amounts of data (statistical models) are loaded into RAM, the memory bandwidth between RAM and the CPU is important.
- L3 cache (shared between CPU cores) is a key factor.
For optimal CPU utilization on a single physical server, configure the following numbers of instances in your technologies.xml:
- SQE: <#_cpu_cores>/4
- VAD: <#_cpu_cores>/2
- any other technology: <#_cpu_cores>
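For example, on a server with 8 physical cores this works out to 8/4 = 2 SQE instances, 8/2 = 4 VAD instances, and 8 instances of any other technology.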
(Note: Ensure your license is also configured properly. Contact our Sales department for assistance with high-load evaluation tests; the production license will be configured with our support.)
Optimal RAM recommendation:
- 4 cores: 16 GB RAM
- 8 cores: 32 GB RAM
- 16 cores: 64 GB RAM
The server.n_workers directive is crucial for optimizing the system's performance. The example below is optimal for a CPU with 8 physical cores:
# Multithread settings
server.n_workers = 8
You can also use a system environment variable to supply the number of available physical cores. Determine the number of physical cores by executing the following command in a Linux console or in the MS Windows Command Prompt:
- Linux:
grep "^core id" /proc/cpuinfo | sort -u | wc -l
- Windows:
wmic cpu get NumberOfCores /value
Assign the value obtained from the above command to an environment variable (let's say, P_CORES) and use it like this:
server.n_workers = ${system.env.P_CORES}
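For example, on Linux you could export the variable in the shell that launches the SPE (a minimal sketch assuming a bash-like shell; P_CORES is just the example name used above):
# Set P_CORES to the number of physical cores, then start the SPE
export P_CORES=$(grep "^core id" /proc/cpuinfo | sort -u | wc -l)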
Realtime workers must be configured separately to enable RTP or HTTP stream processing. Configuring them is essentially the same as configuring normal workers:
server.n_realtime_workers = 0
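For example, to reserve two workers for stream processing (an illustrative value; size it according to the number of parallel streams you expect):
server.n_realtime_workers = 2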
The SPE maintains its own queue of task requests waiting to be processed, and the following directive determines the allowed number of tasks in the queue. You can fine-tune it to achieve optimal usage of each SPE instance. This directive should also be considered when configuring user rights settings for server resource utilization (see the SPE documentation).
# Sets limit for number of pending operations.
server.n_task_limit = 1000
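For example, you might lower the limit on an instance that should not accumulate a long backlog of pending tasks (an illustrative value, not a recommendation):
server.n_task_limit = 200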
The results from finished tasks are held in memory for fast retrieval for a specified number of seconds. If database use is disabled, the results are lost after the timeout period. If database use is enabled (default), the results can be retrieved from the database as long as the audio recording is stored in the SPE storage.
# Timeout for automatic removal of finished tasks
server.finished_task_timeout = 60
Final settings and large-scale deployment
The license setting is another part of the configuration that can be placed on shared storage. It can be used in various ways; please ask for consultancy if you need more details.
# Sets path to license file
# server.license = ${application.dir}license.dat
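For example, using the environment-variable mechanism shown earlier, a hypothetical SPE_LICENSE variable could supply the license path for each instance:
# Hypothetical: license path supplied via an environment variable
server.license = ${system.env.SPE_LICENSE}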
This part of the configuration should be self-explanatory:
# Limit for maximum upload file size
# server.upload_max_filesize = 10MB
# Limit for maximum upload file metadata size
# server.max_metadata_size = 1KB
# Set maximum length of TCP connection queue.
# if the queue is full, new requests are rejected
# Default is 64
server.tcp.queue = 64
# Set maximum threads for TCP connections.
# The threads fetch TCP connections from TCP queue and process them
# Default is 16
server.tcp.threads = 16
# Enable HTTP stream subsystem
stream.http.enable = true
# Set timeout for HTTP stream in seconds.
# If the stream doesn't receive any data for a given time, the stream is closed.
stream.http.timeout = 30
# Enable RTP stream subsystem
stream.rtp.enable = true
# IP address for creating RTP sessions
stream.rtp.bind_ip = 0.0.0.0
# Sets the port range for creating RTP sessions
stream.rtp.min_port = 10000
stream.rtp.max_port = 11000
# Maximum number of sessions open at any one moment
stream.rtp.stream_limit = 10
# Set timeout for RTP socket in seconds.
# If the RTP socket doesn't receive any data for a given time, the socket is closed.
stream.rtp.timeout = 10
Large scale - sharing core technologies
The last configuration directive discussed here prepares the architecture for large-scale topology and deployment. By default, the core technology files are located in {spe_root}/bsapi/*. The total size of all files inside the BSAPI directories depends on the set of technologies used; a total size exceeding 3 GB is quite common. This can be problematic in architectures designed for fast and large-scale deployment using source pools for virtualization environments, spread across multiple collocation centers, working with geographical distribution, and so on.
By using the bsapi.path directive (and with the help of environment variables) you can set up a more efficient deployment method. Simply uncomment this directive and configure it correctly.
# Set path to bsapi directory
# bsapi.path = ${application.dir}bsapi
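For example (a sketch using the environment-variable mechanism described above; BSAPI_DIR is a hypothetical variable name, not a product default), all instances could read the technology files from one shared read-only mount:
# Hypothetical: BSAPI files on a shared mount, path supplied per instance
bsapi.path = ${system.env.BSAPI_DIR}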