Tips for Optimizing SPE Performance
The following are several recommendations for maximizing SPE performance in terms of speed, throughput, and hardware utilization.
Disable results caching
By default, SPE is configured to cache file processing results in its database. This prevents a file from being reprocessed when the same file is requested again with the same technology, model, and other settings.
If you are retrieving the processing results from the API and storing them in
your own application, you may not require the results to be cached in the SPE
database.
In this case, results caching can be disabled to:
- Avoid unnecessary database operations, thereby reducing disk activity.
- Prevent database growth due to data that may never be deleted (see the section on deleting/unregistering files below).
To disable results caching, set server.db.save_results = false in the SPE configuration file.
For additional details, refer to the article Speech Engine Database and the server.db.save_results section in the article Speech Engine Configuration File.
Utilize file registration instead of upload
Depending on your workflow and the design of your application, consider changing how you transfer recordings to SPE. Instead of uploading files via the POST /audiofile endpoint, you can copy the files directly to SPE storage and register them using the POST /audiofile/registration endpoint.
Copying files at the filesystem level can be faster and more efficient than uploading them over HTTP, and it conserves machine resources.
When using this method, ensure that access rights are properly configured. This means the process copying the files to SPE storage must create the files with access permissions that allow the SPE process to read them.
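The copy-then-register workflow can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the official SPE client: it assumes the SPE storage directory is mounted locally, and it assumes (hypothetically) that POST /audiofile/registration identifies the file by a `path` query parameter relative to SPE storage. The helper name `copy_and_prepare_registration`, the base URL, and the paths are placeholders. The request is only built, not sent.

```python
import os
import shutil
import urllib.parse
import urllib.request


def copy_and_prepare_registration(src_path, storage_dir, spe_base_url,
                                  relative_name, mode=0o644):
    """Copy a recording into SPE storage and build its registration request.

    Hypothetical assumptions (not confirmed by the SPE documentation):
    - storage_dir is the SPE storage directory, mounted locally;
    - POST /audiofile/registration identifies the file by a "path"
      query parameter relative to SPE storage.
    """
    dest_path = os.path.join(storage_dir, relative_name)
    os.makedirs(os.path.dirname(dest_path), exist_ok=True)

    # Filesystem-level copy instead of an HTTP upload.
    shutil.copyfile(src_path, dest_path)

    # Set permissions explicitly so the result does not depend on the
    # copying process's umask. 0o644 = owner read/write, group/others read.
    os.chmod(dest_path, mode)

    # Build (but do not send) the registration request.
    query = urllib.parse.urlencode({"path": relative_name})
    request = urllib.request.Request(
        f"{spe_base_url}/audiofile/registration?{query}", method="POST")
    return dest_path, request
```

Usage might look like `dest, req = copy_and_prepare_registration("/data/in/call.wav", "/opt/spe/storage", "http://localhost:8600", "calls/call.wav")` followed by `urllib.request.urlopen(req)`; all paths and the URL are placeholders. If the SPE process shares a group with the copying process, a stricter mode such as `0o640` may be preferable to world-readable files.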
Ensure proper deletion/unregistration of files
Regardless of how you transfer your recordings to SPE (as discussed in the
previous section), always ensure that recordings are properly removed when they
are no longer needed.
Effective housekeeping keeps SPE storage free of unnecessary files and prevents
excessive growth of the SPE database.
Keeping the database at a reasonable size prevents processing slowdowns caused by database operations. Specifically, when using SQLite, any write operation creates a copy of the entire database file. If the file is several megabytes or even a few gigabytes in size, this can delay processing by several seconds or more, even on fast SSDs.
- Use DELETE /audiofile to:
  - Remove audio file records from the database.
  - Delete all related cached processing results (if caching is enabled).
  - Physically delete the file from SPE storage.
- Alternatively, use DELETE /audiofile/registration to:
  - Remove audio file records from the database.
  - Delete all related cached processing results (if caching is enabled).
  - Note: This operation does not physically delete the file. An external process should eventually remove it to avoid accumulating unnecessary files in SPE storage.
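The choice between the two cleanup endpoints can be sketched as a small Python helper. This is an illustrative sketch only: the helper name `build_cleanup_request` is hypothetical, and identifying the file by a `path` query parameter is an assumption to verify against the SPE API reference. The helper returns an unsent `urllib.request.Request`.

```python
import urllib.parse
import urllib.request


def build_cleanup_request(spe_base_url, path, delete_physical_file=True):
    """Build the HTTP request that removes a recording from SPE.

    delete_physical_file=True  -> DELETE /audiofile: removes the database
                                  record, cached results, and the file itself.
    delete_physical_file=False -> DELETE /audiofile/registration: removes the
                                  database record and cached results only;
                                  an external process must delete the file.

    Assumption (hypothetical): the file is identified by a "path" query
    parameter.
    """
    endpoint = "/audiofile" if delete_physical_file else "/audiofile/registration"
    query = urllib.parse.urlencode({"path": path})
    return urllib.request.Request(f"{spe_base_url}{endpoint}?{query}",
                                  method="DELETE")
```

Once your application has retrieved and stored the results, the request can be sent with `urllib.request.urlopen(...)`. Passing `delete_physical_file=False` fits setups where another process already owns the lifecycle of files in SPE storage.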
If you process real-time streams and occasionally save the incoming audio to a file (e.g., for troubleshooting purposes), remember that the audio file is created in SPE storage and registered in the SPE database, so it needs the same cleanup as any other recording.
Prefer MariaDB over SQLite
For higher processing loads, it is generally recommended to use MariaDB as the
SPE database. Unlike SQLite, MariaDB is a high-performance, scalable database
designed to handle very high loads efficiently.
MariaDB's physical storage is optimized for high performance and does not
experience delays associated with managing large files, as seen with SQLite
(refer to the previous section).
Optimize SQLite with performance enhancements
If you choose not to use MariaDB and prefer to continue using SQLite, consider applying the following performance enhancements to improve overall SPE performance:
- Create a RAM disk and store the database file on it.
  While the performance gain over modern SSDs may vary, this approach can still be beneficial. The database file path is set using the server.db.sqlite.data_source option in the SPE configuration file. For more details, refer to the server.db.sqlite.data_source section in the article Speech Engine Configuration File.
- Set the temp_store pragma to MEMORY. This forces temporary tables and indices to be created in memory rather than on disk. For more details, see the SQLite documentation for PRAGMA temp_store.
- Set the synchronous pragma to NORMAL or OFF. This makes SQLite less strict about syncing changes to the database file, which speeds up database operations at the risk of database corruption if the operating system crashes or the machine loses power before the data reaches the disk. For more details, see the SQLite documentation for PRAGMA synchronous.
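To show what the two pragmas do, the sketch below applies them with Python's standard `sqlite3` module to a throwaway database and reads the values back. It is an illustration only. This article does not describe how SPE exposes these settings, and both pragmas apply per connection. Running this script against the SPE database file would therefore not change how SPE's own connections behave.

```python
import os
import sqlite3
import tempfile

# Illustration only: temp_store and synchronous are per-connection
# settings, so they must be applied by the connection that does the work.
with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "demo.db"))

    conn.execute("PRAGMA temp_store = MEMORY")
    conn.execute("PRAGMA synchronous = NORMAL")

    # SQLite reports both settings as integers:
    #   temp_store:  0 = DEFAULT, 1 = FILE, 2 = MEMORY
    #   synchronous: 0 = OFF, 1 = NORMAL, 2 = FULL, 3 = EXTRA
    temp_store = conn.execute("PRAGMA temp_store").fetchone()[0]
    synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
    print(temp_store, synchronous)  # 2 1

    conn.close()
```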
Process only relevant parts of audio
To enhance the performance of technologies like Language Identification, Speaker
Identification, Gender Identification, or Age Estimation, consider processing
only relevant portions of the input audio.
For instance, instead of extracting a voiceprint or identifying a language from
an entire 6-minute phone call, you can use just 1 minute or even 30 seconds of
speech. This is typically sufficient for obtaining reliable results and
significantly accelerates processing.
The simplest method is to use the from_time and to_time parameters in REST API calls. Note that these parameters delimit a range of audio time, not an amount of speech: a 1-minute range may also contain silence. In general, however, a 1-minute section of audio should contain enough speech for the technologies to work reliably.
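A simple way to derive the from_time and to_time values is to clip a fixed-length window to the call length. The Python sketch below assumes that both parameters are expressed in seconds. It also uses a hypothetical `path` parameter name to show the query string. The helper names are illustrative, not part of the SPE API.

```python
import urllib.parse


def analysis_window(call_duration, window=60.0, start=0.0):
    """Return (from_time, to_time) for a window of at most `window` seconds.

    The window starts at `start` seconds and is clipped to the call length.
    Assumption: SPE interprets from_time/to_time in seconds.
    """
    from_time = min(max(start, 0.0), call_duration)
    to_time = min(from_time + window, call_duration)
    return from_time, to_time


def build_process_query(path, from_time, to_time):
    # "path" is a hypothetical parameter name; from_time and to_time are
    # the parameters described in this article.
    return urllib.parse.urlencode(
        {"path": path, "from_time": from_time, "to_time": to_time})
```

For a 6-minute call, `analysis_window(360)` selects the first minute. A recording shorter than the window is processed in full. Shifting `start` past the opening seconds can help skip greetings or IVR prompts, if your recordings have them.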
A more advanced approach involves using Voice Activity Detection (VAD) to
identify segments of audio containing speech. This technology is extremely fast
and does not add significant processing overhead.
After obtaining the VAD results, you can apply your logic to select the
appropriate audio segment. Depending on your use case, this could be "the
longest segment of speech within the second third of the call," or a similar
criterion.
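The example criterion "the longest segment of speech within the second third of the call" can be expressed as a short selection function. This sketch assumes the VAD results have already been converted to a list of `(start, end)` speech segments in seconds. The actual structure of the SPE VAD output is not described here, and the function name is hypothetical. Segments that cross the boundaries of the middle third are clipped to it.

```python
def longest_speech_in_middle_third(segments, call_duration):
    """Pick the longest speech segment inside the second third of a call.

    segments: list of (start, end) speech segments in seconds (assumed
    to be derived from VAD results). Segments crossing the boundaries of
    the middle third are clipped to it.
    Returns (start, end), or None if there is no speech in that range.
    """
    lo = call_duration / 3
    hi = 2 * call_duration / 3
    best = None
    for start, end in segments:
        s, e = max(start, lo), min(end, hi)
        if e <= s:
            continue  # segment lies entirely outside the middle third
        if best is None or (e - s) > (best[1] - best[0]):
            best = (s, e)
    return best
```

The returned boundaries can then be passed as from_time and to_time to the technology you want to run. If the function returns `None`, a fallback such as processing a fixed window may be appropriate.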