Tips for Optimizing SPE Performance

The following are several recommendations for maximizing SPE performance in terms of speed, throughput, and hardware utilization.

Disable results caching

By default, SPE is configured to cache file processing results in its database. This prevents reprocessing of files in the event of repeated requests for the processing of the same file by the same technology, model, etc.

If you are retrieving the processing results from the API and storing them in your own application, you may not require the results to be cached in the SPE database.
In this case, results caching can be disabled to:

  • Avoid unnecessary database operations, thereby reducing disk activity.
  • Prevent database growth due to data that may never be deleted (see the section on deleting/unregistering files below).

To disable results caching, set server.db.save_results = false in the SPE configuration file.

For additional details, refer to the article Speech Engine Database and the server.db.save_results section of the article Speech Engine Configuration File.

Utilize file registration instead of upload

Depending on your workflow and the design of your application, consider changing how you transfer recordings to SPE. Instead of uploading files via the POST /audiofile endpoint, you can copy the files directly to SPE storage and register them using the POST /audiofile/registration endpoint.
Copying files at the filesystem level can be faster and more efficient than uploading them over HTTP, and it conserves machine resources.

When using this method, ensure that access rights are properly configured. This means the process copying the files to SPE storage must create the files with access permissions that allow the SPE process to read them.
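As a minimal sketch of this workflow, the following Python helpers copy a recording into SPE storage with read permissions for other users and then build the registration request. The storage directory, the base URL, and the use of a "path" query parameter to identify the file are assumptions made for illustration; check them against your SPE installation and the REST API reference. Authentication is omitted.

```python
import os
import shutil
import stat
import urllib.parse
import urllib.request


def copy_into_storage(src, storage_root, rel_path):
    """Copy src to storage_root/rel_path and make it readable by other users."""
    dest = os.path.join(storage_root, rel_path)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copyfile(src, dest)
    # rw for the owner, read-only for group and others, so that SPE can
    # read the file even if it runs under a different user account.
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
    return dest


def build_registration_request(base_url, rel_path):
    """Build (but do not send) a POST /audiofile/registration request."""
    # ASSUMPTION: the file is identified by a "path" query parameter that is
    # relative to the SPE storage directory.
    query = urllib.parse.urlencode({"path": rel_path})
    url = f"{base_url}/audiofile/registration?{query}"
    return urllib.request.Request(url, data=b"", method="POST")
```

The request can then be sent with urllib.request.urlopen() once the required authentication headers have been added. Any directories created along the way must also be traversable by the SPE process.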

Ensure proper deletion/unregistration of files

Regardless of how you transfer your recordings to SPE (as discussed in the previous section), always ensure that recordings are properly removed when they are no longer needed.
Effective housekeeping keeps SPE storage free of unnecessary files and prevents excessive growth of the SPE database.

Keeping the database at a reasonable size prevents processing slowdowns caused by database operations. With SQLite in particular, any write operation creates a copy of the entire database file. If the file is large, from several megabytes up to a few gigabytes, this copying can delay processing by several seconds or more, even on fast SSDs.

  • Use DELETE /audiofile to:
    • Remove audio file records from the database.
    • Delete all related cached processing results (if caching is enabled).
    • Physically delete the file from SPE storage.
  • Alternatively, use DELETE /audiofile/registration to:
    • Remove audio file records from the database.
    • Delete all related cached processing results (if caching is enabled).
    • Note: This operation does not physically delete the file. An external process should eventually remove it to avoid accumulating unnecessary files in SPE storage.
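The two options above can be sketched in Python as follows. The base URL and the "path" query parameter are assumptions made for illustration, and remove_unregistered_file is a hypothetical helper that stands in for the external cleanup process; authentication is omitted.

```python
import os
import urllib.parse
import urllib.request


def build_delete_request(base_url, rel_path, keep_file=False):
    """Build (but do not send) the DELETE request for one recording.

    keep_file=False -> DELETE /audiofile (SPE also deletes the file)
    keep_file=True  -> DELETE /audiofile/registration (file stays on disk)
    """
    endpoint = "/audiofile/registration" if keep_file else "/audiofile"
    # ASSUMPTION: the file is identified by a "path" query parameter.
    query = urllib.parse.urlencode({"path": rel_path})
    return urllib.request.Request(f"{base_url}{endpoint}?{query}", method="DELETE")


def remove_unregistered_file(storage_root, rel_path):
    """External cleanup for a file that was only unregistered."""
    full_path = os.path.join(storage_root, rel_path)
    if os.path.exists(full_path):
        os.remove(full_path)
        return True
    return False
```

When only the registration is removed, call the cleanup helper (or an equivalent scheduled job) afterwards so that the file does not linger in SPE storage.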
Tip: If you process real-time streams and occasionally save the incoming audio to a file (e.g., for troubleshooting purposes), remember that the audio file is created in SPE storage and registered in the SPE database, so it needs the same cleanup as any other recording.

Prefer MariaDB over SQLite

For higher processing loads, it is generally recommended to use MariaDB as the SPE database. Unlike SQLite, MariaDB is a high-performance, scalable database designed to handle very high loads efficiently.
MariaDB's physical storage is optimized for high performance and does not suffer from the delays that SQLite experiences when managing a large database file (refer to the previous section).

Optimize SQLite with performance enhancements

If you choose not to use MariaDB and prefer to continue using SQLite, consider applying the following performance enhancements to improve overall SPE performance:

  • Create a RAM disk and store the database file on it.
    While the performance gain over modern SSDs may vary, this approach can still be beneficial. The database file path is set using the server.db.sqlite.data_source option in the SPE configuration file. For more details, refer to the server.db.sqlite.data_source section of the article Speech Engine Configuration File.
  • Set the temp_store pragma to MEMORY. This forces temporary tables and indices to be created in memory rather than on disk. More details are available in the SQLite PRAGMA documentation.
  • Set the synchronous pragma to NORMAL or OFF. This allows SQLite to be less strict in synchronizing changes to the database file, which speeds up database operations at the risk of database corruption if the operating system crashes or the machine loses power before the changes reach the disk. More details are available in the SQLite PRAGMA documentation.
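To illustrate what the two pragmas do, the following sketch sets and reads them back on a scratch database using Python's standard sqlite3 module. It does not show how SPE itself applies them, and the scratch file is not an SPE database.

```python
import os
import sqlite3
import tempfile

# Scratch database used only for this illustration.
db_path = os.path.join(tempfile.mkdtemp(), "scratch.db")
conn = sqlite3.connect(db_path)

# Keep temporary tables and indices in memory instead of on disk.
conn.execute("PRAGMA temp_store = MEMORY")
# Sync to disk less often than the stricter FULL level.
conn.execute("PRAGMA synchronous = NORMAL")

# SQLite reports both settings as integers:
# temp_store: 0 = DEFAULT, 1 = FILE, 2 = MEMORY
# synchronous: 0 = OFF, 1 = NORMAL, 2 = FULL, 3 = EXTRA
temp_store = conn.execute("PRAGMA temp_store").fetchone()[0]
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
print(temp_store, synchronous)

conn.close()
```

Both pragmas apply to the connection on which they are issued and are not stored in the database file, so they have to be set by the process that opens the database.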

Process only relevant parts of audio

To enhance the performance of technologies like Language Identification, Speaker Identification, Gender Identification, or Age Estimation, consider processing only relevant portions of the input audio.
For instance, instead of extracting a voiceprint or identifying a language from an entire 6-minute phone call, you can use just 1 minute or even 30 seconds of speech. This is typically sufficient for obtaining reliable results and significantly accelerates processing.

The simplest method is to use the from_time and to_time parameters in REST API calls. Note that these parameters select a time range within the recording, not an amount of net speech, so the selected range may also contain silence or noise. A 1-minute section of audio generally contains enough speech for the technologies to work reliably.
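A minimal sketch of building such a call in Python is shown below. Only the from_time and to_time parameter names come from the text above; the endpoint path in the usage example, the "path" parameter, and the assumption that times are given in seconds are illustrative and should be verified against the REST API reference.

```python
import urllib.parse


def time_window_url(base_url, endpoint, rel_path, from_time, to_time):
    """Return a request URL limited to the [from_time, to_time] range."""
    if from_time < 0 or to_time <= from_time:
        raise ValueError("expected 0 <= from_time < to_time")
    query = urllib.parse.urlencode({
        "path": rel_path,        # ASSUMPTION: file identified by "path"
        "from_time": from_time,  # ASSUMPTION: values are in seconds
        "to_time": to_time,
    })
    return f"{base_url}{endpoint}?{query}"


# Hypothetical endpoint; process only the first minute of the call.
url = time_window_url(
    "http://localhost:8600", "/technologies/languageid", "calls/a.wav", 0, 60
)
```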

A more advanced approach involves using Voice Activity Detection (VAD) to identify segments of audio containing speech. This technology is extremely fast and does not add significant processing overhead.
After obtaining the VAD results, you can apply your logic to select the appropriate audio segment. Depending on your use case, this could be "the longest segment of speech within the second third of the call," or a similar criterion.
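The example criterion can be sketched as a small Python function. It assumes that the VAD results have already been converted to a list of (start, end) pairs in seconds; the actual VAD output format is not described here, so that conversion is left out.

```python
def longest_speech_in_second_third(segments, call_duration):
    """Return the longest speech span inside the second third of the call.

    segments: iterable of (start, end) speech intervals in seconds.
    Returns a (start, end) tuple clipped to the second third of the call,
    or None when no speech falls inside it.
    """
    window_start = call_duration / 3
    window_end = 2 * call_duration / 3
    best = None
    for start, end in segments:
        # Clip each speech segment to the window of interest.
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end <= clipped_start:
            continue  # segment lies entirely outside the window
        if best is None or clipped_end - clipped_start > best[1] - best[0]:
            best = (clipped_start, clipped_end)
    return best


# A 6-minute call with four speech segments.
segments = [(0, 50), (100, 170), (200, 260), (300, 360)]
selected = longest_speech_in_second_third(segments, 360)
```

The returned pair can then be passed as from_time and to_time in the subsequent processing request.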