Speech Engine Database
The SPE database serves multiple purposes:
- stores SPE internal data
- stores information about SPE entities created by SPE users:
  - audio file metadata
  - speaker models and their voiceprints
  - speaker groups and their voiceprints
  - calibration sets
  - keyword lists
  - language packs
  - audio source profiles
- stores cached processing results (ON by default, can be changed in the SPE configuration file)
- optionally stores SPE log data (MariaDB / MySQL only, OFF by default, can be enabled in the SPE configuration file)
To cache or not to cache?
Whether the built-in results caching is beneficial depends on your use case
and on the design of your app.
In general, the built-in results caching is useful when creating a simple,
lightweight app. When building a complex voice-processing system with multiple
SPE processing units, load balancing, etc., it is generally better to
disable the built-in results caching and create your own caching layer, tailored
to your system architecture and processing workflow.
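As a rough illustration of what such a custom caching layer might look like, the sketch below keys results by audio content hash, technology, and technology model, loosely mirroring what the built-in cache stores per result. The class `ResultCache`, its method names, and the `compute` callback (standing in for whatever code calls SPE) are hypothetical and not part of SPE. A real system would likely use a shared store instead of an in-process dictionary.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


class ResultCache:
    """Hypothetical in-memory cache keyed by (audio hash, technology, model)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str, str], str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def audio_key(audio_bytes: bytes) -> str:
        # Hash the content so the same audio under a different name still hits.
        return hashlib.sha256(audio_bytes).hexdigest()

    def get_or_compute(
        self,
        audio_bytes: bytes,
        technology: str,
        model: str,
        compute: Callable[[bytes], Any],
    ) -> Any:
        key = (self.audio_key(audio_bytes), technology, model)
        if key in self._store:
            self.hits += 1
            return json.loads(self._store[key])
        self.misses += 1
        result = compute(audio_bytes)  # e.g. a function that calls SPE
        # Store as JSON text so callers cannot mutate the cached copy.
        self._store[key] = json.dumps(result)
        return result
```

Keying on the content hash rather than the file path is a design choice of this sketch; it keeps the cache valid when files are moved between processing units.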
Cached data persistence
Cached processing results are kept in the database as long as the audio file
exists. When the audio file is deleted from SPE storage, all related
information, metadata, and processing results are deleted from the database.
Stream processing data is not cached at all.
If data privacy and security are a concern, disabling the built-in results caching ensures that processing results are returned only via the REST API response and are not kept in the database at all.
Supported databases
SPE supports SQLite and, depending on the SPE version, MariaDB 10.x (SPE 3.46
and later) or MySQL 5.x (SPE up to 3.45) as its database engine.
The database engine is configured in the phxspe.properties SPE configuration
file. See the Database section of the SPE configuration file article for more
details.
SQLite
SQLite is the default SPE database type and works out of the box.
By its nature, SQLite is intended mainly as lightweight storage for
configuration data. It can also handle results caching, except in real
mass-processing scenarios.
When results caching is enabled AND hundreds of thousands or millions of audio
files are processed per day, SQLite's locking mechanism (a simple global
database lock) can become a performance bottleneck. In that case, the
higher-performance MariaDB (MySQL) database is the better choice.
When SPE is configured to use an SQLite database, the database is created and
initialized automatically by running phxadmin or phxspe.
The SQLite database is typically created during first-time SPE setup when
configuring technologies using phxadmin. It is created silently behind the
scenes, using values from the phxspe.properties configuration file (location,
file name) and the default SPE configuration (users, roles, etc.).
SQLite database updates are also handled automatically by SPE – from time to time, as we add new features or improve existing functionality, the database internal structure may get updated in newer SPE versions. When using SQLite, if a new SPE version detects that the database needs an update, it's done fully automatically behind the scenes.
If Speech Engine is used together with Phonexia Browser in the so-called
"embedded" mode (see details about the "embedded SPE" mode in the Browser
manual), Phonexia Browser creates its own separate SPE configuration file, and
the SQLite database file is located in the SPE home directory and named
phxserver.sqlite.
This can matter in certain scenarios, e.g. when registering a LID language pack
using phxadmin: you need to point phxadmin to the appropriate SPE configuration
file so that the changes are made to the correct database.
MariaDB / MySQL
MariaDB / MySQL database is a high-performance alternative to SQLite.
As opposed to SQLite, MariaDB (or MySQL) uses fine-grained locking mechanisms,
resulting in higher performance in environments with high concurrency – e.g. in
mass-processing deployments with multiple SPE processing units and results
caching in the central database, etc.
When SPE is configured to use a MariaDB / MySQL database, the database
must first be created and initialized manually using the SQL scripts provided
in the SPE distribution package.
Similarly, when updating SPE to a newer version, any required MariaDB / MySQL
database updates must be applied manually, using the SQL scripts, as part of
the manual SPE update process.
See the SPE database scripts article for more details.
Database size
The database is not vacuumed, optimized, or shrunk automatically. However, the
database space freed by deleted data is reused by newly added data.
Therefore, it is normal for the database size to grow over time to a certain
extent.
Assuming that a) the daily input load is more or less constant, and b)
processed or unneeded audio files are removed from SPE storage, the database
grows to a certain size and then stays there, as the number and size of new DB
records balance out the number and size of deleted DB records.
In any case, if the database becomes oversized, e.g. after one-off processing of an unusually large amount of audio, it can still be vacuumed, optimized, or shrunk manually using the commands appropriate for your database type.
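For the SQLite case, manual shrinking could be sketched as follows, using Python's standard sqlite3 module and SQLite's VACUUM statement. The function name `vacuum_sqlite` is made up for this example, and the sketch assumes SPE is stopped while it runs so that nothing else holds the database file. For MariaDB / MySQL, the corresponding statement would typically be OPTIMIZE TABLE, run per table.

```python
import os
import sqlite3
from typing import Tuple


def vacuum_sqlite(db_path: str) -> Tuple[int, int]:
    """Run VACUUM on an SQLite file; return (size_before, size_after) in bytes."""
    size_before = os.path.getsize(db_path)
    # isolation_level=None means autocommit mode; VACUUM cannot run
    # inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return size_before, os.path.getsize(db_path)
```

Note that VACUUM rebuilds the database into a temporary copy, so it may temporarily need free disk space comparable to the size of the database file.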
Excessive database growth may be a sign of missing housekeeping in your
workflow design.
For example, you may not be deleting uploaded audio files after processing using
the appropriate REST API call. Another example is not unregistering files after
processing (if you register files instead of uploading them; see the
Speech engine home directory article). This causes the file information AND the
cached processing results to remain in the database.
Or, you may be saving stream data to a file but not deleting the created stream
audio files using the REST API call once they are no longer needed, so the
file metadata piles up in the database.
Database structure and content
The SPE database consists of tables and views with the rest_ prefix (this comes
from SPE's predecessor, Phonexia REST Server).
Based on the type of data they contain, these can be divided into the following groups:
- SPE internal data
  - information about files and directories in SPE storage
  - internal data: resource types, resource locks, users, user roles, user sessions, technology models
- user-created entities data
  - SID speaker models and their voiceprints, speaker groups, calibration sets, audio source profiles
  - LID language models
  - KWS keyword lists
- cached processing results
  - if caching is enabled, processing results for each technology
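To get a quick overview of how much data each group holds, one option is to count the rows in every rest_ table or view. The sketch below does this for an SQLite database through Python's standard sqlite3 module; the function name `count_rest_rows` is hypothetical, and the sketch relies only on the rest_ prefix described above, not on any particular table layout.

```python
import sqlite3
from typing import Dict


def count_rest_rows(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return {name: row_count} for every table or view named rest_*."""
    # "_" is a single-character wildcard in LIKE, so it is escaped here.
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type IN ('table', 'view') "
            "AND name LIKE 'rest\\_%' ESCAPE '\\' "
            "ORDER BY name"
        )
    ]
    counts: Dict[str, int] = {}
    for name in names:
        # Identifiers cannot be bound as parameters, so quote them instead.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

Run periodically, such counts can show which tables keep growing, for example file records versus cached results.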
SPE internal data
Tables containing SPE internal data:
Table | Description |
---|---|
rest_directory_type | list of internal directory types |
rest_file_shadow | list of information about files registered in SPE – path, creation and modification timestamps, owner (SPE user), directory |
rest_log | SPE log data, see the SPE logging to database section |
rest_resource_type | list of internal resource types – file, SID speaker model, SID speaker group, SID calibration set, SID audio source profile, KWS keyword list, LID language pack |
rest_resource_lock | list of resources locked during processing |
rest_role | list of pre-defined SPE user roles |
rest_user | list of SPE users and their settings and status – login, password, active/inactive, max. pending operations, current pending operations |
rest_user_role | associations between users and roles |
rest_user_session | list of active user API sessions |
rest_technology_model | list of technology model names |
User-created SPE entity data
Tables containing data about entities created by SPE users:
Table | Description |
---|---|
rest_model_sid | list of SID speaker models – name, owner (SPE user), modification timestamp |
rest_model_sid_sources | list of files used as sources for SID speaker models creation |
rest_model_sid_metafiles | list of files used as SID speaker models metafiles |
rest_group_sid | list of SID speaker groups – name, owner (SPE user) |
rest_group_sid_models | associations between SID speaker groups and speaker models |
rest_voiceprint | SID voiceprints – voiceprint data, technology model used to create the voiceprint, speaker model to which the voiceprint belongs (speaker model voiceprints), calibration set to which the voiceprint belongs (FAR calibration set voiceprints) |
rest_model_sid_calib_voiceprint | SID speaker model voiceprints calibrated to FAR – voiceprint data, speaker model, technology model used to create the voiceprint, max. FAR, calibration set used to calibrate the voiceprint |
rest_calibset_sid | list of SID FAR calibration sets – name and modification timestamp, owner (SPE user) |
rest_calibset_sid_sources | list of files used as sources for SID FAR calibration sets creation |
rest_calibset_sid_metafiles | list of files used as SID FAR calibration sets metafiles |
rest_calibset_sid_total_chunks | number of chunks in SID FAR calibration sets |
rest_profile_sid4 | list of SID4 Audio Source Profiles – name, owner (SPE user), technology model used to create the profile, file with the profile content, hash |
rest_profile_sid4_metafiles | list of files used as SID4 Audio Source Profiles metafiles |
rest_model_lid | list of LID language packs – name, owner (SPE user), technology model to which the language pack belongs (i.e. technology model used to create source languageprints/language models) |
rest_model_lid_metafiles | list of LID language packs metafiles |
rest_model_kws | KWS keyword lists – keyword list JSON data, keyword list name, owner (SPE user), technology model to which the keyword list belongs |
Processing results data
Tables containing cached processing results (if results caching is enabled):
Table | Description |
---|---|
rest_result_age | AGE processing results – file, used technology model, results JSON data |
rest_result_diar | DIAR processing results – file, used technology model, used processing parameters, results JSON data |
rest_result_gid | GID processing results – file, used technology model, results JSON data |
rest_result_kws | KWS processing results – file, used technology model, used keyword list, results JSON data |
rest_result_lid | LID processing results – file, used technology model, used language pack, results JSON data |
rest_result_phnrec | PHNREC processing results – file, used technology model, results JSON data |
rest_result_sid | SID processing results – file, used technology model, used speaker model, used FAR calibration set, max. FAR, results JSON data |
rest_result_sid4 | SID4 processing results – file, used technology model, used speaker model, used file- and speaker model Audio Source Profile, results JSON data |
rest_result_sqe | SQE processing results – file, used technology model, results JSON data |
rest_result_stt | STT processing results – file, used technology model, results JSON data |
rest_result_tae | TAE processing results – file, used technology model, results JSON data |
rest_result_vad | VAD processing results – file, used technology model, results JSON data |
SPE logging to database
Storing SPE logs in the database is available only for MariaDB / MySQL.
This is mainly for performance reasons: SQLite is not designed for high
concurrency, so its locking mechanism would create a bottleneck, especially
in setups where multiple SPE instances are configured to store their logging
data in the same database.
Log data is stored in the rest_log table, which includes the following columns:
Column | Description |
---|---|
Source | identifier of SPE subsystem that created the log record |
Name | identifier of the source SPE that created the log record; can be set by the server.identifier or server.logging.database.identifier configuration settings (see SPE configuration file explained for details) |
ProcessId | numeric PID of the process that created the log record |
Thread | identifier of the thread that created the log record |
ThreadId | numeric ID of the thread that created the log record |
Priority | priority of the operation that created the log record |
Text | raw log text as it would be written into log file or console |
DateTime | log record creation timestamp |
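When several SPE instances log into the same database, the Name column can be used to separate their records. The sketch below shows one way this might look from Python with any DB-API 2.0 connection. It assumes the column names are spelled exactly as in the table above and that DateTime sorts chronologically. The function name `fetch_recent_logs` and the `placeholder` parameter are hypothetical; MariaDB / MySQL drivers commonly use `%s` as the parameter marker.

```python
from typing import Any, List, Tuple

# Column names as listed in the table above (assumed to match the schema).
LOG_COLUMNS = (
    "Source", "Name", "ProcessId", "Thread",
    "ThreadId", "Priority", "Text", "DateTime",
)


def fetch_recent_logs(
    conn: Any, name: str, limit: int = 100, placeholder: str = "%s"
) -> List[Tuple[Any, ...]]:
    """Return the newest rest_log rows written by the SPE instance `name`."""
    if placeholder not in ("%s", "?"):
        raise ValueError("unsupported parameter placeholder")
    # Backticks guard against column names that clash with SQL keywords.
    columns = ", ".join(f"`{col}`" for col in LOG_COLUMNS)
    sql = (
        f"SELECT {columns} FROM rest_log "
        f"WHERE `Name` = {placeholder} "
        f"ORDER BY `DateTime` DESC LIMIT {placeholder}"
    )
    cur = conn.cursor()
    try:
        cur.execute(sql, (name, int(limit)))
        return list(cur.fetchall())
    finally:
        cur.close()
```

The instance name and the limit are passed as bound parameters, so only the fixed column list and the validated placeholder are interpolated into the SQL text.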