Speech Engine Metafiles
Certain SPE entities, such as Speaker Identification (SID) speaker models, SID audio source profiles, and Language Identification (LID) language packs, can have additional information associated with them in the form of "metafiles." This article outlines the intended use of these metafiles.
SPE is primarily designed as an underlying engine focused solely on speech-related audio processing. Any supplementary functionality should be implemented at the application layer, meaning it should be managed by the application built on top of the SPE API. This includes handling metadata related to the processed audio files, such as phone numbers, recording sources, the date and time of recordings, references to individuals speaking (including names and photos), and the languages spoken. All such data should be stored in a database managed by the application.
However, for simpler applications where adding a database might be an unnecessary complication, managing metadata directly within SPE can be a convenient alternative.
The /metafile endpoint allows for direct management of metadata in SPE. Using the POST, GET, or DELETE methods, users can upload, download, or delete any file containing metadata, which is then associated with the corresponding SPE entity.
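The three operations can be sketched as plain HTTP requests. This is a minimal illustration only: the base URL, the entity path, and the `name` query parameter are placeholders chosen for the example and are not taken from the SPE API reference, so check the REST documentation of the specific entity for the real path and parameters. Authentication is omitted. The requests are built but not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical values: adjust to your SPE server and entity.
BASE_URL = "http://localhost:8600"
ENTITY_PATH = "/speakermodels/john"  # placeholder entity path, not the real SPE route


def metafile_request(method, entity_path, metafile_name, data=None):
    """Build (but do not send) an HTTP request for an entity's metafile."""
    if method not in ("POST", "GET", "DELETE"):
        raise ValueError(f"unsupported method: {method}")
    # The "name" query parameter is an assumption made for this sketch.
    query = urllib.parse.urlencode({"name": metafile_name})
    url = f"{BASE_URL}{entity_path}/metafile?{query}"
    return urllib.request.Request(url, data=data, method=method)


# Upload, download, and delete the same metafile.
upload = metafile_request("POST", ENTITY_PATH, "info.json", b'{"name": "John"}')
download = metafile_request("GET", ENTITY_PATH, "info.json")
delete = metafile_request("DELETE", ENTITY_PATH, "info.json")

# To actually send a request against a running server:
# with urllib.request.urlopen(upload) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the method and URL easy to inspect before anything reaches the server.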
There are no restrictions on the content or naming of metafiles, apart from those imposed by the underlying operating system and filesystem. You can store any type of data that might benefit your application, including plain text files, structured formats like JSON or XML, pictures, documents, and multimedia files.
Metafiles are physically stored in the SPE user's "home" directory, specifically within the data subdirectory. For more details, refer to the Speech engine home directory article. The maximum size for a single metafile can be configured using the server.max_metadata_size setting in the SPE configuration file.
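An application may want to check a file against the configured limit before uploading it. The sketch below assumes the limit is expressed in bytes and uses a made-up value; both are assumptions for illustration, so take the actual value and unit from server.max_metadata_size in your own configuration.

```python
# Hypothetical limit for illustration; mirror your server.max_metadata_size here.
# The unit (bytes) is an assumption, not confirmed by this article.
MAX_METAFILE_BYTES = 1_000_000


def fits_limit(payload: bytes, limit: int = MAX_METAFILE_BYTES) -> bool:
    """Return True when the payload is no larger than the configured limit."""
    return len(payload) <= limit


small_ok = fits_limit(b"x" * 10)
too_big_ok = fits_limit(b"x" * (MAX_METAFILE_BYTES + 1))
```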
Example
An example of metafile usage is shown in the image below, where Phonexia Browser utilizes SPE metafiles to store metadata for SID speaker models. Textual properties, such as name and date of birth, are stored in a JSON file (note that the structure and interpretation of this file are defined by the Browser, not by SPE). Additional files, such as a speaker's photo or any other attachments, are stored as separate files.
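A JSON metafile of this kind could be produced as in the sketch below. The field names are hypothetical: the real layout is whatever the application chooses (the Browser's own format is not documented here), and SPE stores the bytes without interpreting them.

```python
import json

# Hypothetical field names; the structure is defined by the application, not by SPE.
speaker_properties = {
    "name": "Jane Doe",
    "date_of_birth": "1985-04-12",
}


def to_metafile_bytes(properties: dict) -> bytes:
    """Serialize textual properties to UTF-8 JSON, ready to upload as a metafile."""
    return json.dumps(properties, ensure_ascii=False, indent=2).encode("utf-8")


def from_metafile_bytes(payload: bytes) -> dict:
    """Parse a downloaded JSON metafile back into a dictionary."""
    return json.loads(payload.decode("utf-8"))


payload = to_metafile_bytes(speaker_properties)
restored = from_metafile_bytes(payload)
```

Binary attachments such as a photo need no serialization step; they can be uploaded as separate metafiles as they are.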
Another example is the description of a created LID language pack's content. When a LID language pack is successfully created, SPE generates a metafile named report, which provides detailed information about the source files used during the language pack's creation. For more details about the content of the report metafile, refer to the LID language pack creation REST endpoint documentation.