Speech Engine Metafiles

Certain Speech Engine (SPE) entities, such as Speaker Identification (SID) speaker models, SID audio source profiles, and Language Identification (LID) language packs, can have additional information associated with them in the form of "metafiles." This article outlines the intended use of these metafiles.

SPE is primarily designed as an underlying engine focused solely on speech-related audio processing. Any supplementary functionality should be implemented at the application layer, meaning it should be managed by the application built on top of the SPE API. This includes handling metadata related to the processed audio files, such as phone numbers, recording sources, the date and time of recordings, references to individuals speaking (including names and photos), and the languages spoken. All such data should be stored in a database managed by the application.

However, for simpler applications where adding a database might be an unnecessary complication, managing metadata directly within SPE can be a convenient alternative.

[Image: SPE REST endpoints for metafiles]

The /metafile endpoint allows for direct management of metadata in SPE. By using POST, GET, or DELETE methods, users can upload, download, or delete any file containing metadata, which is then associated with the corresponding SPE entity.
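The three operations can be sketched as plain HTTP requests. The following is a minimal sketch only: the base URL, the entity path, and the `name` query parameter are assumptions made for illustration and are not taken from this article, so check the SPE REST API reference for the actual resource paths and authentication requirements. The helper only builds the request and does not send it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumed server address, for illustration only.
BASE_URL = "http://localhost:8600"


def build_metafile_request(method, entity_path, metafile_name, payload=None):
    """Build (but do not send) an HTTP request for one metafile operation.

    entity_path is a hypothetical path identifying the SPE entity,
    for example a SID speaker model.
    """
    if method not in {"POST", "GET", "DELETE"}:
        raise ValueError(f"unsupported method: {method}")
    if method == "POST" and payload is None:
        raise ValueError("POST requires the metafile content as bytes")

    # Hypothetical layout: <entity path>/metafile?name=<metafile name>
    query = urlencode({"name": metafile_name})
    url = f"{BASE_URL}{quote(entity_path)}/metafile?{query}"

    # Only an upload carries a body; download and delete do not.
    body = payload if method == "POST" else None
    return Request(url, data=body, method=method)


# Upload, download, and delete the same metafile of a hypothetical entity.
upload = build_metafile_request("POST", "/speakermodels/alice", "info.json", b"{}")
download = build_metafile_request("GET", "/speakermodels/alice", "info.json")
delete = build_metafile_request("DELETE", "/speakermodels/alice", "info.json")
```

Each request object could then be passed to `urllib.request.urlopen` once the real path and any required authentication headers are filled in.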

Tip: There are no restrictions on the content or naming of metafiles, apart from those imposed by the underlying operating system and filesystem. You can store any type of data that might benefit your application, including plain text files, structured formats like JSON or XML, pictures, documents, and multimedia files.

Metafiles are physically stored in the SPE user's "home" directory, specifically within the data subdirectory. For more details, refer to the Speech engine home directory article. The maximum size of a single metafile can be configured using the server.max_metadata_size setting in the SPE configuration file.

Example

An example of metafile usage is shown in the image below, where Phonexia Browser utilizes SPE metafiles to store metadata for SID speaker models. Textual properties, such as name and date of birth, are stored in a JSON file (note that the structure and interpretation of this file are defined by the Browser, not by SPE). Additional files, such as a speaker's photo or any other attachments, are stored as separate files.

[Image: SPE metafiles in Phonexia Browser]
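Because SPE does not interpret metafile content, an application is free to define its own JSON layout for textual properties. The sketch below shows one possible layout. The field names (`name`, `date_of_birth`, `attachments`) are hypothetical and do not reproduce the structure used by Phonexia Browser.

```python
import json


def build_speaker_metafile(name, date_of_birth, attachments=()):
    """Serialize application-defined speaker properties to JSON bytes.

    The field names are hypothetical; SPE stores the bytes as-is.
    """
    record = {
        "name": name,
        "date_of_birth": date_of_birth,  # ISO 8601 date string, e.g. "1980-05-17"
        "attachments": list(attachments),  # names of other metafiles, e.g. a photo
    }
    return json.dumps(record, ensure_ascii=False, sort_keys=True).encode("utf-8")


def parse_speaker_metafile(raw):
    """Decode JSON bytes downloaded from a metafile back into a dict."""
    return json.loads(raw.decode("utf-8"))


content = build_speaker_metafile("Alice Example", "1980-05-17", ["photo.jpg"])
speaker = parse_speaker_metafile(content)
```

Keeping the photo as a separate metafile and referencing it by name mirrors the split described above between the JSON properties and additional attachments.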

Another example is the information about the content of a LID language pack. When a LID language pack is successfully created, SPE generates a metafile named report, which provides detailed information about the source files used during the language pack's creation. For more details about the content of the report metafile, refer to the LID language pack creation REST endpoint documentation.