Technologies Configuration File

This article explains the purpose and structure of the SPE technologies configuration file, technologies.xml, and of its JSON counterpart, technologies.json, which is created by Phonexia Browser.

An SPE installation usually includes multiple speech technologies (e.g., Speaker Identification, Speech To Text) in various technological models (e.g., L4, XL4) or with support for various languages (e.g., the 6th generation of EN_US, CS_CZ). You can select which technologies and models to enable in your SPE installation. For example, you may want to try several models during initial testing to see how they perform on your audio, or enable additional technologies while developing your application.

To select which technologies/models to enable in your SPE, you can use one of the SPE administration tools, phxadmin or phxadmin2. The resulting configuration is then stored in the technologies.xml configuration file, located in the SPE settings directory. SPE reads this configuration file during startup and initializes the technology instances according to the information in the file.

The file has a very simple structure and can also be created or modified using any plaintext editor, or programmatically.

Example

The example below shows a technologies.xml file containing the following setup:

  • STT (Speech To Text) with
    • 8 instances of SK_SK_5 model
  • STT_STREAM (Speech To Text for stream processing) with
    • 2 instances of CS_CZ_6 model
  • SID4E (Speaker Identification 4 Voiceprint Extractor) with
    • 2 instances of L4 model
    • 3 instances of XL4 model
  • SID4C (Speaker Identification 4 Voiceprint Comparator) with
    • 2 instances of L4 model
    • 3 instances of XL4 model

<?xml version="1.0"?>
<technology_subsystem_settings>
  <technologies>
    <item>
      <name>STT</name>
      <models>
        <item>
          <name>SK_SK_5</name>
          <n_instances>8</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>STT_STREAM</name>
      <models>
        <item>
          <name>CS_CZ_6</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>SID4E</name>
      <models>
        <item>
          <name>L4</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
        <item>
          <name>XL4</name>
          <n_instances>3</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>SID4C</name>
      <models>
        <item>
          <name>L4</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
        <item>
          <name>XL4</name>
          <n_instances>3</n_instances>
          <config_file />
        </item>
      </models>
    </item>
  </technologies>
</technology_subsystem_settings>

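The meaning of the individual elements is straightforward: each item under technologies names one technology, each item under its models element names one model of that technology, and n_instances sets how many instances of that model SPE initializes at startup.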

The only element that might require more information is the config_file element. This element is generally kept empty, but it allows specifying the name of a *.bs BSAPI configuration file to be used by the technology initializer instead of the default file associated with the technology and model. However, this feature should only be used in special cases, such as when suggested by Phonexia experts. SPE users should normally avoid modifying BSAPI configuration files. If a technology configuration needs to be customized, the user configuration file is the appropriate method.
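
Because the file can also be produced programmatically, the following is a minimal sketch of generating it with Python's standard xml.etree.ElementTree module. The function name build_technologies_xml and the dictionary layout are illustrative assumptions and are not part of SPE. The config_file element is written empty, as in the example above. ET.indent requires Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET


def build_technologies_xml(setup):
    """Return technologies.xml content for {technology: {model: n_instances}}.

    Hypothetical helper for illustration; it is not shipped with SPE.
    """
    root = ET.Element("technology_subsystem_settings")
    technologies = ET.SubElement(root, "technologies")
    for tech_name, models in setup.items():
        tech_item = ET.SubElement(technologies, "item")
        ET.SubElement(tech_item, "name").text = tech_name
        models_el = ET.SubElement(tech_item, "models")
        for model_name, n_instances in models.items():
            model_item = ET.SubElement(models_el, "item")
            ET.SubElement(model_item, "name").text = model_name
            ET.SubElement(model_item, "n_instances").text = str(n_instances)
            # Kept empty so that the default BSAPI configuration file is used.
            ET.SubElement(model_item, "config_file")
    ET.indent(root, space="  ")  # pretty-print; Python 3.9+
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body + "\n"


# The same setup as in the example above.
EXAMPLE_SETUP = {
    "STT": {"SK_SK_5": 8},
    "STT_STREAM": {"CS_CZ_6": 2},
    "SID4E": {"L4": 2, "XL4": 3},
    "SID4C": {"L4": 2, "XL4": 3},
}

print(build_technologies_xml(EXAMPLE_SETUP))
```

The printed text can be saved as technologies.xml in the SPE settings directory. SPE reads the file during startup, so a restart is needed for the change to take effect.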

Technology names supported in the technologies configuration file:

AGE                 Age Estimation
DENOISER            Denoiser
DIAR                Diarization
GID                 Gender Identification
KWS                 Keyword Spotting
KWS_STREAM          Keyword Spotting Stream
LIDC                Language Identification Languageprint Comparator
LIDE                Language Identification Languageprint Extractor
PHNREC              Phoneme Recognition
SID4C               Speaker Identification 4 Voiceprint Comparator
SID4C_STREAM        Speaker Identification 4 Voiceprint Stream Comparator
SID4CALIB           Speaker Identification 4 VoicePrint Calibration
SID4E               Speaker Identification 4 Voiceprint Extractor
SID4E_STREAM        Speaker Identification 4 Voiceprint Stream Extractor
SQE                 Speech Quality Estimation
SQE_STREAM          Speech Quality Estimation Stream
STT                 Speech To Text
STT_STREAM          Speech To Text Stream
TAE                 Time Analysis Extraction
TAE_STREAM          Time Analysis Extraction Stream
VAD                 Voice Activity Detection
VAD_STREAM          Voice Activity Detection Stream
SIDC                Speaker Identification Voiceprint Comparator (legacy)
SIDC_STREAM         Speaker Identification Voiceprint Stream Comparator (legacy)
SIDCALIBSET         Speaker Identification VoicePrint Calibration (legacy)
SIDCALIBSET_STREAM  Speaker Identification VoicePrint Stream Calibration (legacy)
SIDE                Speaker Identification Voiceprint Extractor (legacy)
SIDE_STREAM         Speaker Identification Voiceprint Stream Extractor (legacy)
DICTATE             Dictate (valid only in SPE 3.17 and older)
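
When the file is edited by hand, a misspelled technology name is easy to miss. The following is a minimal sketch of a check that compares the names in a technologies.xml file with the names listed above. The function find_unknown_technologies and the sample content are illustrative assumptions, and the name STT_STREEM is a deliberate typo used for the demonstration.

```python
import xml.etree.ElementTree as ET

# Technology names copied from the list above.
SUPPORTED_TECHNOLOGIES = {
    "AGE", "DENOISER", "DIAR", "GID", "KWS", "KWS_STREAM", "LIDC", "LIDE",
    "PHNREC", "SID4C", "SID4C_STREAM", "SID4CALIB", "SID4E", "SID4E_STREAM",
    "SQE", "SQE_STREAM", "STT", "STT_STREAM", "TAE", "TAE_STREAM", "VAD",
    "VAD_STREAM",
    # Legacy names
    "SIDC", "SIDC_STREAM", "SIDCALIBSET", "SIDCALIBSET_STREAM", "SIDE",
    "SIDE_STREAM",
    # Valid only in SPE 3.17 and older
    "DICTATE",
}


def find_unknown_technologies(xml_text):
    """Return technology names in the XML that are not in the supported set."""
    root = ET.fromstring(xml_text)
    names = [
        item.findtext("name", default="").strip()
        for item in root.findall("./technologies/item")
    ]
    return [name for name in names if name not in SUPPORTED_TECHNOLOGIES]


# Sample content with one valid and one misspelled technology name.
SAMPLE = """<?xml version="1.0"?>
<technology_subsystem_settings>
  <technologies>
    <item>
      <name>STT</name>
      <models>
        <item><name>SK_SK_5</name><n_instances>8</n_instances><config_file /></item>
      </models>
    </item>
    <item>
      <name>STT_STREEM</name>
      <models>
        <item><name>CS_CZ_6</name><n_instances>2</n_instances><config_file /></item>
      </models>
    </item>
  </technologies>
</technology_subsystem_settings>
"""

print(find_unknown_technologies(SAMPLE))  # prints ['STT_STREEM']
```

The check covers technology names only. Which model names are valid depends on the models present in the particular SPE installation, so they are not checked here.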


JSON-formatted file created by Phonexia Browser

If SPE technologies are configured from Phonexia Browser, which is possible only when SPE runs in the special "embedded SPE" (or "SPE on localhost") mode of Phonexia Browser, the configuration is stored in the JSON-formatted technologies.json file in the SPE settings directory. This keeps the Browser-made configuration for this special mode separate from the normal SPE technologies configuration, so the contents of the XML and JSON files can differ.

Example

The example below shows a technologies.json file containing (almost) the same setup as in the XML file example above. There are two differences compared to the XML example:

  • The STT_STREAM technology is missing because Phonexia Browser does not support stream processing and therefore does not allow stream technologies to be configured.
  • The config_file setting is also missing because Phonexia Browser does not support this expert-level feature and therefore does not store the setting.

{
  "technology_subsystem_settings": {
    "technologies": [
      {
        "name": "STT",
        "models": [
          {
            "name": "SK_SK_5",
            "n_instances": 8
          }
        ]
      },
      {
        "name": "SID4E",
        "models": [
          {
            "name": "L4",
            "n_instances": 2
          },
          {
            "name": "XL4",
            "n_instances": 3
          }
        ]
      },
      {
        "name": "SID4C",
        "models": [
          {
            "name": "L4",
            "n_instances": 2
          },
          {
            "name": "XL4",
            "n_instances": 3
          }
        ]
      }
    ]
  }
}
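
Since the XML and JSON files are maintained separately, it can be useful to see where they differ. The following is a minimal sketch that loads both formats into the same dictionary shape and reports the technologies whose model setup is not identical. The helper names load_xml_setup, load_json_setup, and diff_setups are illustrative assumptions, and the inline sample texts are shortened versions of the examples in this article.

```python
import json
import xml.etree.ElementTree as ET


def load_xml_setup(xml_text):
    """Return {technology: {model: n_instances}} from technologies.xml content."""
    setup = {}
    for tech in ET.fromstring(xml_text).findall("./technologies/item"):
        models = {}
        for model in tech.findall("./models/item"):
            models[model.findtext("name")] = int(model.findtext("n_instances"))
        setup[tech.findtext("name")] = models
    return setup


def load_json_setup(json_text):
    """Return {technology: {model: n_instances}} from technologies.json content."""
    settings = json.loads(json_text)["technology_subsystem_settings"]
    return {
        tech["name"]: {m["name"]: m["n_instances"] for m in tech["models"]}
        for tech in settings["technologies"]
    }


def diff_setups(xml_setup, json_setup):
    """Return {technology: (xml_models, json_models)} for every difference.

    A technology missing from one file is reported with None on that side.
    """
    names = sorted(set(xml_setup) | set(json_setup))
    return {
        name: (xml_setup.get(name), json_setup.get(name))
        for name in names
        if xml_setup.get(name) != json_setup.get(name)
    }


XML_TEXT = """<?xml version="1.0"?>
<technology_subsystem_settings>
  <technologies>
    <item>
      <name>STT</name>
      <models>
        <item><name>SK_SK_5</name><n_instances>8</n_instances><config_file /></item>
      </models>
    </item>
    <item>
      <name>STT_STREAM</name>
      <models>
        <item><name>CS_CZ_6</name><n_instances>2</n_instances><config_file /></item>
      </models>
    </item>
  </technologies>
</technology_subsystem_settings>
"""

JSON_TEXT = """
{
  "technology_subsystem_settings": {
    "technologies": [
      {"name": "STT", "models": [{"name": "SK_SK_5", "n_instances": 8}]}
    ]
  }
}
"""

differences = diff_setups(load_xml_setup(XML_TEXT), load_json_setup(JSON_TEXT))
print(differences)  # prints {'STT_STREAM': ({'CS_CZ_6': 2}, None)}
```

In this sample, only STT_STREAM is reported, because it is present in the XML file and absent from the JSON file. The config_file element is ignored in the comparison because the JSON file does not store it.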