Technologies Configuration File

This article explains the purpose and structure of the SPE technologies configuration file, technologies.xml, and of its JSON-formatted counterpart, technologies.json, which is created by Phonexia Browser.

An SPE installation usually includes multiple speech technologies (e.g., Speaker Identification or Speech To Text), each available in various technology models (e.g., L4 or XL4) or for various languages (e.g., the 6th-generation EN_US or CS_CZ models). You can select which technologies and models to enable in your SPE installation. Typically, you may want to try several models during initial testing to see how they perform on your audio, or enable additional technologies while developing your application.

To select which technologies and models to enable in SPE, use one of the SPE administration tools, phxadmin or phxadmin2. The resulting configuration is stored in the technologies.xml configuration file in the SPE settings directory. SPE reads this file during startup and initializes the technology instances it describes.

The file has a simple structure, so it can also be created or modified in any plain-text editor or programmatically.
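As an illustration of the programmatic route, the following minimal Python sketch builds a technologies.xml tree with the standard xml.etree.ElementTree module. It assumes the element layout of the example below; the function name build_technologies_xml and its {technology: {model: n_instances}} input format are hypothetical choices for this sketch, and config_file is always written empty so that the default configuration is used.

```python
import xml.etree.ElementTree as ET


def build_technologies_xml(config):
    """Build a technologies.xml tree from {technology: {model: n_instances}}.

    Hypothetical helper; element names follow the technologies.xml example.
    config_file is left empty, i.e., the default BSAPI configuration file
    for the technology and model is used.
    """
    root = ET.Element("technology_subsystem_settings")
    technologies = ET.SubElement(root, "technologies")
    for tech_name, models in config.items():
        tech_item = ET.SubElement(technologies, "item")
        ET.SubElement(tech_item, "name").text = tech_name
        models_el = ET.SubElement(tech_item, "models")
        for model_name, n_instances in models.items():
            model_item = ET.SubElement(models_el, "item")
            ET.SubElement(model_item, "name").text = model_name
            ET.SubElement(model_item, "n_instances").text = str(n_instances)
            ET.SubElement(model_item, "config_file")  # empty element: use the default
    return ET.ElementTree(root)


if __name__ == "__main__":
    # Example input mirroring part of the setup described in this article.
    tree = build_technologies_xml({
        "STT": {"SK_SK_5": 8},
        "SID4E": {"L4": 2, "XL4": 3},
    })
    if hasattr(ET, "indent"):  # ET.indent() exists in Python 3.9+
        ET.indent(tree)
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

To write the file itself, `tree.write("technologies.xml", encoding="utf-8", xml_declaration=True)` produces it with an XML declaration; the file then belongs in the SPE settings directory. Because SPE reads the file during startup, changes take effect only the next time SPE starts.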

Example

The example below shows a technologies.xml file containing the following setup:

- STT (Speech To Text) with:
  - 8 instances of the SK_SK_5 model
- STT_STREAM (Speech To Text for stream processing) with:
  - 2 instances of the CS_CZ_6 model
- SID4E (Speaker Identification 4 Voiceprint Extractor) with:
  - 2 instances of the L4 model
  - 3 instances of the XL4 model
- SID4C (Speaker Identification 4 Voiceprint Comparator) with:
  - 2 instances of the L4 model
  - 3 instances of the XL4 model

```xml
<?xml version="1.0"?>
<technology_subsystem_settings>
  <technologies>
    <item>
      <name>STT</name>
      <models>
        <item>
          <name>SK_SK_5</name>
          <n_instances>8</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>STT_STREAM</name>
      <models>
        <item>
          <name>CS_CZ_6</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>SID4E</name>
      <models>
        <item>
          <name>L4</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
        <item>
          <name>XL4</name>
          <n_instances>3</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>SID4C</name>
      <models>
        <item>
          <name>L4</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
        <item>
          <name>XL4</name>
          <n_instances>3</n_instances>
          <config_file />
        </item>
      </models>
    </item>
  </technologies>
</technology_subsystem_settings>
```

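The meaning of most elements is self-explanatory: each item under technologies names a technology, and each item under that technology's models element names a model and sets, in n_instances, how many instances of that model SPE initializes.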

The only element that needs more explanation is config_file. It is normally left empty, but it can specify the name of a *.bs BSAPI configuration file that the technology initializer uses instead of the default file associated with the technology and model.
Use this feature only in special cases, for example when Phonexia experts recommend it; SPE users should normally not modify BSAPI configuration files. If a technology configuration needs to be customized, use the user configuration file instead.
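Reading the file back is just as simple. The following minimal Python sketch, again using xml.etree.ElementTree and assuming the layout of the example above, collects each model's n_instances and config_file. An empty config_file is returned as None, so any other value flags a model that uses a non-default BSAPI configuration. All function names here are hypothetical.

```python
import xml.etree.ElementTree as ET


def technologies_from_root(root):
    """Map a parsed technologies.xml root to {technology: {model: (n_instances, config_file)}}.

    Hypothetical helper. An empty <config_file /> is returned as None,
    meaning the default BSAPI configuration file is used.
    """
    result = {}
    for tech in root.findall("./technologies/item"):
        models = {}
        for model in tech.findall("./models/item"):
            # findtext() returns "" for an empty element and None if it is missing.
            config_file = (model.findtext("config_file") or "").strip() or None
            models[model.findtext("name")] = (int(model.findtext("n_instances")), config_file)
        result[tech.findtext("name")] = models
    return result


def load_technologies_xml(path):
    """Parse a technologies.xml file from disk."""
    return technologies_from_root(ET.parse(path).getroot())


def overridden_config_files(config):
    """List (technology, model, config_file) entries that set a non-default config_file."""
    return [
        (tech, model, cfg)
        for tech, models in config.items()
        for model, (_, cfg) in models.items()
        if cfg is not None
    ]
```

For a typical file, where every config_file is empty, overridden_config_files returns an empty list.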

Technology names supported in the technologies configuration file:

| Name | Technology | Note |
| --- | --- | --- |
| AGE | Age Estimation | |
| DENOISER | Denoiser | |
| DIAR | Diarization | |
| GID | Gender Identification | |
| KWS | Keyword Spotting | |
| KWS_STREAM | Keyword Spotting Stream | |
| LIDC | Language Identification Languageprint Comparator | |
| LIDE | Language Identification Languageprint Extractor | |
| PHNREC | Phoneme Recognition | |
| SID4C | Speaker Identification 4 Voiceprint Comparator | |
| SID4C_STREAM | Speaker Identification 4 Voiceprint Stream Comparator | |
| SID4CALIB | Speaker Identification 4 VoicePrint Calibration | |
| SID4E | Speaker Identification 4 Voiceprint Extractor | |
| SID4E_STREAM | Speaker Identification 4 Voiceprint Stream Extractor | |
| SQE | Speech Quality Estimation | |
| SQE_STREAM | Speech Quality Estimation Stream | |
| STT | Speech To Text | |
| STT_STREAM | Speech To Text Stream | |
| TAE | Time Analysis Extraction | |
| TAE_STREAM | Time Analysis Extraction Stream | |
| VAD | Voice Activity Detection | |
| VAD_STREAM | Voice Activity Detection Stream | |
| SIDC | Speaker Identification Voiceprint Comparator | legacy |
| SIDC_STREAM | Speaker Identification Voiceprint Stream Comparator | legacy |
| SIDCALIBSET | Speaker Identification VoicePrint Calibration | legacy |
| SIDCALIBSET_STREAM | Speaker Identification VoicePrint Stream Calibration | legacy |
| SIDE | Speaker Identification Voiceprint Extractor | legacy |
| SIDE_STREAM | Speaker Identification Voiceprint Stream Extractor | legacy |
| DICTATE | Dictate | valid only in SPE 3.17 and older |
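Before starting SPE with a hand-edited file, a quick name check can catch typos. The following Python sketch copies the names from the list above and flags unknown, legacy, and SPE 3.17-only names. It assumes names are matched exactly and case-sensitively, which this article does not state; the function name check_technology_names is hypothetical.

```python
# Technology names copied from the list of supported names.
CURRENT_NAMES = {
    "AGE", "DENOISER", "DIAR", "GID", "KWS", "KWS_STREAM", "LIDC", "LIDE",
    "PHNREC", "SID4C", "SID4C_STREAM", "SID4CALIB", "SID4E", "SID4E_STREAM",
    "SQE", "SQE_STREAM", "STT", "STT_STREAM", "TAE", "TAE_STREAM",
    "VAD", "VAD_STREAM",
}
LEGACY_NAMES = {
    "SIDC", "SIDC_STREAM", "SIDCALIBSET", "SIDCALIBSET_STREAM",
    "SIDE", "SIDE_STREAM",
}
OLD_SPE_ONLY_NAMES = {"DICTATE"}  # valid only in SPE 3.17 and older


def check_technology_names(names):
    """Return (name, issue) pairs for names that are not current technologies.

    Hypothetical helper; assumes exact, case-sensitive matching.
    """
    issues = []
    for name in names:
        if name in CURRENT_NAMES:
            continue
        if name in LEGACY_NAMES:
            issues.append((name, "legacy"))
        elif name in OLD_SPE_ONLY_NAMES:
            issues.append((name, "valid only in SPE 3.17 and older"))
        else:
            issues.append((name, "unknown"))
    return issues


if __name__ == "__main__":
    print(check_technology_names(["STT", "SIDE", "STT_STREM"]))
    # → [('SIDE', 'legacy'), ('STT_STREM', 'unknown')]
```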

JSON-formatted file created by Phonexia Browser

If SPE technologies are configured from Phonexia Browser, the configuration is stored in the JSON-formatted technologies.json file in the SPE settings directory. Configuring SPE from Phonexia Browser is possible only when SPE runs in the special "embedded SPE" (or "SPE on localhost") mode of Phonexia Browser. The separate file keeps the Browser-made configuration for this mode apart from the normal SPE technologies configuration, so the configurations in the XML and JSON files can differ.
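Because the two files are maintained separately, it can help to see where they disagree. The Python sketch below reads technologies.json text into a {technology: {model: n_instances}} dictionary, assuming the layout of the technologies.json example that follows, and compares it with a dictionary of the same shape built from technologies.xml. The helper names are hypothetical, and the comparison ignores config_file, which the JSON file does not store.

```python
import json


def technologies_from_json(text):
    """Map technologies.json text to {technology: {model: n_instances}}.

    Hypothetical helper; assumes the layout of the technologies.json example.
    """
    data = json.loads(text)
    return {
        tech["name"]: {model["name"]: model["n_instances"] for model in tech["models"]}
        for tech in data["technology_subsystem_settings"]["technologies"]
    }


def diff_configs(xml_config, json_config):
    """Return (technology, model, xml_instances, json_instances) for every mismatch.

    None means the technology/model is not present in that file.
    """
    diffs = []
    for tech in sorted(set(xml_config) | set(json_config)):
        xml_models = xml_config.get(tech, {})
        json_models = json_config.get(tech, {})
        for model in sorted(set(xml_models) | set(json_models)):
            xml_n, json_n = xml_models.get(model), json_models.get(model)
            if xml_n != json_n:
                diffs.append((tech, model, xml_n, json_n))
    return diffs


if __name__ == "__main__":
    json_text = """{"technology_subsystem_settings": {"technologies": [
        {"name": "STT", "models": [{"name": "SK_SK_5", "n_instances": 8}]}]}}"""
    xml_config = {"STT": {"SK_SK_5": 8}, "STT_STREAM": {"CS_CZ_6": 2}}
    print(diff_configs(xml_config, technologies_from_json(json_text)))
    # → [('STT_STREAM', 'CS_CZ_6', 2, None)]
```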

Example

The example below shows a technologies.json file with almost the same setup as the XML example above. It differs from the XML example in two ways:

- The STT_STREAM technology is missing: Phonexia Browser does not support stream processing, so it does not allow stream technologies to be configured.
- The config_file setting is missing: Phonexia Browser does not support this expert-level feature, so it does not store the setting.

```json
{
  "technology_subsystem_settings": {
    "technologies": [
      {
        "name": "STT",
        "models": [
          {
            "name": "SK_SK_5",
            "n_instances": 8
          }
        ]
      },
      {
        "name": "SID4E",
        "models": [
          {
            "name": "L4",
            "n_instances": 2
          },
          {
            "name": "XL4",
            "n_instances": 3
          }
        ]
      },
      {
        "name": "SID4C",
        "models": [
          {
            "name": "L4",
            "n_instances": 2
          },
          {
            "name": "XL4",
            "n_instances": 3
          }
        ]
      }
    ]
  }
}
```
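To make the structural correspondence between the two formats concrete, the Python sketch below maps a {technology: {model: n_instances}} dictionary (for example, one read from technologies.xml) to the technologies.json layout shown above, applying the two differences listed earlier: technologies whose names end in _STREAM are skipped, and no config_file is written. It only illustrates the mapping; the to_browser_json helper is hypothetical, and in practice Phonexia Browser writes technologies.json itself.

```python
import json


def to_browser_json(config):
    """Map {technology: {model: n_instances}} to the technologies.json layout.

    Hypothetical helper. Stream technologies (names ending in _STREAM) are
    skipped and config_file is not written, mirroring what Phonexia Browser stores.
    """
    technologies = []
    for tech_name, models in config.items():
        if tech_name.endswith("_STREAM"):
            continue  # Browser does not configure stream technologies
        technologies.append({
            "name": tech_name,
            "models": [{"name": model, "n_instances": n} for model, n in models.items()],
        })
    return {"technology_subsystem_settings": {"technologies": technologies}}


if __name__ == "__main__":
    # Same setup as the technologies.xml example in this article.
    xml_config = {
        "STT": {"SK_SK_5": 8},
        "STT_STREAM": {"CS_CZ_6": 2},
        "SID4E": {"L4": 2, "XL4": 3},
        "SID4C": {"L4": 2, "XL4": 3},
    }
    print(json.dumps(to_browser_json(xml_config), indent=2))
```

Run on the setup of the XML example, this prints the same structure as the technologies.json example above.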
