Technologies Configuration File
This article explains the purpose and structure of the SPE technologies
configuration file, technologies.xml, and of the technologies.json file created
by Phonexia Browser.
An SPE installation usually includes multiple speech technologies (e.g., Speaker Identification, Speech To Text, etc.) in various technological models (e.g., L4, XL4, etc.) or supporting various languages (e.g., 6th generation of EN_US, CS_CZ, etc.). You can select which technologies/models to enable in your SPE installation. Typically, you may want to test various models during initial testing to see how they perform on your audio, or you may want to enable additional technologies during the development of your application.
To select which technologies/models to enable in your SPE, you can use one of
the SPE administration tools, phxadmin or phxadmin2. The resulting configuration
is then stored in the technologies.xml configuration file, located in the SPE
settings directory.
SPE reads this configuration file during startup and initializes the technology
instances according to the information in the file.
The file has a very simple structure and can also be created or modified using any plaintext editor, or programmatically.
Example
The example below shows a technologies.xml file containing the following setup:
- STT (Speech To Text) with 8 instances of the SK_SK_5 model
- STT_STREAM (Speech To Text for stream processing) with 2 instances of the CS_CZ_6 model
- SID4E (Speaker Identification 4 Voiceprint Extractor) with
  - 2 instances of the L4 model
  - 3 instances of the XL4 model
- SID4C (Speaker Identification 4 Voiceprint Comparator) with
  - 2 instances of the L4 model
  - 3 instances of the XL4 model
<?xml version="1.0"?>
<technology_subsystem_settings>
  <technologies>
    <item>
      <name>STT</name>
      <models>
        <item>
          <name>SK_SK_5</name>
          <n_instances>8</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>STT_STREAM</name>
      <models>
        <item>
          <name>CS_CZ_6</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>SID4E</name>
      <models>
        <item>
          <name>L4</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
        <item>
          <name>XL4</name>
          <n_instances>3</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>SID4C</name>
      <models>
        <item>
          <name>L4</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
        <item>
          <name>XL4</name>
          <n_instances>3</n_instances>
          <config_file />
        </item>
      </models>
    </item>
  </technologies>
</technology_subsystem_settings>
The meaning of individual elements should be self-explanatory.
The only element that might require more information is the config_file
element. This element is generally kept empty but allows specifying the name of
a *.bs
BSAPI configuration file to be used by the technology initializer
instead of the default file associated with the technology and model.
However, this feature should only be used in special cases, such as when
suggested by Phonexia experts. SPE users should normally avoid modifying BSAPI
configuration files. If any technology configuration customization is needed,
the user configuration file is the appropriate method.
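Because the file can also be created programmatically, the following is a minimal Python sketch of how that might look. It assumes only the element names shown in the XML example above; the function name `build_technologies_xml` and the `setup` dictionary are hypothetical, chosen for illustration, and are not part of SPE.

```python
import xml.etree.ElementTree as ET


def build_technologies_xml(setup):
    """Build the XML tree from {technology: {model: n_instances}} (hypothetical helper)."""
    root = ET.Element("technology_subsystem_settings")
    technologies = ET.SubElement(root, "technologies")
    for tech_name, models in setup.items():
        tech_item = ET.SubElement(technologies, "item")
        ET.SubElement(tech_item, "name").text = tech_name
        models_el = ET.SubElement(tech_item, "models")
        for model_name, n_instances in models.items():
            model_item = ET.SubElement(models_el, "item")
            ET.SubElement(model_item, "name").text = model_name
            ET.SubElement(model_item, "n_instances").text = str(n_instances)
            # Kept empty, as in the example file, so no custom BSAPI file is named.
            ET.SubElement(model_item, "config_file")
    return ET.ElementTree(root)


# Illustrative setup only; use the technologies and models you actually have.
setup = {
    "STT": {"SK_SK_5": 8},
    "SID4E": {"L4": 2, "XL4": 3},
}

tree = build_technologies_xml(setup)
ET.indent(tree)  # pretty-printing only; requires Python 3.9 or newer
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
```

To store the result, `tree.write(path, encoding="utf-8", xml_declaration=True)` can be used, where `path` points to technologies.xml in your SPE settings directory. Since SPE reads the file during startup, a restart is presumably needed before a changed file takes effect.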
Technology names supported in the technologies configuration file:

| Name | Technology |
| --- | --- |
| AGE | Age Estimation |
| DENOISER | Denoiser |
| DIAR | Diarization |
| GID | Gender Identification |
| KWS | Keyword Spotting |
| KWS_STREAM | Keyword Spotting Stream |
| LIDC | Language Identification Languageprint Comparator |
| LIDE | Language Identification Languageprint Extractor |
| PHNREC | Phoneme Recognition |
| SID4C | Speaker Identification 4 Voiceprint Comparator |
| SID4C_STREAM | Speaker Identification 4 Voiceprint Stream Comparator |
| SID4CALIB | Speaker Identification 4 VoicePrint Calibration |
| SID4E | Speaker Identification 4 Voiceprint Extractor |
| SID4E_STREAM | Speaker Identification 4 Voiceprint Stream Extractor |
| SQE | Speech Quality Estimation |
| SQE_STREAM | Speech Quality Estimation Stream |
| STT | Speech To Text |
| STT_STREAM | Speech To Text Stream |
| TAE | Time Analysis Extraction |
| TAE_STREAM | Time Analysis Extraction Stream |
| VAD | Voice Activity Detection |
| VAD_STREAM | Voice Activity Detection Stream |
| SIDC | Speaker Identification Voiceprint Comparator (legacy) |
| SIDC_STREAM | Speaker Identification Voiceprint Stream Comparator (legacy) |
| SIDCALIBSET | Speaker Identification VoicePrint Calibration (legacy) |
| SIDCALIBSET_STREAM | Speaker Identification VoicePrint Stream Calibration (legacy) |
| SIDE | Speaker Identification Voiceprint Extractor (legacy) |
| SIDE_STREAM | Speaker Identification Voiceprint Stream Extractor (legacy) |
| DICTATE | Dictate (valid only in SPE 3.17 and older) |
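When the file is edited by hand, a misspelled technology name is easy to miss. The sketch below checks the technology names in a configuration against the list above. It is an illustration only: the helper `find_unsupported` and the sample text (which contains a deliberate typo) are hypothetical, and this article does not describe how SPE itself reacts to an unknown name.

```python
import xml.etree.ElementTree as ET

# Names copied from the list of supported technology names in this article.
SUPPORTED_TECHNOLOGIES = {
    "AGE", "DENOISER", "DIAR", "GID", "KWS", "KWS_STREAM", "LIDC", "LIDE",
    "PHNREC", "SID4C", "SID4C_STREAM", "SID4CALIB", "SID4E", "SID4E_STREAM",
    "SQE", "SQE_STREAM", "STT", "STT_STREAM", "TAE", "TAE_STREAM",
    "VAD", "VAD_STREAM",
    # Legacy names
    "SIDC", "SIDC_STREAM", "SIDCALIBSET", "SIDCALIBSET_STREAM",
    "SIDE", "SIDE_STREAM",
    # Valid only in SPE 3.17 and older
    "DICTATE",
}


def find_unsupported(xml_text):
    """Return technology names in the XML text that are not in the list."""
    root = ET.fromstring(xml_text)
    names = [item.findtext("name") for item in root.findall("technologies/item")]
    return [name for name in names if name not in SUPPORTED_TECHNOLOGIES]


# Hypothetical sample with a typo in the second technology name.
sample = """
<technology_subsystem_settings>
  <technologies>
    <item>
      <name>STT</name>
      <models>
        <item><name>SK_SK_5</name><n_instances>8</n_instances><config_file /></item>
      </models>
    </item>
    <item>
      <name>STT_STREAMS</name>
      <models>
        <item><name>CS_CZ_6</name><n_instances>2</n_instances><config_file /></item>
      </models>
    </item>
  </technologies>
</technology_subsystem_settings>
"""

unsupported = find_unsupported(sample)
print(unsupported)
```

The check covers technology names only. Model names such as SK_SK_5 or XL4 depend on what is installed and are not validated here.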
JSON-formatted file created by Phonexia Browser
If SPE technologies are configured from Phonexia Browser, which is possible only
if SPE is used in the special "embedded SPE" (or "SPE on localhost") mode from
Phonexia Browser, the technologies configuration is stored in the JSON-formatted
technologies.json file in the SPE settings directory. This is to separate
the Browser-made configuration for this special SPE mode from the normal SPE
technologies configuration. Therefore, the configurations inside the XML and
JSON files can differ.
Example
The example below shows a technologies.json file containing (almost) the same
setup as in the XML file example above. There are two differences compared to
the XML example:
- The STT_STREAM technology is missing. Phonexia Browser does not support stream processing, i.e., it does not allow configuration of stream technologies.
- The config_file setting is also missing. Phonexia Browser does not support this special expert-level feature, i.e., it does not store the setting.
{
  "technology_subsystem_settings": {
    "technologies": [
      {
        "name": "STT",
        "models": [
          {
            "name": "SK_SK_5",
            "n_instances": 8
          }
        ]
      },
      {
        "name": "SID4E",
        "models": [
          {
            "name": "L4",
            "n_instances": 2
          },
          {
            "name": "XL4",
            "n_instances": 3
          }
        ]
      },
      {
        "name": "SID4C",
        "models": [
          {
            "name": "L4",
            "n_instances": 2
          },
          {
            "name": "XL4",
            "n_instances": 3
          }
        ]
      }
    ]
  }
}
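Since the XML and JSON configurations can differ, it may be useful to compare them. The following Python sketch reads both formats into the same structure and reports the differences. It assumes only the element and key names shown in the two examples in this article; the helper names and the shortened sample texts are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def instances_from_xml(xml_text):
    """Map (technology, model) to n_instances for the XML format."""
    root = ET.fromstring(xml_text)
    result = {}
    for tech in root.findall("technologies/item"):
        for model in tech.findall("models/item"):
            key = (tech.findtext("name"), model.findtext("name"))
            result[key] = int(model.findtext("n_instances"))
    return result


def instances_from_json(json_text):
    """Map (technology, model) to n_instances for the JSON format."""
    settings = json.loads(json_text)["technology_subsystem_settings"]
    return {
        (tech["name"], model["name"]): model["n_instances"]
        for tech in settings["technologies"]
        for model in tech["models"]
    }


# Shortened, hypothetical sample contents of the two files.
XML_TEXT = """
<technology_subsystem_settings>
  <technologies>
    <item>
      <name>STT</name>
      <models>
        <item><name>SK_SK_5</name><n_instances>8</n_instances><config_file /></item>
      </models>
    </item>
    <item>
      <name>STT_STREAM</name>
      <models>
        <item><name>CS_CZ_6</name><n_instances>2</n_instances><config_file /></item>
      </models>
    </item>
  </technologies>
</technology_subsystem_settings>
"""

JSON_TEXT = """
{
  "technology_subsystem_settings": {
    "technologies": [
      {"name": "STT", "models": [{"name": "SK_SK_5", "n_instances": 4}]}
    ]
  }
}
"""

xml_cfg = instances_from_xml(XML_TEXT)
json_cfg = instances_from_json(JSON_TEXT)

only_in_xml = sorted(set(xml_cfg) - set(json_cfg))
only_in_json = sorted(set(json_cfg) - set(xml_cfg))
# Entries present in both files but with a different instance count: (xml, json).
changed = {
    key: (xml_cfg[key], json_cfg[key])
    for key in xml_cfg.keys() & json_cfg.keys()
    if xml_cfg[key] != json_cfg[key]
}

print("Only in XML:", only_in_xml)
print("Only in JSON:", only_in_json)
print("Different instance counts:", changed)
```

Stream technologies are expected to show up as "only in XML", because Phonexia Browser does not configure them.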