
Phonexia FAQs

(Q) What operating systems can your application run on?

Our technologies run on both Windows and Linux.

For more details on the supported operating systems and the recommended HW setup, see Recommended OS and HW.

(Q) What are the supported audio formats?

Formats supported directly and natively are:

  • WAVE (*.wav) container including any of:
    • unsigned 8-bit PCM (u8)
    • unsigned 16-bit PCM (u16le)
    • IEEE float 32-bit (f32le)
    • A-law (alaw)
    • µ-law (mulaw)
    • ADPCM
  • FLAC codec inside FLAC (*.flac) container
  • OPUS codec inside OGG (*.opus) container

Other audio formats must be converted to one of the natively supported ones using external tools.
The SPE server can be configured to do this conversion automatically in the background; see the Audio Converter article.

tip

FFmpeg (http://www.ffmpeg.org) and SoX (http://sox.sourceforge.net/) are great tools for converting unsupported formats to supported ones. Both are multiplatform software tools available for Microsoft Windows, Linux and Apple OS X.

Example of usage:

FFmpeg

ffmpeg -i <source_audio_file_name> <output_audio_base_name>.wav

This command converts an audio file in any format/codec supported by FFmpeg to WAV in 16-bit PCM little-endian, which is FFmpeg's default output codec for WAV. For more parameters, please check the FFmpeg manual pages.

SoX

sox <source_audio_file_name> -b 16 <output_audio_base_name>.wav

The number of bits must be specified explicitly using the -b parameter.
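For many files at once, the conversion commands above can be wrapped in a simple loop. This is only an illustrative sketch: the directory names and the .mp3 extension are hypothetical, so adapt them to your own layout.

```shell
# Illustrative sketch (hypothetical directories and extension):
# convert every .mp3 in ./input to 16-bit PCM WAV in ./output using ffmpeg.
mkdir -p output
for f in input/*.mp3; do
  [ -e "$f" ] || continue    # skip if no .mp3 files are present
  ffmpeg -loglevel warning -y -i "$f" "output/$(basename "${f%.mp3}").wav"
done
```

The same loop works with sox by swapping the ffmpeg line for `sox "$f" -b 16 "output/...wav"`.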

(Q) How to fix Error 1007: Unsupported audio format?

The Phonexia Browser application may return the error "1007: Unsupported audio format" while uploading an audio file. First check whether your audio files are in one of the supported audio formats. If you need to use audio recordings in other formats as input, you can configure SPE for automated audio conversion. As a prerequisite, install an external tool for audio conversion. The recommended one is the ffmpeg utility, which is powerful and well documented. You can find the distribution package for your platform at http://ffmpeg.org. Then continue as described below:

Using Phonexia Browser with embedded SPE

Open the Browser configuration dialog by clicking the "Settings" button located in the tool ribbon. Select the "Speech Engine" tab and configure SPE as described in the documentation. Don't forget to select the "Enable audio converter" checkbox.

Using SPE as service/daemon

Open the file settings\phxspe.properties in a standard text editor. Then change the following line in "phxspe.properties" to enable background conversion:

audio_converter.enabled = false # change it to 'true'

Please check that the conversion tools configured below this line in phxspe.properties are set up properly. Here is an example configuration for ffmpeg:

# Set converter command
# %1 is for input file
# %2 is for output file
# ffmpeg example:
audio_converter.command = ffmpeg -loglevel warning -y -i %1 %2
# sox example:
# audio_converter.command = sox %1 %2

caution

By design, to save computing resources, the audio converter is not used if the input file name ends with the .wav extension. In that case you must pre-process the audio recording before uploading it to Phonexia SPE or using it in Phonexia Browser.

(Q) What languages do you offer?

It depends on the technology. Phonexia Language Identification (LID) is pre-trained for 60+ languages.

Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT) support 20+ languages, including English, French, German, Russian, Spanish and many more.

(Q) What languages are supported by LID?

Please see List of supported LID Languages. For more details, see LID technology documentation.

(Q) How to fix the Error 1013: Unsupported: Server does not support authentication with token?

Please check the SPE subdirectory ./settings for configuration files.

  1. If only phxspe.browser.properties exists, then your Browser uses SPE as an embedded component; in that case, set this directive inside the file:
    server.enable_authentication_token = false
    You can then still use SPE with Basic HTTP authentication, as described in the documentation, section "Basic authentication".
  2. If you would like to experiment with a "pure" daemon installation, then a phxspe.properties file should exist in the ./settings subdirectory. The phxspe.properties file is created by the phxadmin utility, or it can be created from the ./data/phxspe.properties.default template file:
    1. Copy the template file to the ./settings directory
    2. Rename it to phxspe.properties
    3. Check the server.enable_authentication_token directive and set it as needed.
    4. Restart phxspe
Basic installation steps are described in the ./doc/INSTALL.html document.
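Steps 1 to 3 above can be done with two commands. This sketch assumes it is run from the SPE installation directory; the file-existence guard keeps it a no-op anywhere else.

```shell
# Assumes the current directory is the SPE installation directory.
if [ -f ./data/phxspe.properties.default ]; then
  cp ./data/phxspe.properties.default ./settings/phxspe.properties
  grep 'server.enable_authentication_token' ./settings/phxspe.properties
fi
# Then restart phxspe (step 4).
```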
(Q) What languages are supported by KWS?

Please see List of supported KWS Languages. For more details, see KWS technology documentation.

(Q) What languages are supported by STT?

Please see List of supported STT Languages. For more details, see STT technology documentation.

(Q) I am getting an SPE-related error after starting the Browser (e.g. SPE server crashed, Error Downloading…, unable to connect to the SPE server, unable to start the localhost…)

Windows:

  • Open a terminal in the folder where PhxBrowser.exe is located (hold Shift, right-click on free space in Windows Explorer and select "Open command window here")

  • Run PhxBrowser software with command: PhxBrowser.exe /spe-debug /spe-output

  • PhxBrowser software will start with “SPE output” tab which shows the debug output of SPE

Linux:

  • Run PhxBrowser software in terminal with command: ./PhxBrowser --spe-debug --spe-output

  • PhxBrowser software will start with the "SPE output" tab which shows the debug output of SPE

(Q) Why does the system show high score (>90%) even for non-targets?

The score threshold isn't set up correctly. Adjust the speaker score sharpness value to calibrate the recalculation.

Please see Calibration in technology documentation.

(Q) What do LLR, LR and score mean?

These abbreviations mean the following:

  • LR - likelihood ratio, the result of a statistical test comparing two models. It is a number expressing how many times more likely the data are under one model than under the other. LR takes values in the interval <0;+inf).
  • LLR - log-likelihood ratio, the logarithm of LR. LLR takes values in the interval (-inf;+inf).
  • Percentage (normalised) score - a commonly used mathematical transformation of the LLR to a percentage. This number is easier for humans to read but may raise doubts if the LLR values are too high (typically in non-adapted installations). The interval is <0;100> (or sometimes <0;1>), in %. The higher the score, the better the match.
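To illustrate the LLR-to-percentage relation, here is a sketch using a common logistic mapping. Note this is only an illustration: the actual Phonexia calibration (controlled by the score sharpness value) may use different constants.

```shell
# Illustrative only: a common logistic mapping from LLR to a percentage
# score; the exact Phonexia calibration may differ.
llr=0
awk -v llr="$llr" 'BEGIN { printf "%.1f\n", 100 / (1 + exp(-llr)) }'
# prints 50.0 (an LLR of 0 means both models are equally likely)
```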

(Q) I can’t manage to run Phonexia Browser software. I always get an error

I always get the same error messages:

  • unable to connect to the SPE
  • unable to start the localhost: giving up and kill the localhost.

This error may happen if the initialization of the SPE engine takes too long. Phonexia Browser treats it as an initialization failure and kills the server.

You can fix this by doing the following:

  • Increase timeout in Settings > Speech Engine tab > First connection timeout
  • Use fewer instances of technologies, letting the Speech Engine start faster
  • Use smaller models of technologies

(Q) We prefer a USB dongle but without the USB storage

We don't provide USB dongles without memory storage; possible solutions are:

  • establish security directives for working with the USB dongle (restricting who is allowed to use it, memory scan checks on the way in/out),
  • use HW based licensing,
  • use license server.

(Q) I am getting the error message "Your license is not for this application"

Check your license file (license.dat) by opening it in Notepad.

Make sure the license contains records for all required modules.

See the Licensing article for additional information.

(Q) What are the requirements for SID evaluation dataset?

To evaluate the real-life performance of Phonexia Speaker Identification technology, the system needs to be calibrated with a SID dataset.

SID dataset (minimum requirements): To measure SID performance precisely, it's important to prepare the evaluation recording set very carefully.

The requirements are:

  • 50+ known speakers, 200+ recordings in total (i.e. 3 to 5 recordings per speaker*)
  • 1+ minute of net speech in each recording (i.e. usually 2+ minutes recording length)
  • only one speaker in each recording
  • wide variety of gender and age is recommended
  • recordings should be as similar to the target use case as possible (device, channel, distance from mic, languages distribution)
  • audio files should be mono, lin16 format, 8 kHz+ sample rate

tip

Splitting a single recording into multiple shorter recordings in order to meet the criterion of at least 3 recordings per speaker is not the right way to proceed. This way you are not adding any details; you are essentially analyzing details of a single recording five times.
In contrast, by using 5 unique recordings coming from different audio environments, or even different times of the day, additional details can be analyzed, leading to better results.

warning

Any human error in evaluation set preparation (in speaker uniqueness, placing recordings into wrong folder, etc.) affects the evaluation results, so it's very important to prepare the data carefully.

See SID Evaluation for more details.

(Q) Does the system come as an API?

Yes, the system comes as an API (for the production license).

(Q) How can I tell in which format the .wav file is?

You can use the ffprobe utility*: running ffprobe <file_name> will print out information about the file.

*The "ffprobe" utility is not included in our package(s). It is part of ffmpeg (https://ffmpeg.org/ffprobe.html) and needs to be installed separately.
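If ffprobe is not at hand, a rough check is also possible by reading the WAV header directly. This sketch uses a placeholder file name (recording.wav) and prints the format code stored as a little-endian 16-bit value at byte offset 20 of the header, read here in host byte order on a little-endian machine:

```shell
# Placeholder file name; the guard keeps this a no-op if the file is absent.
# WAV format codes: 1 = PCM, 3 = IEEE float, 6 = A-law, 7 = mu-law.
if [ -f recording.wav ]; then
  od -An -t u2 -j 20 -N 2 recording.wav | tr -d ' '
fi
```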

(Q) Do the language-prints extracted from audio sources depend on the currently available language pack?

The language-prints do not depend on the current language pack used. You may use them for both training a new language pack and testing/comparing against an existing language pack.

The language-prints need to be compatible only with the model of LID used for language-print extraction.

(Q) What are the recommendations for LID adaptation set?

The following is recommended:

For adding a new language to a language pack:

  • 20+ hours of audio for each new language model (or 25+ hours of audio containing 80% of speech)
  • Only 1 language per record

For adapting an existing language model (discriminative training):

  • 10+ hours of audio for each language
  • May be done on the customer site
  • May be done at Phonexia using anonymized data (= language-prints extracted from a .wav audio)

(Q) How to know what technologies are running on the server?

You can retrieve the list of running/configured technologies by running the query get /technologies, or by using the phxadmin utility with the configure-tech parameter.

(Q) How to choose answer format from server (xml/json)?

  • Via HTTP header “Accept” parameter (application/json; application/xml)
  • Via request query “format=json/xml”

If the format is not defined (or the HTTP header "Accept" parameter has one of these values: application/*, */*), the server will return JSON.
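For example, the two options can be exercised with curl. The host, port and credentials below are hypothetical, and the calls are wrapped as helper functions so you can adapt them to your deployment before invoking them:

```shell
SPE=http://localhost:8600    # hypothetical host and port
# 1) request JSON via the Accept header (hypothetical credentials):
get_json_via_header() { curl -s -u admin:phonexia -H 'Accept: application/json' "$SPE/technologies"; }
# 2) request JSON via the query parameter:
get_json_via_query()  { curl -s -u admin:phonexia "$SPE/technologies?format=json"; }
```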

(Q) My NET license has stopped working, returning "Not enough free licenses" error.

Please proceed by doing the following:

  • Check your Internet connectivity using a standard browser against https://www.phonexia.com.
  • Check that you are not running more instances than allowed by the license file (the -j parameter on the command line).
  • In rare cases, your SW copy may have accidentally crashed. If this is the case, please wait for the automatic license renewal period (60 minutes after the last check).
  • Check that your connection to the license server hasn't changed.
  • Check that the validity of the license hasn't expired.

(Q) My HW license has stopped working, returning "HW configuration has been changed" error.

Check whether you changed the HW configuration or the Operating System on the machine.

Please ask your Phonexia contact if the issue still occurs.

(Q) What to do with the ApplicationStartup: Unhandled exception: BsapiException error?

When running SPE, the following error occurs:

Unhandled exception: BsapiException:
SWaveformSegmenterI(/mnt/phxspe/home/phx/storage/dfs/a1cabcf7-c761-49f1
-a9bc-0a8209a09fd9.opus
Requested segment (78056, 102056) is out of waveform range (0,91840).

It means that the opus file was created improperly and its header internally declares much more audio than the file actually contains. Please check your audio source/originator for proper functionality, or use the ffmpeg or sox utility to pre-process the audio and normalize it by re-converting from opus to opus before the recordings are processed through SPE.

(Q) How do I get results for a pending operation?

If the server responds to a pending request with status 200 – OK, the body of the response contains the result (the server already has the result in cache memory and there is no need to process the file again).

If the server responds with status 202 – Accepted, it creates a task and begins to process the file. The HTTP response header (the "Location" parameter) contains the path of the pending resource. The body contains the ID of the pending operation.

  • Polling: The client queries the pending resource (e.g. "get /pending/{ID}"). The server answers with status 200, and the body contains the operation status: "running". The client repeats this request periodically, with pauses in between, until the server responds with status 303 – See Other (the body will contain the status "finished"). The HTTP header of this response (the "Location" parameter) contains the resource path of the operation; the operation ID from the response body can also be used. The client then asks for the resource "get /done/{ID}", where the final result will be.

  • WebSocket: The client asks for websocket creation via get /pending/{ID}. The query header contains the parameters Upgrade, Connection, Sec-WebSocket-Version and Sec-WebSocket-Key. Authentication has to be part of the header (HTTP basic or Session, according to the server setup). The result of the asynchronous operation is then sent by the server over the websocket, so the client doesn't have to ask the server repeatedly. Example of an HTTP header:

GET /pending/ec563083-3d9b-457d-a0ac-24b197bc222f HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Version: 13
X-SessionID: 258f505c-a6fa-4c3f-8a87-b048874ac6aa
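The polling variant described above might be sketched with curl as follows. The base URL and credentials are hypothetical, and the helper is only defined here, to be called with a pending operation ID:

```shell
SPE=http://localhost:8600    # hypothetical host, port and credentials below
poll_result() {
  id=$1
  while :; do
    # ask only for the status code of the pending resource
    code=$(curl -s -o /dev/null -w '%{http_code}' -u admin:phonexia "$SPE/pending/$id")
    [ "$code" = "303" ] && break    # 303 - See Other means the task finished
    sleep 2                         # pause between polls
  done
  # fetch the final result
  curl -s -u admin:phonexia "$SPE/done/$id"
}
```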

(Q) What types of integration do you offer?

Phonexia Speech Engine with its technologies is distributed with a REST API interface.

For evaluation and testing purposes, graphical user interface (GUI) called Phonexia Browser is provided.

Upon request, technologies can also be provided in the form of command line tools (CLI).

REST API documentation: https://download.phonexia.com/docs/spe/

(Q) Which authentication options are allowed by the server and how does it work?

The following options are supported:

- HTTP basic authorization – done according to the HTTP standard, in the header of each query to the server.

- Authorization by session – the client asks for a session via the resource "post /login", with HTTP basic authorization in the query header. If the server responds with error 405, the server doesn't support authorization by sessions and it is necessary to use basic authorization. Authorization by session is then done by adding the "X-SessionID" parameter to the HTTP header of each query. You can configure this in ./settings/phxspe.properties.

(Q) While trying to install SPE3, I get the error for loading libasound.so.2 libraries

Currently I'm trying to install the provided binaries for Linux, but when running phxadmin I get the following:

./phxadmin: error while loading shared libraries: libasound.so.2: cannot open shared object file: No such file or directory

I'm trying to run this under CentOS 7.

A: Please install the libraries required for manipulating audio files from the official repository of your OS.

For example, on CentOS you may use:

sudo yum install alsa-utils alsa-lib

tip

A great utility for finding Redhat/Fedora/CentOS library packages is https://www.rpmfind.net/linux/RPM/index.html