Phoneme Recognizer
Phonexia Phoneme Recognizer (PHNREC) converts speech signals into pronunciation characters (phonemes). After conversion, the pronunciation (text) can be easily indexed and searched by third-party text data mining tools.
The technology is optimized for noisy recordings and colloquial speech. The Phoneme Recognizer is delivered as part of the Keyword Spotting (KWS) technology but can also be used independently of KWS.
Typical use cases
- "Search-in-speech" - searching for specific information in large call archives (e.g., claims inspection).
- Obtaining a custom pronunciation of a word or phrase to use as a custom keyword in Keyword Spotting (improving KWS accuracy).
- Obtaining a custom pronunciation of a word to add to the language model of speech-to-text technology.
Input
- Audio file; streaming is not supported.
- Technology model name (i.e., language code) to be used for phoneme transcription.
Output
Example
Input: "Hi, this is Lewis." (WAV file containing speech)
Output: sil hh ay dh ow s ih s l uw uw th sil
(plain-text or XML/JSON format)
Note: The output may contain the following special tokens:
- sil: Silent part (or no speech detected).
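As a rough illustration of how the plain-text output could be post-processed for search, the sketch below splits a transcription into tokens, drops the sil token, and looks for a phoneme sequence. It assumes the plain-text output is a whitespace-separated token string like the example above. The names parse_phonemes and find_sequence are hypothetical helpers and are not part of any Phonexia API.

```python
from typing import List

SILENCE_TOKEN = "sil"  # special token for silence / no speech detected


def parse_phonemes(output: str, drop_silence: bool = True) -> List[str]:
    """Split a plain-text transcription into phoneme tokens.

    Assumption: tokens are separated by whitespace, as in the example output.
    """
    tokens = output.split()
    if drop_silence:
        tokens = [t for t in tokens if t != SILENCE_TOKEN]
    return tokens


def find_sequence(phonemes: List[str], query: List[str]) -> List[int]:
    """Return every start index at which `query` occurs in `phonemes`."""
    if not query:
        return []
    n = len(query)
    return [
        i
        for i in range(len(phonemes) - n + 1)
        if phonemes[i:i + n] == query
    ]


# Example output taken from this page.
transcript = parse_phonemes("sil hh ay dh ow s ih s l uw uw th sil")

# Search for the phoneme pair "l uw" in the silence-free token list.
hits = find_sequence(transcript, ["l", "uw"])
print(transcript)
print(hits)
```

Exact matching is the simplest option. Because recognized phonemes can differ from the expected pronunciation, a real search over call archives would more likely use approximate matching or a third-party text indexing tool.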
The list of phonemes is available in the document phonemes_for_stt_and_kws.pdf (provided in the manuals for SPE, STT, or KWS).
The list of supported languages for the Phoneme Recognizer is the same as for Keyword Spotting.