Words-to-Numbers Feature
This article describes the Speech to Text feature that transcribes numbers and dates in their native numeric form in n-best output, and provides tips for fine-tuning the results.
The feature works out-of-the-box in the following STT languages and models:
- English – EN_US_6 and EN_US_A_6
- Spanish – ES_6
- Polish – PL_PL_6
- Czech – CS_CZ_5 and CS_CZ_6
- Slovak – SK_SK_5 and SK_SK_6
You can add this functionality to other languages or fine-tune the existing ones by modifying the conversion rules. See below for more details.
What is the words-to-numbers feature?
The words-to-numbers feature converts raw transcriptions of numbers, dates, and similar patterns (such as credit card numbers) to their native numeric form:
| Raw transcription | Converted output |
| --- | --- |
| two thousand twenty one | 2021 |
| fifteen hundred eighty six point zero three | 1586.03 |
| sixty four million seven hundred thousand ninety | 64700090 |
This feature helps simplify the processing of transcribed texts by text analytic layers or NLP (Natural Language Processing) engines, such as in voicebot applications.
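To make the conversions above concrete, here is a minimal JavaScript sketch of the same idea. It is an illustration only, not the grammar-based implementation used by the STT: the function name `wordsToNumber` and the word tables are assumptions made for this example, and it covers only plain English number words plus "point".

```javascript
// Illustrative sketch only; the real conversion is driven by PEG.js grammar rules.
const UNITS = new Map(Object.entries({
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7,
  eight: 8, nine: 9, ten: 10, eleven: 11, twelve: 12, thirteen: 13,
  fourteen: 14, fifteen: 15, sixteen: 16, seventeen: 17, eighteen: 18,
  nineteen: 19,
}));
const TENS = new Map(Object.entries({
  twenty: 20, thirty: 30, forty: 40, fifty: 50,
  sixty: 60, seventy: 70, eighty: 80, ninety: 90,
}));
const SCALES = new Map(Object.entries({
  thousand: 1000, million: 1000000, billion: 1000000000,
}));

// Hypothetical helper: returns the numeric form as a string so that
// fractional digits such as "03" keep their leading zero.
function wordsToNumber(text) {
  const words = text.trim().toLowerCase().split(/\s+/);
  const pointAt = words.indexOf("point");
  const intWords = pointAt === -1 ? words : words.slice(0, pointAt);
  const fracWords = pointAt === -1 ? [] : words.slice(pointAt + 1);

  let total = 0; // sum of groups already closed by a scale word
  let group = 0; // value accumulated since the last scale word
  for (const w of intWords) {
    if (UNITS.has(w)) group += UNITS.get(w);
    else if (TENS.has(w)) group += TENS.get(w);
    else if (w === "hundred") group *= 100;
    else if (SCALES.has(w)) { total += group * SCALES.get(w); group = 0; }
    else throw new Error(`Unknown number word: ${w}`);
  }

  let result = String(total + group);
  if (fracWords.length > 0) {
    // After "point", every word must be a single digit.
    const digits = fracWords.map((w) => {
      if (!UNITS.has(w) || UNITS.get(w) > 9) {
        throw new Error(`Expected a digit after "point": ${w}`);
      }
      return UNITS.get(w);
    });
    result += "." + digits.join("");
  }
  return result;
}

console.log(wordsToNumber("two thousand twenty one")); // 2021
console.log(wordsToNumber("fifteen hundred eighty six point zero three")); // 1586.03
console.log(wordsToNumber("sixty four million seven hundred thousand ninety")); // 64700090
```

A downstream NLP layer can then work with `2021` directly instead of having to interpret the four words itself.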
Where is the converted output available?
The words-to-numbers conversion is available only in n-best output (that is, where complete sentence variants are provided), for both file and stream transcription.
It is not available in word-level outputs (One-best, Confusion Network) because it would create difficulties in stream transcription. Each newly processed word may change the conversion of the words before it:
| Words received so far | Converted output |
| --- | --- |
| two | 2 |
| two thousand | 2000 |
| two thousand twenty | 2020 |
| two thousand twenty one | 2021 |
This would require retroactively changing text that has already been output, which is not feasible. Alternatively, the output would need to be delayed, which is undesirable in real-time stream processing.
Thus, the best compromise is to leave word-level outputs unchanged and perform conversions only at the segment or sentence level.
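The instability described above can be reproduced with a short JavaScript sketch. The converter here is a hypothetical stand-in that knows only the four words of the example. It is not how the STT performs the conversion, but it shows why each converted prefix differs from the previous one.

```javascript
// Hypothetical mini-converter covering only the words used in this example.
const VALUES = new Map([["one", 1], ["two", 2], ["twenty", 20]]);

function convert(words) {
  let total = 0; // groups closed by "thousand"
  let group = 0; // value accumulated since the last "thousand"
  for (const w of words) {
    if (w === "thousand") {
      total += group * 1000;
      group = 0;
    } else if (VALUES.has(w)) {
      group += VALUES.get(w);
    } else {
      throw new Error(`Word not covered by this sketch: ${w}`);
    }
  }
  return total + group;
}

// Simulate a stream: words arrive one at a time and the whole prefix
// is converted again after each new word.
function streamingOutputs(words) {
  return words.map((_, i) => convert(words.slice(0, i + 1)));
}

console.log(streamingOutputs(["two", "thousand", "twenty", "one"]));
// [ 2, 2000, 2020, 2021 ]
```

Every intermediate value replaces the previous one entirely, which is why the conversion is applied only once a whole segment or sentence is available.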
How does it work?
The words-to-numbers conversion relies on a set of grammar rules that define how word sequences are converted.
Conversion rules are stored in the numeric.pegjs file, located in the grm subdirectory inside the STT model directory. For example:
- In Czech 6th generation STT, it is located at {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm.
- In Spanish 6th generation STT, it is located at {SPE_directory}/bsapi/stt/data/models_es_6/grm.
Can it be extended or tuned?
You can edit the numeric.pegjs file to fine-tune or extend the conversion functionality.
Create a backup copy of numeric.pegjs before editing the file! Incorrect changes can have unpredictable effects and may cause the STT to stop working.
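One possible way to script the backup is sketched below for Node.js. The function name `backupGrammar` and the backup file name `numeric.pegjs.bak` are assumptions made for this example, not part of the product. Copying the file manually works just as well.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Copies numeric.pegjs to numeric.pegjs.bak (hypothetical name) in the same
// grm directory and returns the path of the backup.
function backupGrammar(grmDir) {
  const source = path.join(grmDir, "numeric.pegjs");
  const backup = path.join(grmDir, "numeric.pegjs.bak");
  // COPYFILE_EXCL makes the copy fail if the backup already exists,
  // so an earlier known-good backup is never overwritten by accident.
  fs.copyFileSync(source, backup, fs.constants.COPYFILE_EXCL);
  return backup;
}

// Example call; replace {SPE_directory} with the actual installation path:
// backupGrammar("{SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm");
```

Because the copy refuses to overwrite an existing backup, the first backup taken before any edits stays available as a known-good version to restore.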
The rules are written in PEG.js syntax, a Parsing Expression Grammar (PEG) notation in which rule actions and predicates are JavaScript code. Details about the syntax can be found on the PEG.js website at https://pegjs.org/documentation#grammar-syntax-and-semantics.
Here are short examples of the syntax, showing excerpts of the cardinal and ordinal digit rules for 1 to 9:
ENGLISH
...
DIGITS
= 'one' { return 1 }
/ 'two' { return 2 }
/ 'three' { return 3 }
/ 'four' & { return boundary() } { return 4 }
/ 'five' & { return boundary() } { return 5 }
/ 'six' & { return boundary() } { return 6 }
/ 'seven' & { return boundary() } { return 7 }
/ 'eight' & { return boundary() } { return 8 }
/ 'nine' & { return boundary() } { return 9 }
ZERO = 'zero' { return 0 }
...
DIGITS_ORDINAL_ST = 'first' { return 1 }
DIGITS_ORDINAL_ND = 'second' { return 2 }
DIGITS_ORDINAL_RD = 'third' { return 3 }
DIGITS_ORDINAL_TH
= 'fourth' { return 4 }
/ 'fifth' { return 5 }
/ 'sixth' { return 6 }
/ 'seventh' { return 7 }
/ 'eighth' { return 8 }
/ 'ninth' { return 9 }
ZERO_ORDINAL = 'zeroth' { return 0 }
...
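In the English excerpt above, `/` is the PEG ordered choice (alternatives are tried in order and the first match wins), and `& { ... }` is a PEG.js semantic predicate that consumes no input and succeeds only if its JavaScript code returns true. The `boundary()` helper is defined elsewhere in numeric.pegjs and is not shown in the excerpt. The sketch below assumes it checks that the match ends at a word boundary, and emulates the matching in plain JavaScript to show what such a check prevents. The names `atBoundary` and `matchDigit` are hypothetical.

```javascript
// Assumption: a boundary is the end of input or a space.
// The real boundary() helper in numeric.pegjs may behave differently.
function atBoundary(input, pos) {
  return pos === input.length || input[pos] === " ";
}

// Alternatives are tried in order, like the "/" operator in a PEG.
const DIGITS = [
  ["four", 4], ["five", 5], ["six", 6],
  ["seven", 7], ["eight", 8], ["nine", 9],
];

function matchDigit(input, pos, useBoundary) {
  for (const [literal, value] of DIGITS) {
    if (!input.startsWith(literal, pos)) continue;
    const end = pos + literal.length;
    // The predicate consumes nothing; it can only veto this alternative.
    if (useBoundary && !atBoundary(input, end)) continue;
    return { value, end };
  }
  return null; // no alternative matched
}

console.log(matchDigit("fourteen", 0, false)); // { value: 4, end: 4 }
console.log(matchDigit("fourteen", 0, true));  // null
console.log(matchDigit("four", 0, true));      // { value: 4, end: 4 }
```

Without the check, the literal 'four' matches the first four letters of "fourteen". With the check, the alternative is rejected and other rules can handle the full word.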
CZECH
...
DIGITS
= ('jedna' / 'jeden') { return 1 }
/ ('dva' / 'dvě' / 'dvou') { return 2 }
/ ('tři' / 'tří') { return 3 }
/ ('č' / 'š') 'ty' ('ry' / 'ři' / 'ř') { return 4 }
/ 'pět' 'i'? { return 5 }
/ 'šest' 'i'? { return 6 }
/ 'sed' ('mi' / 'm' / 'um') { return 7 }
/ 'os' ('mi' / 'm' / 'um') { return 8 }
/ ('devíti' / 'devět') { return 9 }
DIGITS_ORDINAL
= 'první' ('ho' / 'mu')? { return 1 }
/ 'druh' DIGITS_ORDINAL_SUFFIX { return 2 }
/ 'třetí' ('ho' / 'mu')? { return 3 }
/ ('č' / 'š') 'tvrt' DIGITS_ORDINAL_SUFFIX { return 4 }
/ 'pát' DIGITS_ORDINAL_SUFFIX { return 5 }
/ 'šest' DIGITS_ORDINAL_SUFFIX { return 6 }
/ 'sedm' DIGITS_ORDINAL_SUFFIX { return 7 }
/ 'osm' DIGITS_ORDINAL_SUFFIX { return 8 }
/ 'devát' DIGITS_ORDINAL_SUFFIX { return 9 }
DIGITS_ORDINAL_SUFFIX = 'ého' / 'ýho' / 'ému' / 'ýmu' / 'ou' / 'ý' / 'á' / 'é'
...
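The Czech rules cover several inflected or colloquial forms with a single alternative by combining sequences, choices, and the optional operator `?`. When editing such a rule, it can help to list every word form it accepts. The following JavaScript sketch does that with a hypothetical `expand` helper. It only enumerates the combinations and does not run the actual PEG.js parser.

```javascript
// Expands a sequence of alternative groups into every string the sequence
// can match. An optional element such as 'i'? is written as ["i", ""].
function expand(groups) {
  return groups.reduce(
    (prefixes, alternatives) =>
      prefixes.flatMap((p) => alternatives.map((a) => p + a)),
    [""]
  );
}

// ('č' / 'š') 'ty' ('ry' / 'ři' / 'ř')  { return 4 }
console.log(expand([["č", "š"], ["ty"], ["ry", "ři", "ř"]]));
// [ 'čtyry', 'čtyři', 'čtyř', 'štyry', 'štyři', 'štyř' ]

// 'pět' 'i'?  { return 5 }
console.log(expand([["pět"], ["i", ""]]));
// [ 'pěti', 'pět' ]
```

Note that the order of alternatives still matters in the grammar itself: in `'sed' ('mi' / 'm' / 'um')`, the longer 'mi' is listed before 'm', so "sedmi" is consumed whole instead of stopping after "sedm".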