Words-to-Numbers Feature

This article describes how Speech to Text transcribes numbers and dates in their native numeric form in n-best output, and provides tips for fine-tuning the results.

note

The feature works out-of-the-box in the following STT languages and models:

English – EN_US_6 and EN_US_A_6
Spanish – ES_6
Polish – PL_PL_6
Czech – CS_CZ_5 and CS_CZ_6
Slovak – SK_SK_5 and SK_SK_6

You can add this functionality to other languages or fine-tune the existing ones by modifying the conversion rules. See below for more details.

What is the words-to-numbers feature?

The words-to-numbers feature converts raw transcriptions of numbers, dates, and similar patterns (such as credit card numbers) into their native numeric form:


Raw transcription	Converted output
two thousand twenty one	2021
fifteen hundred eighty six point zero three	1586.03
sixty four million seven hundred thousand ninety	64700090

This feature simplifies the processing of transcribed text by text analytics layers or NLP (Natural Language Processing) engines, for example in voicebot applications.
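To illustrate why native numerals are easier for downstream layers, here is a minimal sketch of a consumer that pulls numeric values out of a transcription. The `extractNumbers` helper and the sample sentences are hypothetical and not part of the STT product; the point is only that a simple pattern is enough once the text already contains digits.

```javascript
// Hypothetical downstream helper (not part of the STT product):
// collect every integer or decimal numeral found in a transcription.
function extractNumbers(text) {
  // Matches digit runs with an optional fractional part, e.g. 2021 or 1586.03.
  const matches = text.match(/\d+(?:\.\d+)?/g) || [];
  return matches.map(Number);
}

// Converted n-best text: the values are directly available.
console.log(extractNumbers("the invoice total is 1586.03 due in 2021"));
// [ 1586.03, 2021 ]

// Raw text: the same pattern finds nothing, so the consumer would need its own number parser.
console.log(extractNumbers("the invoice total is fifteen hundred eighty six point zero three"));
// []
```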

Where is the converted output available?

The words-to-numbers conversion is available only in n-best output (i.e., where complete sentence variants are provided), for both file and stream transcription.

The conversion is not available in word-level outputs (One-best, Confusion Network) because it would complicate stream transcription. As new words are processed, they may change the conversion of words that came before them:


Words processed so far	Converted output
two	2
two thousand	2000
two thousand twenty	2020
two thousand twenty one	2021

This would require retroactively changing text that has already been emitted, which is not feasible. Alternatively, the output would need to be delayed, which is undesirable in real-time stream processing.

Thus, the best compromise is to leave word-level outputs unchanged and perform conversions only at the segment or sentence level.
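The instability described above can be reproduced with a small sketch. The `wordsToInteger` function below is a deliberately simplified, hypothetical converter (English integers only, no teens, no decimals); it is not the grammar-based conversion used by the STT engine. It only demonstrates that each newly arrived word can change the value derived from the words before it.

```javascript
// Simplified, hypothetical word tables; the real conversion uses grammar rules instead.
const UNITS = new Map([
  ["one", 1], ["two", 2], ["three", 3], ["four", 4], ["five", 5],
  ["six", 6], ["seven", 7], ["eight", 8], ["nine", 9],
]);
const TENS = new Map([
  ["twenty", 20], ["thirty", 30], ["forty", 40], ["fifty", 50],
  ["sixty", 60], ["seventy", 70], ["eighty", 80], ["ninety", 90],
]);
const SCALES = new Map([["thousand", 1000], ["million", 1000000]]);

function wordsToInteger(words) {
  let total = 0;   // groups already closed by a scale word
  let current = 0; // group currently being built
  for (const word of words) {
    if (UNITS.has(word)) current += UNITS.get(word);
    else if (TENS.has(word)) current += TENS.get(word);
    else if (word === "hundred") current *= 100;
    else if (SCALES.has(word)) { total += current * SCALES.get(word); current = 0; }
    else throw new Error(`unknown number word: ${word}`);
  }
  return total + current;
}

// Convert every prefix of the utterance, as a word-by-word stream would see it.
function prefixConversions(utterance) {
  const words = utterance.split(" ");
  return words.map((_, i) => wordsToInteger(words.slice(0, i + 1)));
}

console.log(prefixConversions("two thousand twenty one"));
// [ 2, 2000, 2020, 2021 ]
```

Every element of the result differs from the one before it, so a word-level output would have to rewrite its previous value after each word.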

How does it work?

The words-to-numbers conversion relies on a set of grammar rules that define how number words are recognized and mapped to numeric values.

Conversion rules are stored in the numeric.pegjs file, located in the grm subdirectory inside the STT model directory. For example:

In Czech 6th generation STT, it is located at {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm.
In Spanish 6th generation STT, it is located at {SPE_directory}/bsapi/stt/data/models_es_6/grm.
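As a convenience, the location can be assembled programmatically. The sketch below assumes that the model directory is always named `models_` plus the lower-cased model name, which is inferred only from the two examples above and may not hold for every model. The `numericGrammarPath` helper and the `/opt/spe` directory are hypothetical.

```javascript
const path = require("path");

// Hypothetical helper: builds the expected location of numeric.pegjs.
// Assumption: the model directory is "models_" + lower-cased model name,
// as in the Czech and Spanish examples; verify this on your installation.
function numericGrammarPath(speDirectory, modelName) {
  return path.posix.join(
    speDirectory, "bsapi", "stt", "data",
    `models_${modelName.toLowerCase()}`, "grm", "numeric.pegjs"
  );
}

console.log(numericGrammarPath("/opt/spe", "CS_CZ_6"));
// /opt/spe/bsapi/stt/data/models_cs_cz_6/grm/numeric.pegjs
```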

Can it be extended or tuned?

You can edit the numeric.pegjs file to fine-tune or extend the conversion functionality.

warning

Create a backup copy of numeric.pegjs before editing the file! Incorrect changes can have unpredictable effects and may cause the STT to stop working.
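One possible way to take the backup is a short Node.js script like the sketch below. The `backupGrammar` helper and the backup file naming are assumptions made for illustration; copying the file manually works just as well.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: copies the grammar file to a timestamped sibling
// and returns the path of the backup.
function backupGrammar(grammarPath) {
  // Replace ":" and "." so the timestamp is safe in file names on all platforms.
  const stamp = new Date().toISOString().replace(/[:.]/g, "-");
  const backupPath = path.join(
    path.dirname(grammarPath),
    `${path.basename(grammarPath)}.${stamp}.bak`
  );
  // COPYFILE_EXCL makes the copy fail instead of overwriting an existing backup.
  fs.copyFileSync(grammarPath, backupPath, fs.constants.COPYFILE_EXCL);
  return backupPath;
}

// Usage (path shown is a placeholder for your own installation):
// backupGrammar("/path/to/grm/numeric.pegjs");
```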

The rules are written in PEG.js syntax, a Parsing Expression Grammar (PEG) notation in which each rule can carry a JavaScript action that computes its result. Details about the syntax can be found on the PEG.js website at https://pegjs.org/documentation#grammar-syntax-and-semantics.

Here are short examples of the syntax, showing excerpts of the cardinal and ordinal digit conversions from 1 to 9:

ENGLISH

...
DIGITS
  = 'one' { return 1 }
  / 'two' { return 2 }
  / 'three' { return 3 }
  / 'four' & { return boundary() } { return 4 }
  / 'five' & { return boundary() } { return 5 }
  / 'six' & { return boundary() } { return 6 }
  / 'seven' & { return boundary() } { return 7 }
  / 'eight' & { return boundary() } { return 8 }
  / 'nine' & { return boundary() } { return 9 }
ZERO = 'zero' { return 0 }
.
.
.
DIGITS_ORDINAL_ST = 'first' { return 1 }
DIGITS_ORDINAL_ND = 'second' { return 2 }
DIGITS_ORDINAL_RD = 'third' { return 3 }
DIGITS_ORDINAL_TH
  = 'fourth' { return 4 }
  / 'fifth' { return 5 }
  / 'sixth' { return 6 }
  / 'seventh' { return 7 }
  / 'eighth' { return 8 }
  / 'ninth' { return 9 }
ZERO_ORDINAL = 'zeroth' { return 0 }
...

CZECH

...
DIGITS
  = ('jedna' / 'jeden') { return 1 }
  / ('dva' / 'dvě' / 'dvou') { return 2 }
  / ('tři' / 'tří') { return 3 }
  / ('č' / 'š') 'ty' ('ry' / 'ři' / 'ř') { return 4 }
  / 'pět' 'i'? { return 5 }
  / 'šest' 'i'? { return 6 }
  / 'sed' ('mi' / 'm' / 'um') { return 7 }
  / 'os' ('mi' / 'm' / 'um') { return 8 }
  / ('devíti' / 'devět') { return 9 }
DIGITS_ORDINAL
  = 'první' ('ho' / 'mu')? { return 1 }
  / 'druh' DIGITS_ORDINAL_SUFFIX { return 2 }
  / 'třetí' ('ho' / 'mu')? { return 3 }
  / ('č' / 'š') 'tvrt' DIGITS_ORDINAL_SUFFIX { return 4 }
  / 'pát' DIGITS_ORDINAL_SUFFIX { return 5 }
  / 'šest' DIGITS_ORDINAL_SUFFIX { return 6 }
  / 'sedm' DIGITS_ORDINAL_SUFFIX { return 7 }
  / 'osm' DIGITS_ORDINAL_SUFFIX { return 8 }
  / 'devát' DIGITS_ORDINAL_SUFFIX { return 9 }
DIGITS_ORDINAL_SUFFIX = 'ého' / 'ýho' / 'ému' / 'ýmu' / 'ou' / 'ý' / 'á' / 'é'
...
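When editing rules like these, keep in mind that the `/` operator in a PEG is an ordered choice: alternatives are tried left to right and the first one that matches wins, even if a later alternative would have matched more text. The sketch below emulates this with a few hand-written combinators (`lit`, `seq`, `choice` and `matchesWhole` are hypothetical helpers, not PEG.js API) to show why the Czech rule for 7 lists 'mi' before 'm'. How leftover text is treated in the real grammar depends on the surrounding rules, so take this only as an illustration of the ordering effect.

```javascript
// Each parser takes (input, pos) and returns the new position, or -1 on failure.
const lit = (s) => (input, pos) => (input.startsWith(s, pos) ? pos + s.length : -1);

// Sequence: run the parsers one after another; fail as soon as one fails.
const seq = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    pos = p(input, pos);
    if (pos < 0) return -1;
  }
  return pos;
};

// Ordered choice: commit to the first alternative that matches.
const choice = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    const next = p(input, pos);
    if (next >= 0) return next;
  }
  return -1;
};

// True only if the parser consumes the whole word.
const matchesWhole = (parser, word) => parser(word, 0) === word.length;

// Same shape as the Czech rule: 'sed' ('mi' / 'm' / 'um')
const seven = seq(lit("sed"), choice(lit("mi"), lit("m"), lit("um")));
// Variant with the shorter alternative listed first.
const sevenReordered = seq(lit("sed"), choice(lit("m"), lit("mi"), lit("um")));

console.log(matchesWhole(seven, "sedmi"));          // true
console.log(matchesWhole(seven, "sedm"));           // true
console.log(matchesWhole(sevenReordered, "sedmi")); // false: 'm' matches first and "i" is left over
```

As a rule of thumb, when one alternative is a prefix of another, list the longer one first.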
