
Voice Activity Detection

Voice Activity Detection (VAD) is a language-, domain-, and channel-independent technology that distinguishes speech from non-speech content in audio recordings. It labels the speech and non-speech parts of a recording, and those labels can serve as a decision point for whether to process the recording with other technologies. VAD is commonly used as a fast filtering step in deployments.

Typical use cases include:

  • Detecting the presence or absence of human speech for voice processing.
  • Filtering out non-speech parts of a recording.
  • Excluding recordings that lack sufficient net speech for further processing by other technologies.
  • Triggering voice-driven processes.
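
As an illustration of the filtering use cases above, here is a minimal sketch in Python. It assumes the VAD output has already been reduced to a list of speech segments given as (start, end) pairs in seconds. The function names and the 10-second default threshold are hypothetical and are not part of any product API.

```python
from typing import List, Tuple

# Hypothetical representation: one (start_s, end_s) pair per detected speech region.
Segment = Tuple[float, float]


def net_speech_seconds(speech_segments: List[Segment]) -> float:
    """Sum the durations of all speech segments (assumed non-overlapping)."""
    return sum(end - start for start, end in speech_segments)


def has_enough_speech(speech_segments: List[Segment],
                      min_net_speech_s: float = 10.0) -> bool:
    """Decide whether a recording should be passed to further processing.

    The threshold is an illustrative assumption; a suitable value depends
    on the technology that consumes the recording.
    """
    return net_speech_seconds(speech_segments) >= min_net_speech_s


if __name__ == "__main__":
    segments = [(0.0, 2.5), (4.0, 9.0)]
    print(net_speech_seconds(segments))
    print(has_enough_speech(segments, min_net_speech_s=5.0))
```

Recordings for which `has_enough_speech` returns `False` would be excluded before the more expensive technologies run.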

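The processing speed of Voice Activity Detection is 140 FtRT (faster than real time) per instance, meaning that one instance processes roughly 140 seconds of audio per second of processing time.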
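
To make the speed figure concrete, the following Python sketch estimates processing time from audio duration. It reads the figure as "seconds of audio processed per second of wall-clock time" and assumes that throughput scales linearly with the number of instances. Both are simplifying assumptions, and the function name is hypothetical.

```python
def estimated_processing_seconds(audio_seconds: float,
                                 speed_ftrt: float = 140.0,
                                 instances: int = 1) -> float:
    """Estimate wall-clock processing time for a given amount of audio.

    Assumes linear scaling across instances, which real deployments
    may not achieve exactly.
    """
    if speed_ftrt <= 0 or instances < 1:
        raise ValueError("speed_ftrt must be positive and instances >= 1")
    return audio_seconds / (speed_ftrt * instances)


if __name__ == "__main__":
    # One hour of audio on a single instance: about 25.7 seconds.
    print(round(estimated_processing_seconds(3600.0), 1))
```

Under these assumptions, one hour of audio takes a single instance about 26 seconds.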

Prefiltering