Version: 3.4.0

Time Analysis of Speech

Technology description

Time Analysis of Speech is designed to extract fundamental information about the flow of dialogue in stereo recordings, providing users with conversation characteristics such as:

  • Long Reaction Time Identification: Time Analysis can pinpoint the longest delay in each speaker's reactions during the conversation. This helps identify where conversational efficiency can be improved.
  • Crosstalk Detection: Time Analysis detects crosstalks, i.e., moments when speakers in different channels talk simultaneously, which can complicate communication. By analyzing crosstalks, users can better anticipate communication breakdowns and take proactive measures to mitigate them.
  • Speech Rate Measurement: Time Analysis measures the rate of speech in terms of phonemes per second. Phonemes are the smallest units of sound that distinguish one word from another in a particular language. This measurement provides fundamental insight into the pace of the conversation.
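To illustrate what crosstalk detection means in terms of timing, here is a minimal sketch that finds the intervals where two channels speak at the same time. It assumes each channel's speech is already available as a list of (start, end) segments in seconds that do not overlap within a channel; the function name find_crosstalks is hypothetical and this is the standard interval-intersection technique, not necessarily how Time Analysis computes crosstalks internally.

```python
Segment = tuple[float, float]  # (start, end) in seconds

def find_crosstalks(a: list[Segment], b: list[Segment]) -> list[Segment]:
    """Hypothetical helper: intervals where channel a and channel b both speak.

    Assumes segments within each channel do not overlap each other.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    crosstalks = []
    while i < len(a) and j < len(b):
        # The overlap of the two current segments, if any.
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            crosstalks.append((start, end))
        # Advance whichever segment finishes first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return crosstalks

operator = [(0.0, 2.0), (5.0, 7.0)]
caller = [(1.5, 4.0), (6.5, 9.0)]
print(find_crosstalks(operator, caller))  # [(1.5, 2.0), (6.5, 7.0)]
```

Walking both sorted lists with two pointers keeps the check linear in the number of segments instead of comparing every pair.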

Input

The primary application of Time Analysis is analyzing recordings of two-channel (stereo) phone calls, where one channel captures the voice of the operator and the other captures the voice of the caller. In this setup, Time Analysis can extract information about the conversation dynamics, such as turn-taking and crosstalks.
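Before submitting a recording, it can be useful to confirm that it really has two channels. The sketch below uses Python's standard wave module and assumes the recording is an uncompressed PCM WAV file; the function name describe_wav is hypothetical, and Time Analysis itself may accept other formats.

```python
import io
import wave

def describe_wav(source) -> tuple[int, float]:
    """Hypothetical helper: return (channel count, duration in seconds).

    `source` may be a file path or a binary file-like object containing a PCM WAV.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        duration = wav.getnframes() / wav.getframerate()
    return channels, duration

# Build a tiny silent stereo WAV in memory: 10 frames, 16-bit samples, 8 kHz.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(b"\x00\x00\x00\x00" * 10)  # 2 channels x 2 bytes per frame
buf.seek(0)

channels, duration = describe_wav(buf)
print(channels == 2, duration)
```

A result of 1 channel would indicate a mono recording, in which the operator and caller cannot be separated by channel.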

Output

The JSON output of Time Analysis contains two main fields: channel_analyses with information about individual channels, and reaction_analyses with information about combinations of channels. Let's have a closer look at each field.

Channel analysis

In the channel_analyses section of the result, you can find detailed statistics for each channel showing several key characteristics of the speech activity of the speakers:

  1. Speech Duration: Net speech duration refers to the portion of the audio that contains speech, excluding any pauses, hesitations, or non-verbal sounds.
  2. Speech Rate: The average speech rate (measured in phonemes per second) provides a basic insight into the pace of the conversation.
  3. Total Duration: The overall length of the recording gives a basic idea of the temporal scope of the content.
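As a rough illustration of how these three statistics relate, the sketch below derives them from a channel's speech segments. It assumes the segments and the number of recognized phonemes are already known, and that the speech rate is computed over net speech duration rather than total duration; the names ChannelStats and channel_stats are hypothetical and do not reflect the actual JSON field names.

```python
from dataclasses import dataclass

Segment = tuple[float, float]  # (start, end) in seconds

@dataclass
class ChannelStats:
    speech_duration: float  # net speech, seconds
    speech_rate: float      # phonemes per second of net speech (assumption)
    total_duration: float   # length of the recording, seconds

def channel_stats(segments: list[Segment], phoneme_count: int,
                  total_duration: float) -> ChannelStats:
    # Net speech duration: sum of segment lengths, so pauses are excluded.
    speech = sum(end - start for start, end in segments)
    # Guard against channels with no detected speech.
    rate = phoneme_count / speech if speech > 0 else 0.0
    return ChannelStats(speech, rate, total_duration)

stats = channel_stats([(0.0, 2.0), (5.0, 7.0)], phoneme_count=48, total_duration=10.0)
print(stats)  # ChannelStats(speech_duration=4.0, speech_rate=12.0, total_duration=10.0)
```

Dividing by net speech duration means long silences do not lower the reported pace; dividing by total duration instead would mix speech rate with how much the speaker talks.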

Reaction analysis

In the reaction_analyses section of the result, you can find insights into the interactions between speakers in terms of turn-taking and crosstalks within the conversation:

  1. Reactions Count: The number of reactions from this channel to the other channel. A "reaction" occurs when the speaker in this channel starts speaking immediately after the speaker in the other channel has stopped.
  2. Average Reaction Time: The average duration between when the speaker in the other channel stops speaking and when the speaker in the reacting channel begins speaking.
  3. Slowest Reaction Position: Position of this channel's slowest reaction (longest reaction time).
  4. Fastest Reaction Position: Position of this channel's fastest reaction (shortest reaction time).
  5. Crosstalks: List of positions of this channel's crosstalks.
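The reaction statistics (items 1 to 4) can be sketched from per-channel speech segments as follows. This is a minimal illustration under several assumptions: segments are (start, end) pairs in seconds, a reaction is counted only when the other channel is silent at the moment this channel starts and this channel has not spoken since the other channel stopped, and a "position" is represented as the (gap_start, gap_end) interval of the pause. The function names and dictionary keys are hypothetical and not the actual Time Analysis output format; crosstalk positions are not computed here.

```python
from typing import Optional

Segment = tuple[float, float]  # (start, end) in seconds

def find_reactions(reacting: list[Segment], other: list[Segment]) -> list[tuple[Segment, float]]:
    """Hypothetical helper: ((gap_start, gap_end), reaction_time) per reaction."""
    found = []
    for start, _ in sorted(reacting):
        # Most recent moment the other channel stopped speaking before this segment.
        other_ends = [e for _, e in other if e <= start]
        if not other_ends:
            continue
        gap_start = max(other_ends)
        # Skip if the other channel is still talking at this point (crosstalk) ...
        other_talking = any(s < start < e for s, e in other)
        # ... or if this channel already spoke after the other channel stopped.
        spoke_since = any(gap_start < e <= start for _, e in reacting)
        if other_talking or spoke_since:
            continue
        found.append(((gap_start, start), start - gap_start))
    return found

def summarize_reactions(reacting: list[Segment], other: list[Segment]) -> dict:
    reactions = find_reactions(reacting, other)
    if not reactions:
        return {"count": 0, "average_time": None, "slowest": None, "fastest": None}
    times = [t for _, t in reactions]
    slowest: Optional[Segment] = max(reactions, key=lambda r: r[1])[0]
    fastest: Optional[Segment] = min(reactions, key=lambda r: r[1])[0]
    return {
        "count": len(reactions),
        "average_time": sum(times) / len(times),
        "slowest": slowest,  # position of the longest reaction time
        "fastest": fastest,  # position of the shortest reaction time
    }

operator = [(0.0, 2.0), (5.0, 7.0), (9.3, 10.0)]
caller = [(2.5, 4.0), (6.5, 9.0)]
print(summarize_reactions(operator, caller))
print(summarize_reactions(caller, operator))
```

In this example the caller's second segment starts while the operator is still talking, so it counts as crosstalk rather than a reaction, leaving the caller with a single reaction of 0.5 s.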

More information

For more information on how to use Time Analysis, you can read this detailed guide.