Time Analysis of Speech
Technology Description
Time Analysis of Speech is designed to extract fundamental information about the flow of dialogue in stereo recordings, providing users with conversation characteristics such as:
- Long Reaction Time Identification: Time Analysis can pinpoint the longest delay in each speaker's reactions during the conversation. This is useful for identifying where conversational efficiency could be improved.
- Crosstalk Detection: Time Analysis can detect crosstalks, i.e. moments when speakers in different channels talk simultaneously, which can complicate communication. By analyzing crosstalks, users can better anticipate communication breakdowns and take proactive measures to mitigate them.
- Speech Rate Measurement: Time Analysis measures the rate of speech in terms of phonemes per second. Phonemes are the smallest units of sound that distinguish one word from another in a particular language. This measurement provides a basic insight into the pace of the conversation.
Input
The primary application of Time Analysis is in analyzing recordings of two-channel phone calls, where one channel captures the voice of the operator, and the other records the voice of the caller. In this setup, Time Analysis can extract useful information about the conversation dynamics.
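Because each speaker occupies a separate channel, the two voices can be treated as independent signals. The following minimal Python sketch, which is not part of Time Analysis itself, shows how a stereo WAV file could be split into its two channels using only the standard library. The function name split_stereo_wav and the restriction to 16-bit PCM samples are assumptions made for this illustration.

```python
from __future__ import annotations

import array
import io
import sys
import wave


def split_stereo_wav(data: bytes) -> tuple[list[int], list[int]]:
    """Split a 16-bit PCM stereo WAV file (passed as bytes) into two channels.

    Hypothetical helper for illustration only; returns (channel 0, channel 1).
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a two-channel (stereo) recording")
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    # Samples are interleaved frame by frame: ch0, ch1, ch0, ch1, ...
    return samples[0::2].tolist(), samples[1::2].tolist()
```

Which channel index holds the operator and which holds the caller depends on how the call was recorded.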
Output
The JSON output of Time Analysis contains two main fields: channel_analyses, with information about individual channels, and reaction_analyses, with information about combinations of channels. Let's have a closer look at each field.
Channel Analysis
In the channel_analyses section of the result, you can find detailed statistics for each channel describing several key characteristics of the speakers' speech activity:
- Speech Duration: Net speech duration refers to the portion of the audio that contains speech, excluding any pauses, hesitations, or non-verbal sounds.
- Speech Rate: The average speech rate (measured in phonemes per second) provides a basic insight into the pace of the conversation.
- Total Duration: The overall length of the recording gives a basic idea of the temporal scope of the content.
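The three statistics above can be illustrated with a small Python sketch. It assumes that a channel's speech is already available as a list of non-overlapping (start, end) segments in seconds and that the number of phonemes in the channel is known. Both inputs, the names channel_stats and ChannelStats, and the choice to divide the phoneme count by the net speech duration (rather than by the total duration) are illustrative assumptions, not a description of how Time Analysis computes these values internally.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical input format: (start, end) of a speech segment, in seconds.
Segment = Tuple[float, float]


@dataclass
class ChannelStats:
    speech_duration: float  # net speech, in seconds
    speech_rate: float      # phonemes per second of net speech (assumed definition)
    total_duration: float   # length of the whole recording, in seconds


def channel_stats(segments: List[Segment], phoneme_count: int,
                  total_duration: float) -> ChannelStats:
    # Net speech duration: only time covered by speech segments counts, so
    # pauses, hesitations and non-verbal sounds (assumed already excluded
    # from the segments) do not contribute.
    speech = sum(end - start for start, end in segments)
    # Guard against channels with no detected speech.
    rate = phoneme_count / speech if speech > 0 else 0.0
    return ChannelStats(speech, rate, total_duration)
```

For example, channel_stats([(0.0, 2.0), (3.0, 5.5)], 54, 10.0) yields a net speech duration of 4.5 seconds and a speech rate of 12.0 phonemes per second, while the total duration stays at 10.0 seconds.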
Reaction Analysis
In the reaction_analyses section of the result, you can find valuable insights into the interactions between speakers in terms of turn-taking and crosstalks within the conversation:
- Reactions Count: Number of reactions of this channel to the other channel. A "reaction" occurs when the speaker in the reacting channel starts speaking after the speaker in the other channel has stopped speaking.
- Average Reaction Time: Average time that elapsed between when the speaker in the other channel stopped speaking and the speaker in the reacting channel started speaking.
- Slowest Reaction Position: Position of this channel's slowest reaction (longest reaction time).
- Fastest Reaction Position: Position of this channel's fastest reaction (shortest reaction time).
- Crosstalks: List of positions of this channel's crosstalks. A "crosstalk" occurs when the speaker in the reacting channel starts speaking while the speaker in the other channel is still speaking. The crosstalk lasts as long as both speakers are speaking.
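The reaction and crosstalk definitions above can be sketched in Python as follows. This is a hypothetical illustration, not the Time Analysis implementation. It assumes each channel's speech is given as (start, end) segments in seconds, represents a reaction's position as the interval from the other speaker's end to the reacting speaker's start, and counts a segment as a reaction only if the other speaker spoke after this channel's previous segment ended (so that the turn actually changed). The names analyze_reactions and ReactionStats are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical representation: (start, end) of a speech segment, in seconds.
Segment = Tuple[float, float]


@dataclass
class ReactionStats:
    # Each reaction is stored as (other channel's end, this channel's start).
    reactions: List[Segment] = field(default_factory=list)
    # Each crosstalk is stored as (this channel's start, end of the overlap).
    crosstalks: List[Segment] = field(default_factory=list)

    @property
    def count(self) -> int:
        return len(self.reactions)

    @property
    def average_reaction_time(self) -> Optional[float]:
        if not self.reactions:
            return None
        return sum(s - e for e, s in self.reactions) / len(self.reactions)

    @property
    def slowest(self) -> Optional[Segment]:
        # Reaction with the longest gap between the two speakers.
        return max(self.reactions, key=lambda r: r[1] - r[0], default=None)

    @property
    def fastest(self) -> Optional[Segment]:
        # Reaction with the shortest gap between the two speakers.
        return min(self.reactions, key=lambda r: r[1] - r[0], default=None)


def analyze_reactions(own: List[Segment], other: List[Segment]) -> ReactionStats:
    """Reactions and crosstalks of channel `own` relative to channel `other`."""
    stats = ReactionStats()
    prev_own_end = float("-inf")
    for start, end in sorted(own):
        # Crosstalk: the other speaker is still talking when this channel starts.
        active = next((o for o in other if o[0] <= start < o[1]), None)
        if active is not None:
            # The crosstalk lasts while both speakers are talking.
            stats.crosstalks.append((start, min(end, active[1])))
        else:
            # Latest moment the other speaker stopped before this channel started.
            last_other_end = max(
                (o_end for _, o_end in other if o_end <= start), default=None
            )
            # Only a reaction if the other speaker talked after our previous turn.
            if last_other_end is not None and last_other_end > prev_own_end:
                stats.reactions.append((last_other_end, start))
        prev_own_end = max(prev_own_end, end)
    return stats
```

Calling analyze_reactions(operator, caller) gives the operator's reactions and crosstalks relative to the caller; swapping the arguments gives the caller's, which mirrors the idea that reaction_analyses describes combinations of channels.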
More Information
For more information on how to use Time Analysis, read this detailed guide.