Version: 2.1.0

Time analysis

This guide walks through time analysis: extracting basic information about the dialogue in a recording that describes the flow of the conversation. We start with an audio recording of a conversation between a customer and an operator. In this example, we analyze that conversation to obtain the average reaction time of the operator, who speaks on the first channel. This article describes how to do that with our software.

Attached, you will find the audio recording Jennifer_Pavla.wav, a stereo recording of a dialogue between two speakers. It is used as the example audio throughout this guide.

Note that the end of this guide provides a full Python code example covering all the steps discussed. It is meant as a practical starting point for implementing time analysis in your own projects.

Environment Setup

We are using Python 3.9 and Python library requests 2.27 in this example. You can install the requests library with pip as follows:

pip install requests~=2.27

Then, you can import the following libraries (time is built-in):

import time
import requests

Time analysis

In order to trigger time analysis for a single audio file, you should start by sending a POST request to /api/technology/time-analysis as follows:

with open(audio_path, mode="rb") as file:
    files = {"file": file}
    # Replace <speech-platform-server> with the actual server address
    response = requests.post(
        "https://<speech-platform-server>/api/technology/time-analysis",
        files=files,
    )
    response.raise_for_status()

If the task is accepted, the server returns HTTP status code 202 together with a unique task ID in the response body. The task is not processed immediately; it is only scheduled for processing. You can check the current task status while polling for the result.

The URL for polling the result is returned in the X-Location header. Alternatively, you can assemble the polling URL yourself by appending a slash (/) and the task ID to the initial URL.

polling_url = response.headers["x-location"]

counter = 0
while counter < 100:
    response = requests.get(polling_url)
    response.raise_for_status()
    data = response.json()
    task_status = data["task"]["state"]
    if task_status in ["done", "failed", "rejected"]:
        break
    counter += 1
    time.sleep(5)
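
As an alternative to reading the X-Location header, the polling URL can be assembled by hand as described above. The following is a minimal sketch; build_polling_url is a hypothetical helper, not part of the platform, and it assumes you already hold the task ID (for instance from the task_id field shown in the example result of this guide).

```python
def build_polling_url(initial_url: str, task_id: str) -> str:
    """Append a slash and the task ID to the URL the POST request was sent to."""
    # rstrip avoids a double slash if the initial URL already ends with one
    return f"{initial_url.rstrip('/')}/{task_id}"


# The server address is a placeholder; the task ID is copied from the
# example result in this guide.
initial_url = "https://<speech-platform-server>/api/technology/time-analysis"
polling_url = build_polling_url(initial_url, "602a9699-bfdc-47fe-b829-d44bec317a4a")
```

The header remains the simpler option when it is available, since it does not depend on how the task ID is laid out in the response body.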

Once the polling finishes, data contains the latest response from the server: either the time analysis result, or an error message with details if processing could not finish. Example result of a successful time analysis of a stereo file:

{
    "task": {"task_id": "602a9699-bfdc-47fe-b829-d44bec317a4a", "state": "done"},
    "result": {
        "channel_analyses": [
            {
                "channel_number": 0,
                "speech_duration": 43.09,
                "speech_rate": 8.586679458618164,
                "total_duration": 119.29,
            },
            {
                "channel_number": 1,
                "speech_duration": 68.65,
                "speech_rate": 9.919883728027344,
                "total_duration": 120.314999999,
            },
        ],
        "reaction_analyses": [
            {
                "reacting_channel": 1,
                "reactions_count": 7,
                "average_reaction_time": 0.49499997,
                "slowest_reaction_position": {
                    "start_time": 96.025,
                    "end_time": 96.159999999,
                },
                "fastest_reaction_position": {
                    "start_time": 71.189999999,
                    "end_time": 72.215,
                },
                "crosstalks": [],
            },
            {
                "reacting_channel": 0,
                "reactions_count": 8,
                "average_reaction_time": 0.36125,
                "slowest_reaction_position": {
                    "start_time": 70.23,
                    "end_time": 70.269999999,
                },
                "fastest_reaction_position": {
                    "start_time": 14.005,
                    "end_time": 14.929999999,
                },
                "crosstalks": [],
            },
        ],
    },
}
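
Because the polling loop above stops after 100 attempts, and because a task can also end as failed or rejected, it may be worth checking the final state before reading the result. A minimal sketch follows; extract_result is a hypothetical helper name, and it relies only on the task and result keys visible in the example above.

```python
def extract_result(data: dict) -> dict:
    """Return the "result" part of a finished task, or raise if it did not finish."""
    state = data["task"]["state"]
    if state != "done":
        # Covers "failed", "rejected", and a task that was still in progress
        # when the polling loop ran out of attempts.
        raise RuntimeError(f"Time analysis did not finish successfully (state: {state!r})")
    return data["result"]
```

With this in place, result = extract_result(data) either yields the dictionary holding channel_analyses and reaction_analyses or stops the script with a clear message.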

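Time analysis provides multiple statistics. For example, to read average_reaction_time, select it from one of the entries in reaction_analyses. Each entry carries a reacting_channel field identifying the channel it describes. The line below reads the first entry in the list, which in the example result above is the one with reacting_channel 1: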

average_reaction_time = data["result"]["reaction_analyses"][0]["average_reaction_time"]
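
In the example result, the entries of reaction_analyses are not ordered by channel: the first entry has reacting_channel 1. To pick the statistic for a specific channel regardless of list order, the entries can be filtered by that field. The sketch below uses a hypothetical helper, reaction_for_channel, and assumes that channel 0 is the first channel of the recording.

```python
from typing import Optional


def reaction_for_channel(data: dict, channel: int) -> Optional[dict]:
    """Return the reaction analysis whose reacting_channel matches, or None."""
    for reaction in data["result"]["reaction_analyses"]:
        if reaction["reacting_channel"] == channel:
            return reaction
    return None


# Trimmed copy of the example result from this guide
example = {
    "result": {
        "reaction_analyses": [
            {"reacting_channel": 1, "average_reaction_time": 0.49499997},
            {"reacting_channel": 0, "average_reaction_time": 0.36125},
        ]
    }
}

print(reaction_for_channel(example, 0)["average_reaction_time"])  # prints 0.36125
```

Returning None for a missing channel leaves it to the caller to decide whether that is an error, for example when a mono file is analyzed.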

Congratulations, you have successfully run time analysis on the example audio and extracted the statistic of interest.

Full Python Code

Here is the full code for this example, slightly adjusted and wrapped into functions for better readability:

import time
import requests

SPEECH_PLATFORM_SERVER = "<speech-platform-server>"  # Replace with your actual server URL


def poll_result(polling_url: str, sleep: int = 5):
    while True:
        response = requests.get(polling_url)
        response.raise_for_status()
        data = response.json()
        task_status = data["task"]["state"]
        if task_status in ["done", "failed", "rejected"]:
            break
        time.sleep(sleep)
    return response


def do_time_analysis(audio_path: str):
    with open(audio_path, mode="rb") as file:
        files = {"file": file}
        response = requests.post(
            f"https://{SPEECH_PLATFORM_SERVER}/api/technology/time-analysis",
            files=files,
        )
        response.raise_for_status()
    polling_url = response.headers["x-location"]
    time_analysis_response = poll_result(polling_url)
    return time_analysis_response.json()


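audio_path = "Jennifer_Pavla.wav"

data = do_time_analysis(audio_path)
# Take the first entry in the list; its "reacting_channel" field tells
# which channel the statistics belong to.
reaction = data["result"]["reaction_analyses"][0]
average_reaction_time = reaction["average_reaction_time"]

print(data)
print("average_reaction_time on channel", reaction["reacting_channel"], "is", average_reaction_time)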
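
The poll_result function above loops until the task reaches one of the final states, so it never returns if the task stays unfinished. A bounded variant could look like the sketch below. The name poll_until_terminal, the fetch callback, and the default timeout of 600 seconds are all assumptions made for illustration; the callback is expected to perform the GET request and return the parsed JSON body.

```python
import time
from typing import Callable

TERMINAL_STATES = {"done", "failed", "rejected"}


def poll_until_terminal(
    fetch: Callable[[], dict],
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the task state is final or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch()
        if data["task"]["state"] in TERMINAL_STATES:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task not finished after {timeout} seconds")
        sleep(interval)
```

A fetch callback for this guide would call requests.get(polling_url), then raise_for_status(), and return response.json(). Passing the request as a callback keeps the loop independent of the HTTP library and makes it easy to exercise without a server.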
