Time analysis
We are going to perform time analysis, which extracts basic statistics from the dialogue in a recording and provides essential knowledge about the conversation flow. We already have an audio recording of a conversation between a customer and an operator. In this example, we want to analyze this conversation and find the average reaction time of the operator, who speaks on the first channel. This article describes how you can achieve that using our software.
Attached, you will find the audio recording Jennifer_Pavla.wav, a stereo recording of a dialogue between two speakers. It is used as the example audio throughout this guide.
Please note that at the end of this guide, we provide a full Python code example that encapsulates all the steps discussed. It should offer a comprehensive understanding and an actionable guide on implementing time analysis in your own projects.
Environment Setup
We are using Python 3.9 and the Python library requests 2.27 in this example.
You can install the requests library with pip as follows:
pip install requests~=2.27
Then, you can import the following libraries (time is built-in):
import time
import requests
Time analysis
In order to trigger time analysis for a single audio file, you should start by sending a POST request to /api/technology/time-analysis as follows:
audio_path = "Jennifer_Pavla.wav"  # path to the example recording

with open(audio_path, mode="rb") as file:
    files = {"file": file}
    # Replace <speech-platform-server> with the actual server address
    response = requests.post(
        "https://<speech-platform-server>/api/technology/time-analysis",
        files=files,
    )
response.raise_for_status()
If the task was successfully accepted, status code 202 is returned together with a unique task ID in the response body. The task isn't processed immediately, but only scheduled for processing. You can check the current task status while polling for the result.
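A minimal sketch of checking that the task was accepted and reading its ID. The guide does not show the body of the POST response, so the sketch assumes (hypothetically) that it carries the same task object as the polling result shown later, i.e. {"task": {"task_id": ..., "state": ...}}. The helper name accepted_task_id is also hypothetical.

```python
from typing import Any, Mapping


def accepted_task_id(status_code: int, body: Mapping[str, Any]) -> str:
    """Return the task ID if the server accepted the task (HTTP 202).

    Assumption: the POST response body has the shape
    {"task": {"task_id": "...", "state": "..."}}, like the polling result.
    """
    if status_code != 202:
        raise RuntimeError(f"Task was not accepted, status code {status_code}")
    return body["task"]["task_id"]


# With requests, this would be called as:
# task_id = accepted_task_id(response.status_code, response.json())
```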
The URL for polling the result is returned in the X-Location header.
Alternatively, you can assemble the polling URL on your own by appending a slash (/) and the task ID to the initial URL.
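Both ways of obtaining the polling URL can be combined in a small helper that prefers the X-Location header and falls back to appending a slash and the task ID to the initial URL. This is a sketch; polling_url_for is a hypothetical name. With requests, response.headers is already a case-insensitive dictionary, which is why the code in this guide can read "x-location"; the sketch compares header names case-insensitively itself so that it also works with a plain dict.

```python
from typing import Mapping


def polling_url_for(initial_url: str, headers: Mapping[str, str], task_id: str) -> str:
    """Return the URL to poll for the task result.

    Prefers the X-Location header; otherwise appends "/" and the task ID
    to the initial URL, as described in the guide.
    """
    # HTTP header names are case-insensitive, so compare them in lowercase
    for name, value in headers.items():
        if name.lower() == "x-location":
            return value
    # Fallback: initial URL + "/" + task ID (avoid a double slash)
    return f"{initial_url.rstrip('/')}/{task_id}"
```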
polling_url = response.headers["x-location"]  # header lookup is case-insensitive
counter = 0
# Poll at most 100 times, waiting 5 seconds between requests
while counter < 100:
    response = requests.get(polling_url)
    response.raise_for_status()
    data = response.json()
    task_status = data["task"]["state"]
    if task_status in ["done", "failed", "rejected"]:
        break
    counter += 1
    time.sleep(5)
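The loop above simply stops after 100 attempts, leaving data in whatever state the server last reported. If you prefer an explicit error in that case, and also when the task ends as failed or rejected, one option is sketched below. The function wait_for_task and its parameters are hypothetical; fetch_state stands for any callable returning the parsed polling response (for example lambda: requests.get(polling_url).json()), which also makes the logic testable without a server.

```python
import time
from typing import Any, Callable, Dict

# Final task states used in this guide
FINAL_STATES = {"done", "failed", "rejected"}


def wait_for_task(
    fetch_state: Callable[[], Dict[str, Any]],
    max_attempts: int = 100,
    sleep_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task reaches a final state and return the last response.

    Raises TimeoutError if no final state is reached within max_attempts,
    and RuntimeError if the task ended as "failed" or "rejected".
    """
    for attempt in range(max_attempts):
        data = fetch_state()
        state = data["task"]["state"]
        if state in FINAL_STATES:
            if state != "done":
                raise RuntimeError(f"Task ended in state {state!r}: {data}")
            return data
        # Do not sleep after the last attempt
        if attempt < max_attempts - 1:
            sleep(sleep_seconds)
    raise TimeoutError(f"Task not finished after {max_attempts} attempts")
```

Injecting sleep keeps the default behavior identical to the loop above (5 seconds between requests) while letting tests skip the waiting.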
Once the polling finishes, data will contain the latest response from the server: either the response with the time analysis, or an error message with details in case processing was not able to finish (if the task has still not reached a final state after 100 attempts, data holds its last reported state). Example result of a successful time analysis of a stereo file:
{
    "task": {"task_id": "602a9699-bfdc-47fe-b829-d44bec317a4a", "state": "done"},
    "result": {
        "channel_analyses": [
            {
                "channel_number": 0,
                "speech_duration": 43.09,
                "speech_rate": 8.586679458618164,
                "total_duration": 119.29
            },
            {
                "channel_number": 1,
                "speech_duration": 68.65,
                "speech_rate": 9.919883728027344,
                "total_duration": 120.314999999
            }
        ],
        "reaction_analyses": [
            {
                "reacting_channel": 1,
                "reactions_count": 7,
                "average_reaction_time": 0.49499997,
                "slowest_reaction_position": {
                    "start_time": 96.025,
                    "end_time": 96.159999999
                },
                "fastest_reaction_position": {
                    "start_time": 71.189999999,
                    "end_time": 72.215
                },
                "crosstalks": []
            },
            {
                "reacting_channel": 0,
                "reactions_count": 8,
                "average_reaction_time": 0.36125,
                "slowest_reaction_position": {
                    "start_time": 70.23,
                    "end_time": 70.269999999
                },
                "fastest_reaction_position": {
                    "start_time": 14.005,
                    "end_time": 14.929999999
                },
                "crosstalks": []
            }
        ]
    }
}
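The channel_analyses part can be turned into a compact per-channel overview. As an illustration, the sketch below computes the share of each channel's total duration that contains speech, assuming speech_duration and total_duration use the same unit (presumably seconds). This derived ratio is not a field returned by the server, and speech_share_per_channel is a hypothetical helper.

```python
from typing import Any, Dict


def speech_share_per_channel(data: Dict[str, Any]) -> Dict[int, float]:
    """Map each channel number to speech_duration / total_duration."""
    shares = {}
    for channel in data["result"]["channel_analyses"]:
        total = channel["total_duration"]
        # Guard against a zero-length channel to avoid division by zero
        shares[channel["channel_number"]] = (
            channel["speech_duration"] / total if total else 0.0
        )
    return shares
```

For the example result above, this gives roughly 0.36 for channel 0 and 0.57 for channel 1.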
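Time analysis provides multiple statistics. For example, if we are interested in the average_reaction_time of the operator on the first channel (channel number 0), we can select this particular statistic as shown below. The entries in reaction_analyses may not be ordered by channel number (in the example above, the first entry belongs to reacting channel 1), so it is safer to select the entry by its reacting_channel value than by its position in the list:
operator_channel = 0  # the first channel
average_reaction_time = next(
    analysis["average_reaction_time"]
    for analysis in data["result"]["reaction_analyses"]
    if analysis["reacting_channel"] == operator_channel
)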
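If you need the statistic for both speakers at once, the reaction analyses can be indexed by reacting_channel. A minimal sketch (reaction_times_by_channel is a hypothetical name):

```python
from typing import Any, Dict


def reaction_times_by_channel(data: Dict[str, Any]) -> Dict[int, float]:
    """Map each reacting channel to its average_reaction_time."""
    return {
        analysis["reacting_channel"]: analysis["average_reaction_time"]
        for analysis in data["result"]["reaction_analyses"]
    }
```

For the example result above, this returns {1: 0.49499997, 0: 0.36125}, so reaction_times_by_channel(data)[0] gives the value for channel number 0.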
Congratulations, you have successfully run time analysis on the example audio and extracted the statistic of interest.
Full Python Code
Here is the full code for this example, slightly adjusted and wrapped into functions for better readability:
import time

import requests

SPEECH_PLATFORM_SERVER = "<speech-platform-server>"  # Replace with your actual server URL


def poll_result(polling_url: str, sleep: int = 5):
    # Poll until the task reaches a final state
    while True:
        response = requests.get(polling_url)
        response.raise_for_status()
        data = response.json()
        task_status = data["task"]["state"]
        if task_status in ["done", "failed", "rejected"]:
            break
        time.sleep(sleep)
    return response


def do_time_analysis(audio_path: str):
    # Upload the audio file and schedule the time analysis task
    with open(audio_path, mode="rb") as file:
        files = {"file": file}
        response = requests.post(
            f"https://{SPEECH_PLATFORM_SERVER}/api/technology/time-analysis",
            files=files,
        )
    response.raise_for_status()
    polling_url = response.headers["x-location"]
    time_analysis_response = poll_result(polling_url)
    return time_analysis_response.json()


known_audio = "Jennifer_Pavla.wav"
data = do_time_analysis(known_audio)
operator_channel = 0  # the first channel
# Select the entry by reacting_channel, not by its position in the list
average_reaction_time = next(
    analysis["average_reaction_time"]
    for analysis in data["result"]["reaction_analyses"]
    if analysis["reacting_channel"] == operator_channel
)
print(data)
print("average_reaction_time on first channel is", average_reaction_time)