Emotion Recognition
This guide demonstrates how to perform Emotion Recognition with Phonexia Speech Platform 4, a technology that detects and classifies emotions in media files. You can find a high-level description in the About Emotion Recognition article.
For testing, we'll be using the following 8 recordings. You can download them all together in the audio_files.zip archive.
| Filename | Channel | Happy | Neutral | Sad | Angry |
|---|---|---|---|---|---|
| Barbara.wav | 0 | 79.9% | 19% | 0.6% | 0.5% |
| David.wav | 0 | 0.1% | 0.2% | 0% | 99.7% |
| David.wav | 1 | 9.9% | 63.9% | 0.1% | 26% |
| Jack.wav | 0 | 4.7% | 92.7% | 0% | 2.6% |
| Jack_Keith.wav | 0 | 5.7% | 85.6% | 0% | 8.7% |
| Jack_Keith.wav | 1 | 0.3% | 5.6% | 9.8% | 84.3% |
| Jiri.wav | 0 | 0.4% | 1.4% | 0% | 98.1% |
| Juan.wav | 0 | 13.3% | 47.6% | 38.6% | 0.5% |
| Laura_Marek.wav | 0 | 0.3% | 95.1% | 4.3% | 0.4% |
| Laura_Marek.wav | 1 | 0.1% | 99.9% | 0% | 0.1% |
| Steve.wav | 0 | 0.2% | 76.2% | 23.4% | 0.1% |
At the end of this guide, you'll find the full Python code example that combines all the steps discussed separately below. This guide should give you a comprehensive understanding of how to integrate Emotion Recognition into your own projects.
Prerequisites
In this guide, we assume that the Virtual Appliance is running on port 8000 of http://localhost and contains a proper model and license for the technology. For more information on how to install and start the Virtual Appliance, please refer to the Virtual Appliance Installation chapter.
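Before proceeding, you may want to verify that the server is reachable. The snippet below is a minimal sketch: it only checks that the server answers HTTP requests at the assumed address; the exact response depends on your deployment.

```python
import requests

SPEECH_PLATFORM_SERVER = "http://localhost:8000"  # Adjust to your deployment

# A plain GET to the server root; any HTTP response proves the
# Virtual Appliance is reachable (the status code itself may vary).
try:
    response = requests.get(SPEECH_PLATFORM_SERVER, timeout=5)
    print(f"Server reachable, status code {response.status_code}")
except requests.ConnectionError:
    print("Could not reach the Speech Platform server.")
```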
Environment Setup
We are using Python 3.9 and the Python library requests 2.27 in this example. You can install the requests library with pip as follows:

```
pip install requests~=2.27
```
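To confirm that a compatible version is installed, you can check it directly from Python:

```python
# Quick sanity check that the requests library is available.
import requests

print(requests.__version__)  # Expecting a 2.27.x version
```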
Basic Emotion Recognition from a file
To run Emotion Recognition for a single audio file, you should start by sending a POST request to the /api/technology/emotion-recognition endpoint. In Python, you can do this as follows:
```python
import requests

SPEECH_PLATFORM_SERVER = "http://localhost:8000"  # Replace with your actual server URL
ENDPOINT_URL = f"{SPEECH_PLATFORM_SERVER}/api/technology/emotion-recognition"

audio_path = "Barbara.wav"

with open(audio_path, mode="rb") as file:
    files = {"file": file}
    response = requests.post(
        url=ENDPOINT_URL,
        files=files,
    )

print(f"{response.status_code=}")  # Should print 'response.status_code=202'
```
If the task has been successfully accepted, status code `202` will be returned together with a unique task ID in the response body. The task isn't processed immediately, but only scheduled for processing. You can check the current task status by polling for the result.

The URL for polling the result is returned in the `Location` header. Alternatively, you can assemble the polling URL on your own by appending a slash (`/`) and the task ID to the endpoint URL.
```python
import time

polling_url = response.headers["Location"]  # Use the `response` from the previous step
# Alternatively:
# polling_url = ENDPOINT_URL + "/" + response.json()["task"]["task_id"]

while True:
    response = requests.get(polling_url)
    data = response.json()
    task_status = data["task"]["state"]
    if task_status in {"done", "failed", "rejected"}:
        break
    time.sleep(5)

print(f"{data=}")
```
Once polling is complete, the `data` object will contain the server's latest response. This will either include the results of Emotion Recognition or an error message if the processing could not be completed successfully.
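A minimal way to branch on the outcome is to check the terminal task state before reading the result. The exact shape of the error payload is not shown in this guide, so the sketch below simply prints the whole response when the task did not finish successfully:

```python
if data["task"]["state"] == "done":
    result = data["result"]
    # Proceed to inspect result["channels"] as described below.
else:
    # "failed" or "rejected"; the response body typically explains why,
    # but its exact structure may vary, so just dump it for inspection.
    print(f"Task ended in state {data['task']['state']!r}: {data}")
```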
The `result` section provides details for each processed audio channel, identified by its `channel_number`. The `speech_length` field indicates the duration of speech analyzed for emotion detection. The `scores` array contains probability values for different emotions, representing the system's confidence in each detected emotion:
```json
{
  "task": {
    "task_id": "123e4567-e89b-12d3-a456-426614174000",
    "state": "done"
  },
  "result": {
    "channels": [
      {
        "channel_number": 0,
        "speech_length": 13.5,
        "scores": [
          { "emotion": "HAPPY", "probability": 0.85 },
          { "emotion": "NEUTRAL", "probability": 0.1 },
          { "emotion": "SAD", "probability": 0.05 },
          { "emotion": "ANGRY", "probability": 0.0 }
        ]
      },
      {
        "channel_number": 1,
        "speech_length": 13.5,
        "scores": [
          { "emotion": "HAPPY", "probability": 0.75 },
          { "emotion": "NEUTRAL", "probability": 0.15 },
          { "emotion": "SAD", "probability": 0.1 },
          { "emotion": "ANGRY", "probability": 0.0 }
        ]
      }
    ]
  }
}
```
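Given a response of this shape, a common post-processing step is to pick the highest-scoring emotion per channel. The helper below is a small sketch that works directly on the parsed JSON shown above; the function name is our own choice, not part of the API:

```python
def top_emotion_per_channel(data: dict) -> dict[int, str]:
    """Map each channel number to its highest-probability emotion."""
    top = {}
    for channel in data["result"]["channels"]:
        best = max(channel["scores"], key=lambda score: score["probability"])
        top[channel["channel_number"]] = best["emotion"]
    return top

# For the example response above, this prints {0: 'HAPPY', 1: 'HAPPY'}.
print(top_emotion_per_channel(data))
```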
Full Python Code
Here is the full example of how to run the Emotion Recognition technology with media files. The code is slightly adjusted and wrapped into functions.
```python
import json
import time

import requests

SPEECH_PLATFORM_SERVER = "http://localhost:8000"  # Replace with your actual server URL
ENDPOINT_URL = f"{SPEECH_PLATFORM_SERVER}/api/technology/emotion-recognition"


def poll_result(polling_url: str, sleep: int = 5):
    while True:
        response = requests.get(polling_url)
        response.raise_for_status()
        data = response.json()
        task_status = data["task"]["state"]
        if task_status in {"done", "failed", "rejected"}:
            break
        time.sleep(sleep)
    return response


def run_emotion_recognition(audio_path: str):
    print(f"Running Emotion Recognition for file {audio_path}.")
    with open(audio_path, mode="rb") as file:
        files = {"file": file}
        response = requests.post(
            url=ENDPOINT_URL,
            files=files,
        )
    response.raise_for_status()
    polling_url = response.headers["Location"]
    emotion_recognition_response = poll_result(polling_url)
    return emotion_recognition_response.json()


filenames = [
    "Barbara.wav",
    "David.wav",
    "Jack.wav",
    "Jack_Keith.wav",
    "Jiri.wav",
    "Juan.wav",
    "Laura_Marek.wav",
    "Steve.wav",
]

for filename in filenames:
    data = run_emotion_recognition(filename)
    result = data["result"]
    print(json.dumps(result, indent=2))
```
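If you want to keep the results for later analysis, you could write each one to a JSON file next to the recording. This is just one possible extension of the loop above; the output filename convention is our own assumption:

```python
from pathlib import Path

for filename in filenames:
    data = run_emotion_recognition(filename)
    # Save the result next to the recording, e.g. Barbara.wav -> Barbara.json
    output_path = Path(filename).with_suffix(".json")
    output_path.write_text(json.dumps(data["result"], indent=2))
    print(f"Saved result to {output_path}")
```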