Version: 2026.03.0-rc1

Emotion Recognition

Phonexia Emotion Recognition is a technology that determines which emotion is expressed in a recording (happy, neutral, sad, or angry), regardless of the language spoken.

This page explains how to use Phonexia Emotion Recognition in our web application. If you want to dive deeper into the inner workings of this technology, check out our detailed technical documentation.

Uploading files

Upload your files or create your own recordings by using the built-in recording feature. If you don't have your own files, you can use the provided Phonexia examples to explore how the technology works. Learn more about uploading files here.

Files can also be sent to and received from other Phonexia technologies. Learn more about sending files here.

Results

After uploading, your recordings will appear in the left panel.

Caution: Leaving the page for an extended period while awaiting the results may interrupt the process. If this happens, you will need to restart the audio processing.

Once processing is complete, the right panel displays the results for each recording as a radial bar chart showing the occurrence, as a percentage, of each of the four emotions: happy, neutral, sad, and angry. If the audio is stereo, separate results are shown for each channel.

Export formats

Once your results are ready, you can export them in various formats.

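Emotion Recognition results can be exported individually for each file in CSV, XLSX, or JSON format. Each export file is named after the corresponding recording and includes the channel number along with the score for each emotion. In the samples below, the CSV export reports percentages, while the JSON export reports probabilities between 0 and 1.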

The same results can also be exported in bulk as a ZIP file. You can also export a summary file that lists the probabilities for all selected recordings.

XLSX

Table showing channel information and the respective probabilities for each emotion.

CSV

Channel,Happy (%),Neutral (%),Sad (%),Angry (%)
0,79.87,19.02,0.56,0.55
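
As a minimal sketch of how an exported CSV could be processed, the following Python snippet reads the sample above and reports the strongest emotion per channel. The column names are taken from the sample; the function name `dominant_emotions` is hypothetical and not part of any Phonexia tooling.

```python
import csv
import io

# Sample content copied from the CSV export shown above.
CSV_TEXT = """Channel,Happy (%),Neutral (%),Sad (%),Angry (%)
0,79.87,19.02,0.56,0.55
"""


def dominant_emotions(csv_text):
    """Return {channel: (emotion, percentage)} for each row of the export."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        channel = int(row.pop("Channel"))
        # Remaining columns look like "Happy (%)"; strip the unit suffix.
        scores = {
            name.replace(" (%)", "").lower(): float(value)
            for name, value in row.items()
        }
        top = max(scores, key=scores.get)
        result[channel] = (top, scores[top])
    return result


print(dominant_emotions(CSV_TEXT))  # {0: ('happy', 79.87)}
```

To process a real export, read the downloaded file's text and pass it to the function in place of `CSV_TEXT`.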

JSON

The JSON export includes the channel number, the speech length, and the probability of each emotion for every channel.

{
  "channels": [
    {
      "channel_number": 0,
      "speech_length": 43.59,
      "scores": [
        {
          "emotion": "happy",
          "probability": 0.7987
        },
        {
          "emotion": "neutral",
          "probability": 0.19022
        },
        {
          "emotion": "sad",
          "probability": 0.00562
        },
        {
          "emotion": "angry",
          "probability": 0.00545
        }
      ]
    }
  ]
}
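
The JSON export can be read with any standard JSON parser. The sketch below, which assumes only the field names visible in the sample above, converts each probability to a percentage and picks the dominant emotion per channel. The function name `summarize` is hypothetical.

```python
import json

# Sample content copied from the JSON export shown above.
JSON_TEXT = """
{
  "channels": [
    {
      "channel_number": 0,
      "speech_length": 43.59,
      "scores": [
        {"emotion": "happy", "probability": 0.7987},
        {"emotion": "neutral", "probability": 0.19022},
        {"emotion": "sad", "probability": 0.00562},
        {"emotion": "angry", "probability": 0.00545}
      ]
    }
  ]
}
"""


def summarize(json_text):
    """Return a list of (channel_number, dominant_emotion, percentages)."""
    summary = []
    for channel in json.loads(json_text)["channels"]:
        # Convert probabilities in the 0..1 range to percentages.
        percentages = {
            score["emotion"]: round(score["probability"] * 100, 2)
            for score in channel["scores"]
        }
        dominant = max(percentages, key=percentages.get)
        summary.append((channel["channel_number"], dominant, percentages))
    return summary


for number, dominant, percentages in summarize(JSON_TEXT):
    print(f"channel {number}: {dominant} ({percentages[dominant]} %)")
```

For a stereo recording, the `channels` list would be expected to hold one entry per channel, and the loop handles each entry the same way.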

All results

The summary file, available in CSV or XLSX format, includes the percentage occurrence of each emotion for all selected recordings.

Table showing filename, channel, and the respective probabilities for each emotion.
