Performance Assessment of Emotion Recognition in Voice Data Using Convolutional Neural Networks (CNN)

  • Ahmad, Sabrina Megumi
  • Kawahigashi, Ken
  • Munir, Achmad

Abstract

Emotion recognition from voice data has evolved from a niche field into an important component of human-computer interaction. It aims to enable more natural voice-based interaction with machines and to improve the understanding of emotional content. This paper assesses the performance of emotion recognition in voice data using Mel-spectrogram and Mel Frequency Cepstral Coefficient (MFCC) features within a Convolutional Neural Network (CNN) model. The assessment involves extracting the features from voice data, applying a Gaussian preprocessing technique, and examining the model's accuracy in recognizing six emotions: joy, calm, anger, sorrow, surprise, and confusion. The results show that the CNN model using Mel-spectrograms achieved an accuracy of 91.73%, substantially outperforming the MFCC-based model, which achieved 73.69%. Furthermore, Gaussian preprocessing can improve the accuracy of the Mel-spectrogram model by 2% and that of the MFCC model by nearly 10%. These findings indicate the potential of CNNs in emotion recognition and suggest further exploration of data preprocessing for more robust performance.
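
The abstract names the feature types and a Gaussian preprocessing step but gives no implementation details. The following is a minimal, dependency-free sketch of how the pieces relate, under stated assumptions: the mel scale uses the common HTK-style formula, MFCC-like coefficients are taken as a DCT-II of log-mel energies, and "Gaussian preprocessing" is interpreted here as Gaussian smoothing (it could equally refer to something else, such as Gaussian noise augmentation). All function names are hypothetical and are not taken from the paper.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mel (HTK-style formula; an assumption, not from the paper)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

def dct2(x, n_coeffs):
    """Unnormalized DCT-II. Applied to one frame of log-mel energies,
    the first n_coeffs outputs are MFCC-like coefficients."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(n_coeffs)
    ]

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights over [-radius, radius]."""
    w = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def gaussian_smooth(x, sigma=1.0, radius=2):
    """Smooth a sequence with a Gaussian kernel, replicating edge values.
    One possible reading of 'Gaussian preprocessing' (assumption)."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp at the borders
            acc += w * x[idx]
        out.append(acc)
    return out

# Illustrative use on made-up log-mel energies for a single frame.
log_mel_frame = [0.2, 0.5, 1.1, 0.9, 0.4, 0.3, 0.1, 0.05]
smoothed = gaussian_smooth(log_mel_frame, sigma=1.0, radius=2)
mfcc_like = dct2(smoothed, n_coeffs=4)
```

In this sketch the Mel-spectrogram branch would feed the smoothed log-mel frames to the CNN directly, while the MFCC branch would feed the DCT output; in practice an audio library would compute the spectrogram and filterbank from the waveform.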

Keywords

Convolutional Neural Networks (CNN); emotion recognition; Mel Frequency Cepstrum Coefficient (MFCC); Mel-spectrogram; voice data
Title
Performance Assessment of Emotion Recognition in Voice Data Using Convolutional Neural Networks (CNN)
Authors
Ahmad, Sabrina Megumi; Kawahigashi, Ken; Munir, Achmad
DOI
10.1109/SMARTNETS65254.2025.11106777
Publication date
2025
Type
Proceedings Paper
Journal name
2025 INTERNATIONAL CONFERENCE ON SMART APPLICATIONS, COMMUNICATIONS AND NETWORKING, SMARTNETS