Abstract
Emotion recognition from voice data has evolved from a niche field into an important component of human-computer interaction. It aims to enable more natural interaction with machines and to improve the understanding of emotional content in speech. This paper assesses the performance of emotion recognition in voice data using Mel-spectrogram and Mel-Frequency Cepstral Coefficient (MFCC) features within a Convolutional Neural Network (CNN) model. The assessment involves extracting the features from voice data, applying a Gaussian preprocessing technique, and examining the model's accuracy in recognizing six emotions: joy, calm, anger, sorrow, surprise, and confusion. The results show that the CNN model using Mel-spectrograms achieved an accuracy of 91.73%, significantly outperforming the MFCC-based model, which achieved 73.69%. Furthermore, Gaussian preprocessing enhances the accuracy of the Mel-spectrogram model by 2% and improves the MFCC results by nearly 10%. These findings indicate the potential of CNNs in emotion recognition and suggest further exploration of data preprocessing for more robust performance.
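The two feature types compared in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the frame length, sample rate, number of mel bands, and all function names are hypothetical choices made for illustration. The abstract also does not say what "Gaussian preprocessing" consists of, so the sketch assumes a simple Gaussian smoothing of the log-mel energies.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the mel scale (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per band."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(s) ** 2)
    return spectrum


def log_mel_energies(frame, bank):
    """One column of a log-Mel-spectrogram (small floor avoids log(0))."""
    power = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]


def mfcc_from_log_mel(log_mel, n_coeffs):
    """Unnormalized DCT-II of the log-mel energies; keeps the first n_coeffs."""
    n = len(log_mel)
    return [
        sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(log_mel))
        for c in range(n_coeffs)
    ]


def gaussian_smooth(values, sigma=1.0, radius=2):
    """ASSUMPTION: 'Gaussian preprocessing' modeled as Gaussian smoothing."""
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in offsets]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    last = len(values) - 1
    # Indices past either end are clamped to the nearest valid sample.
    return [
        sum(k * values[min(max(i + d, 0), last)] for d, k in zip(offsets, kernel))
        for i in range(len(values))
    ]


# Hypothetical usage: one 64-sample frame of a 1 kHz tone sampled at 8 kHz.
sample_rate, n_fft = 8000, 64
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n_fft)]
bank = mel_filterbank(8, n_fft, sample_rate)
log_mel = log_mel_energies(frame, bank)               # Mel-spectrogram column
mfcc = mfcc_from_log_mel(gaussian_smooth(log_mel), 5)  # MFCCs after smoothing
```

In a full system, such columns would be computed over many overlapping frames and stacked into a 2-D array that serves as CNN input. The sketch shows the structural difference between the two features: the MFCCs are a truncated cosine transform of the log-mel energies and therefore carry less spectral detail than the Mel-spectrogram itself.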
Keywords
- Title: Performance Assessment of Emotion Recognition in Voice Data Using Convolutional Neural Networks (CNN)
- Authors: Ahmad, Sabrina Megumi; Kawahigashi, Ken; Munir, Achmad
- Publication date: 2025
- Type: Proceedings Paper
- Journal: 2025 INTERNATIONAL CONFERENCE ON SMART APPLICATIONS, COMMUNICATIONS AND NETWORKING, SMARTNETS