MERC-KDC: Multimodal Emotion Recognition in Conversation Via Knowledge Distillation and Contrastive Learning

Abstract

Emotion Recognition in Conversation (ERC) plays a significant role in enhancing the responsiveness of dialogue systems by inferring users' emotions from multiple modalities, including audio, visual, and text. However, previous ERC studies have primarily focused on learning contextual and speaker-dependent features from the text modality, leaving non-verbal modalities underused. In addition, many multimodal ERC approaches assign equal importance to all modalities and fuse them accordingly, which limits the effective use of the text modality despite its strong emotional cues. To address these limitations, we propose a method that applies Knowledge Distillation (KD), in which the text modality serves as the teacher model and the non-verbal modalities serve as student models. We also incorporate Emotion Label-Guided Contrastive Learning (ELCL) to enhance the representation learning of emotions. Furthermore, instead of the commonly used Supervised Contrastive Loss (SCL), we design a Focal Contrastive Loss (FCL) that accounts for minority classes and hard positive samples with low similarity, thereby improving both training stability and representational balance. Experimental results demonstrate that the proposed model, MERC-KDC, achieves higher overall emotion recognition accuracy on the IEMOCAP dataset than existing methods. In particular, the model shows significant improvement in recognizing minority-class emotions. © 2025 IEEE.
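
The two loss ideas named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: the distillation term is the standard temperature-softened KL divergence between teacher (text) and student (non-verbal) logits, and the contrastive term is a supervised contrastive loss with a focal factor `(1 - p) ** gamma` that up-weights low-similarity (hard) positives. The function names, `temperature`, `tau`, and `gamma` are hypothetical, and the class-balancing part of FCL is omitted; this is not the paper's definition.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the text model is the teacher and an audio or visual
    model is the student, as described in the abstract.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def focal_contrastive_loss(embeddings, labels, tau=0.5, gamma=2.0):
    """Supervised contrastive loss with a focal weight on each positive.

    Assumed form (not taken from the paper): for anchor i and positive j,
    p_ij is the softmax probability of j among all other samples, and the
    per-pair loss is -(1 - p_ij)^gamma * log(p_ij). gamma = 0 recovers a
    plain supervised contrastive loss.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label sample contributes nothing
        sims = {j: math.exp(cosine(embeddings[i], embeddings[j]) / tau)
                for j in others}
        denom = sum(sims.values())
        loss_i = 0.0
        for j in positives:
            p_ij = sims[j] / denom
            loss_i -= (1.0 - p_ij) ** gamma * math.log(p_ij)
        total += loss_i / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In this sketch, a positive pair that already has high similarity gets a weight near zero, so the gradient concentrates on positives the model still places far apart.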

Keywords

Emotion Label-Guided Contrastive Learning; Focal Contrastive Loss; Knowledge Distillation; Multimodal Emotion Recognition in Conversation
Title
MERC-KDC: Multimodal Emotion Recognition in Conversation Via Knowledge Distillation and Contrastive Learning
Authors
Kim, Deog Hwa; Park, Ji-kyu; Kim, Deokhwan
DOI
10.1109/ICUFN65838.2025.11169837
Publication date
2025
Type
Proceedings Paper
Journal
International Conference on Ubiquitous and Future Networks, ICUFN
Pages
480-485