MERC-KDC: Multimodal Emotion Recognition in Conversation Via Knowledge Distillation and Contrastive Learning

Kim, Deog Hwa; Park, Ji-kyu; Kim, Deokhwan

doi:10.1109/ICUFN65838.2025.11169837

Citations: Web of Science 1; Scopus 1

Abstract

Emotion Recognition in Conversation (ERC) plays a significant role in enhancing the responsiveness of dialogue systems by inferring users' emotions from multiple modalities, including audio, visual, and text. However, previous ERC studies have primarily focused on learning contextual and speaker-dependent features from the text modality, leaving non-verbal modalities underutilized. In addition, many multimodal ERC approaches assign equal importance to all modalities and fuse them accordingly, which limits the effective use of the text modality despite its strong emotional cues. To address these limitations, we propose a method that applies Knowledge Distillation (KD), in which the text modality serves as the teacher model and the non-verbal modalities serve as student models. We also incorporate Emotion Label-Guided Contrastive Learning (ELCL) to enhance the representation learning of emotions. Furthermore, instead of the commonly used Supervised Contrastive Loss (SCL), we design a Focal Contrastive Loss (FCL) that accounts for minority classes and hard positive samples with low similarity, thereby improving both training stability and representational balance. Experimental results demonstrate that the proposed model, MERC-KDC, achieves higher overall emotion recognition accuracy on the IEMOCAP dataset than existing methods. In particular, the model shows significant improvement in recognizing minority-class emotions. © 2025 IEEE.
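
The abstract does not give the loss formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. The distillation term is assumed to be the common temperature-softened KL divergence between teacher (text) and student (non-verbal) logits, and the focal contrastive term is assumed to down-weight easy positives with a factor (1 - p)^gamma, which reduces to a supervised contrastive loss when gamma = 0. The function names, `temperature`, `tau`, and `gamma` are hypothetical, and the paper's handling of minority classes is not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """ASSUMPTION: KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher, e.g. text modality
    q = softmax(student_logits, temperature)  # student, e.g. audio or visual
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def focal_contrastive_loss(embeddings, labels, tau=0.5, gamma=2.0):
    """ASSUMPTION: label-guided contrastive loss with a focal factor (1 - p)^gamma.

    Positives are samples sharing the anchor's emotion label. Positives with
    low similarity get a small p and therefore a weight close to 1, while
    easy positives are down-weighted.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {
            k: math.exp(cosine(embeddings[i], embeddings[k]) / tau)
            for k in range(n)
            if k != i
        }
        denom = sum(sims.values())
        anchor_loss = 0.0
        for j in positives:
            p = sims[j] / denom
            anchor_loss -= (1.0 - p) ** gamma * math.log(p)
        total += anchor_loss / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In this sketch, identical teacher and student logits give a distillation loss of zero, and setting `gamma=0.0` in `focal_contrastive_loss` recovers the unweighted supervised contrastive form, which makes the effect of the focal factor easy to compare.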

Keywords

Emotion Label-Guided Contrastive Learning; Focal Contrastive Loss; Knowledge Distillation; Multimodal Emotion Recognition in Conversation

Title: MERC-KDC: Multimodal Emotion Recognition in Conversation Via Knowledge Distillation and Contrastive Learning

Authors: Kim, Deog Hwa; Park, Ji-kyu; Kim, Deokhwan

DOI: 10.1109/ICUFN65838.2025.11169837

Publication date: 2025

Type: Proceedings Paper

Journal title: International Conference on Ubiquitous and Future Networks, ICUFN

Pages: 480-485