MERC-KDC: Multimodal Emotion Recognition in Conversation via Knowledge Distillation and Contrastive Learning

KIM DEOKHWAN

Abstract

Emotion Recognition in Conversation (ERC) plays a significant role in making dialogue systems more responsive by inferring users' emotions from multiple modalities, including audio, visual, and text. However, previous ERC studies have focused primarily on learning contextual and speaker-dependent features from the text modality, leaving non-verbal modalities underutilized. In addition, many multimodal ERC approaches assign equal importance to all modalities when fusing them, which limits the effective use of the text modality even though it carries strong emotional cues. To address these limitations, we propose a method that applies Knowledge Distillation (KD), in which the text modality serves as the teacher model and the non-verbal modalities serve as student models. We also incorporate Emotion Label-Guided Contrastive Learning (ELCL) to enhance the representation learning of emotions. Furthermore, instead of the commonly used Supervised Contrastive Loss (SCL), we design a Focal Contrastive Loss (FCL) that accounts for minority classes and hard positive samples with low similarity, thereby improving both training stability and representational balance. Experimental results demonstrate that our proposed model, MERC-KDC, achieves higher overall emotion recognition accuracy on the IEMOCAP dataset than existing methods. In particular, the model shows significant improvement in recognizing minority-class emotions.
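The abstract does not give the exact form of the Focal Contrastive Loss, so the following is only a minimal sketch of one plausible reading: a supervised contrastive loss in which each positive pair is down-weighted by a focal factor `(1 - p) ** gamma`, so that hard positives (low similarity, hence low `p`) dominate the loss. The function name `focal_contrastive_loss`, the parameters `temperature` and `gamma`, and the use of cosine similarity are all assumptions made for illustration, not details taken from the paper. Any explicit minority-class weighting the authors may use is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def focal_contrastive_loss(embeddings, labels, temperature=0.5, gamma=2.0):
    """Hypothetical focal-weighted supervised contrastive loss.

    For anchor i and positive j (same label, j != i), p_ij is the softmax
    probability of j among all other samples. Each positive contributes
    -(1 - p_ij) ** gamma * log(p_ij); gamma = 0 recovers the plain
    supervised contrastive loss. This is an assumed formulation.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without any positive are skipped
        # Temperature-scaled similarities to every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Subtract the max logit for a numerically stable log-softmax.
        max_logit = max(logits.values())
        log_denom = math.log(sum(math.exp(v - max_logit) for v in logits.values()))
        loss_i = 0.0
        for j in positives:
            log_p = logits[j] - max_logit - log_denom
            p = math.exp(log_p)
            loss_i += -((1.0 - p) ** gamma) * log_p
        total += loss_i / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy 2-D embeddings for two emotion classes (illustrative values only).
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    lab = [0, 0, 1, 1]
    print("SCL (gamma=0):", focal_contrastive_loss(emb, lab, gamma=0.0))
    print("FCL (gamma=2):", focal_contrastive_loss(emb, lab, gamma=2.0))
```

Under this formulation, setting `gamma` to 0 gives the ordinary supervised contrastive loss, while larger values shrink the contribution of positives that are already well aligned with the anchor.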

Title: MERC-KDC: Multimodal Emotion Recognition in Conversation via Knowledge Distillation and Contrastive Learning

Author: KIM DEOKHWAN

Conference: 16th International Conference on Ubiquitous and Future Networks (ICUFN) 2025

Venue: Iscte-University Institute of Lisbon, Portugal

Conference dates: 2025-07-08 ~ 2025-07-11