MERC-KDC: Multimodal Emotion Recognition in Conversation Via Knowledge Distillation and Contrastive Learning
- Kim, Deog Hwa;
- Park, Ji-kyu;
- Kim, Deokhwan
Abstract
Emotion Recognition in Conversation (ERC) plays a significant role in enhancing the responsiveness of dialogue systems by inferring users' emotions from multiple modalities, including audio, visual, and text. However, previous ERC studies have primarily focused on learning contextual and speaker-dependent features from the text modality, leaving non-verbal modalities underutilized. Moreover, many multimodal ERC approaches assign equal importance to all modalities when fusing them, which limits the effective use of the text modality despite its strong emotional cues. To address these limitations, we propose a method that applies Knowledge Distillation (KD), in which the text modality serves as the teacher model and the non-verbal modalities serve as student models. We also incorporate Emotion Label-Guided Contrastive Learning (ELCL) to enhance emotion representation learning. Furthermore, instead of the commonly used Supervised Contrastive Loss (SCL), we design a Focal Contrastive Loss (FCL) that accounts for minority classes and for hard positive samples with low similarity, thereby improving both training stability and representational balance. Experimental results demonstrate that our proposed model, MERC-KDC, achieves higher overall emotion recognition accuracy on the IEMOCAP dataset than existing methods. In particular, the model shows significant improvement in recognizing minority-class emotions. © 2025 IEEE.
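The abstract does not give the loss formulas, so the following is only a minimal sketch under stated assumptions, not the MERC-KDC implementation. Distillation is assumed to be the standard temperature-scaled KL divergence from text-teacher logits to a non-verbal student's logits. The "focal" contrastive loss is assumed to be a supervised contrastive loss in which each positive's term is scaled by (1 - q)^gamma, where q is the softmax probability of that positive among all other samples, so hard positives with low similarity weigh more. An optional per-class weight is included for minority classes. The function names (`kd_loss`, `focal_contrastive_loss`) and parameters (`temperature`, `tau`, `gamma`, `class_weights`) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Numerically stable softmax with temperature scaling.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: Sequence[float], student_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    # Assumed Hinton-style distillation: KL(teacher || student) on softened
    # distributions, scaled by T^2 to keep gradient magnitudes comparable.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(max(ps, 1e-12)))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    return (temperature ** 2) * kl


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Cosine similarity; assumes non-zero embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def focal_contrastive_loss(embeddings: Sequence[Sequence[float]], labels: Sequence[int],
                           tau: float = 0.1, gamma: float = 2.0,
                           class_weights: Optional[Dict[int, float]] = None) -> float:
    # Hypothetical focal variant of supervised contrastive loss.
    # With gamma=0 and no class weights it reduces to plain SupCon.
    n = len(embeddings)
    sims = [[cosine(embeddings[i], embeddings[j]) / tau for j in range(n)] for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        others = [j for j in range(n) if j != i]
        m = max(sims[i][j] for j in others)
        # log of the softmax denominator over all non-anchor samples (log-sum-exp)
        log_denom = m + math.log(sum(math.exp(sims[i][j] - m) for j in others))
        anchor_loss = 0.0
        for p in positives:
            log_q = sims[i][p] - log_denom  # log-probability of positive p
            q = math.exp(log_q)
            # focal factor (1 - q)^gamma up-weights hard, low-similarity positives
            anchor_loss += -((1.0 - q) ** gamma) * log_q
        anchor_loss /= len(positives)
        alpha = 1.0 if class_weights is None else class_weights[labels[i]]
        total += alpha * anchor_loss
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # toy text-modality logits
    student = [0.5, 0.4, 0.1]   # toy audio-modality logits
    print(kd_loss(teacher, student))
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(focal_contrastive_loss(emb, [0, 0, 1, 1], class_weights={0: 1.0, 1: 2.0}))
```

Setting `gamma=0` recovers the ordinary supervised contrastive objective, which makes it easy to compare this assumed focal variant against the SCL baseline the abstract mentions.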
- Title: MERC-KDC: Multimodal Emotion Recognition in Conversation Via Knowledge Distillation and Contrastive Learning
- Authors: Kim, Deog Hwa; Park, Ji-kyu; Kim, Deokhwan
- Publication year: 2025
- Type: Proceedings Paper
- Published in: International Conference on Ubiquitous and Future Networks, ICUFN
- Pages: 480–485