MERC-KDC: Multimodal Emotion Recognition in Conversation via Knowledge Distillation and Contrastive Learning
Abstract
Emotion Recognition in Conversation (ERC) makes dialogue systems more responsive by inferring users' emotions from multiple modalities, including audio, visual, and text. However, previous ERC studies have focused primarily on learning contextual and speaker-dependent features from the text modality, leaving the non-verbal modalities underused. Moreover, many multimodal ERC approaches assign equal importance to all modalities when fusing them, which limits the effective use of the text modality, which carries strong emotional cues. To address these limitations, we propose a method based on Knowledge Distillation (KD), in which the text modality serves as the teacher model and the non-verbal modalities serve as student models. We also incorporate Emotion Label-Guided Contrastive Learning (ELCL) to enhance emotion representation learning. Instead of the commonly used Supervised Contrastive Loss (SCL), we design a Focal Contrastive Loss (FCL) that accounts for minority classes and for hard positive samples with low similarity, improving both training stability and representational balance. Experimental results show that the proposed model, MERC-KDC, achieves higher overall emotion recognition accuracy on the IEMOCAP dataset than existing methods. In particular, the model shows significant improvement in recognizing minority-class emotions.
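The abstract does not give MERC-KDC's loss formulas, so the following is only a minimal sketch under stated assumptions. It assumes the distillation term is the standard temperature-scaled KL divergence of Hinton et al. (text branch as teacher, a non-verbal branch as student). It also assumes the FCL can be read as a supervised contrastive loss with a focal factor (1 - p)^γ that up-weights hard, low-similarity positives and an inverse-frequency weight that up-weights minority-class anchors. The function names `kd_loss` and `focal_contrastive_loss`, the parameters `tau` and `gamma`, and the weighting scheme are hypothetical and are not the authors' implementation.

```python
import math

def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Assumed Hinton-style distillation: T^2 * KL(teacher_soft || student_soft).
    Teacher = text branch, student = a non-verbal (audio or visual) branch."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def focal_contrastive_loss(embeddings, labels, tau=0.1, gamma=2.0):
    """Hypothetical focal-weighted supervised contrastive loss.
    - (1 - p)^gamma up-weights hard positives whose probability p is low
    - inverse class frequency up-weights anchors from minority classes
    """
    n = len(embeddings)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchor has no same-label partner in this batch
        # Similarity logits to every other sample (anchor itself excluded, as in SCL)
        logits = {j: cosine(embeddings[i], embeddings[j]) / tau for j in range(n) if j != i}
        m = max(logits.values())
        denom = sum(math.exp(l - m) for l in logits.values())
        anchor_loss = 0.0
        for p in positives:
            prob = math.exp(logits[p] - m) / denom
            anchor_loss += -((1.0 - prob) ** gamma) * math.log(prob)
        anchor_loss /= len(positives)
        # "Balanced" inverse-frequency weight for the anchor's class
        alpha = n / (len(counts) * counts[labels[i]])
        total += alpha * anchor_loss
        weight_sum += alpha
    return total / weight_sum if weight_sum > 0 else 0.0
```

Setting `gamma=0` on a class-balanced batch makes every weight equal to 1, so this sketch reduces to the plain SCL objective, which makes the two losses easy to compare.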
- Title: MERC-KDC: Multimodal Emotion Recognition in Conversation via Knowledge Distillation and Contrastive Learning
- Author: KIM DEOKHWAN
- Conference: 16th International Conference on Ubiquitous and Future Networks (ICUFN) 2025
- Venue: Iscte-University Institute of Lisbon, Portugal
- Conference dates: 2025-07-08 to 2025-07-11