Dual-Distillation Vision-Language Model for Multimodal Emotion Recognition in Conversation with Quantized Edge Deployment

  • Kim, DeogHwa
  • Lee, Yu Il
  • Yoon, Da Hyun
  • Kim, Byeong Jun
  • Kim, Deok-Hwan

Abstract

Multimodal Emotion Recognition in Conversation (ERC) has attracted attention as a key technology for human-computer interaction, mental healthcare, and intelligent services. However, deploying ERC in real-world settings remains challenging because of reliability gaps across modalities, unstable visual representations, and the high computational cost of large pretrained models. On resource-constrained edge devices in particular, it is difficult to reduce model size and inference latency while preserving accuracy. To address these challenges, we propose DDVLM, a knowledge-distillation-based multimodal ERC model, together with an edge-optimized Weight-Only Quantization (WOQ) pipeline. DDVLM treats the textual modality as the teacher and the visual modality as the student, transferring emotion-distribution knowledge from text to vision to improve non-verbal representations and stabilize multimodal learning. In addition, Exponential Moving Average (EMA)-based self-distillation improves the consistency and generalization of the text features. The WOQ pipeline quantizes linear-layer weights to INT8 while keeping precision-sensitive operations in mixed precision, which minimizes accuracy loss while reducing model size, memory usage, and inference latency. Experiments on the MELD dataset demonstrate that the proposed approach achieves state-of-the-art performance while enabling real-time inference on edge devices such as NVIDIA Jetson. Overall, this work presents a practical ERC framework that jointly considers accuracy and deployability.
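
The three mechanisms named in the abstract (text-to-vision distillation of emotion distributions, EMA-based self-distillation, and INT8 weight-only quantization) can be illustrated with a minimal, framework-free Python sketch. It is not the authors' implementation. The function names, the temperature-softened KL-divergence loss, the EMA decay value, and the symmetric per-row quantization scale are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened emotion distributions.

    Assumption: the text branch is the teacher and the visual branch is
    the student; the KL form and the temperature are illustrative only.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def ema_update(ema_weights, online_weights, decay=0.99):
    """One EMA step: ema <- decay * ema + (1 - decay) * online.

    In EMA self-distillation the averaged copy typically serves as a
    slowly moving teacher for the online model (assumed here).
    """
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, online_weights)]

def quantize_row_int8(row):
    """Symmetric INT8 quantization of one weight row.

    Returns (integer values in [-127, 127], float scale). Per-row
    scaling is an assumption, not a detail taken from the paper.
    """
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale

def dequantize_row(q, scale):
    """Recover approximate float weights; activations stay in float."""
    return [v * scale for v in q]

if __name__ == "__main__":
    # Hypothetical logits over MELD's seven emotion classes.
    text_logits = [2.0, 0.5, -1.0, 0.0, 0.1, -0.3, 1.2]
    visual_logits = [0.1, 0.2, 0.0, -0.5, 1.0, 0.3, 0.4]
    print("distillation loss:", distillation_kl(text_logits, visual_logits))

    print("EMA weights:", ema_update([1.0, 2.0], [0.0, 4.0], decay=0.9))

    q, scale = quantize_row_int8([0.5, -1.0, 0.25])
    print("int8 row:", q, "scale:", scale)
    print("dequantized:", dequantize_row(q, scale))
```

In this sketch only the weights are stored as integers, and they are rescaled to floating point before use, which is the general idea behind weight-only quantization. The rounding error of each weight is bounded by half the row's scale.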

Keywords

multimodal emotion recognition in conversation; knowledge distillation; exponential moving average; vision-language model; weight-only quantization
DOI
10.3390/app16063103
Publication date
2026-03-23
Type
Article
Journal
APPLIED SCIENCES-BASEL
Volume
16
Issue
6