Visual Scene-aware Hybrid Neural Network Architecture for Video-based Facial Expression Recognition

Citations

Web of Science: 11
Scopus: 34
Abstract

With the rapid development of deep learning, facial expression recognition (FER) technology has made considerable progress in recent years. However, because conventional FER techniques are mainly designed for and trained on videos acquired artificially in a constrained environment, they may not operate robustly on videos acquired in the wild. To solve this problem, this paper proposes a scene-aware hybrid neural network (NN) built on a novel combination of a three-dimensional (3D) convolutional NN (CNN), a 2D CNN, and a recurrent NN (RNN). The characteristics of the proposed network are as follows. First, it extracts video-based global features and frame-based local features at the same time. Specifically, latent features capturing the overall visual scene of a given video are extracted by a 3D CNN with an auxiliary classifier, while a fine-tuned 2D CNN extracts latent features containing fine details from each frame. Second, the RNN not only performs temporal-domain learning but also fuses, feature-wise, the two latent features extracted by these networks. For effective fusion, three RNN schemes are also presented. Third, the proposed network, in which the above-mentioned methods work together, operates robustly in a wild environment as well as in a constrained one. Extensive experiments show that the proposed network achieves an average accuracy of 49.9% on the AFEW dataset, a representative in-the-wild dataset, and an accuracy of 98.2% on the CK+ dataset. We also show that the proposed network outperforms state-of-the-art networks.
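
The feature-wise fusion described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not detail the three RNN schemes, so the example assumes the simplest plausible one, in which the clip-level (3D CNN) feature is concatenated with each frame-level (2D CNN) feature and the result is fed to a plain tanh RNN. Random vectors stand in for the CNN outputs, and every name, dimension, and the class count of 7 are hypothetical placeholders.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fuse_and_classify(global_feat, frame_feats, params):
    """Assumed fusion scheme: concatenate the clip-level feature with each
    frame-level feature, run a plain tanh RNN over the frames, and classify
    from the last hidden state."""
    W_in, W_h, W_out = params
    h = [0.0] * len(W_h)
    for frame_feat in frame_feats:
        x = list(global_feat) + list(frame_feat)  # feature-wise fusion by concatenation
        a = matvec(W_in, x)
        b = matvec(W_h, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    return softmax(matvec(W_out, h))


# Hypothetical sizes, chosen only to keep the example small.
GLOBAL_DIM, FRAME_DIM, HIDDEN, NUM_CLASSES, NUM_FRAMES = 8, 6, 5, 7, 4

rng = random.Random(0)
params = (
    rand_matrix(HIDDEN, GLOBAL_DIM + FRAME_DIM, rng),
    rand_matrix(HIDDEN, HIDDEN, rng),
    rand_matrix(NUM_CLASSES, HIDDEN, rng),
)

# Stand-ins for the 3D CNN (one vector per clip) and 2D CNN (one vector per frame) outputs.
global_feat = [rng.random() for _ in range(GLOBAL_DIM)]
frame_feats = [[rng.random() for _ in range(FRAME_DIM)] for _ in range(NUM_FRAMES)]

probs = fuse_and_classify(global_feat, frame_feats, params)
print(len(probs), round(sum(probs), 6))
```

Concatenation is used here only because it is the easiest fusion to read; the schemes actually evaluated in the paper may differ, for example by injecting the global feature into the recurrent state instead of the input.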

Title
Visual Scene-aware Hybrid Neural Network Architecture for Video-based Facial Expression Recognition
Authors
Lee, Min Kyu; Choi, Dong Yoon; Kim, Dae Ha; Song, Byung Cheol
DOI
10.1109/fg.2019.8756551
Issue Date
2019
Type
Proceedings Paper
Journal
2019 14TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2019)
Pages
153-160