All You Need Is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation

Kim, Seongho; Song, Byung Cheol

doi:10.1007/978-3-031-73039-9_20



Citations

Web of Science: 2

Scopus: 4

Abstract

With the rise of generative models, multi-modal video generation has gained significant attention, particularly in the realm of audio-driven emotional talking face synthesis. This paper addresses two key challenges in this domain: input bias and intensity saturation. A novel neutralization scheme is first proposed to counter input bias, yielding impressive results in generating neutral talking faces from emotionally expressive ones. Furthermore, 2D continuous emotion label-based regression learning effectively generates varying emotional intensities on a per-frame basis. Results from a user study quantify subjective interpretations of strong emotions and naturalness, revealing up to 78.09% higher emotion accuracy and a naturalness score up to 3.41 higher than the lowest-ranked method. Repository: https://github.com/sbde500/EAP

Keywords

Emotional talking face generation; neutralization; emotion from audio; emotional intensity

Title: All You Need Is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation

Authors: Kim, Seongho; Song, Byung Cheol

DOI: 10.1007/978-3-031-73039-9_20

Publication date: 2025

Type: Proceedings Paper

Journal: Lecture Notes in Computer Science

Volume: 15122

Pages: 347-363