EMGVox-GAN: A transformative approach to EMG-based speech synthesis, enhancing clarity, and efficiency via extensive dataset utilization

Citations

WEB OF SCIENCE

0
Citations

SCOPUS

1

초록

This study introduces EMGVox-GAN, a groundbreaking synthesis approach that combines electromyography (EMG) signals with advanced deep learning techniques to produce speech, departing from conventional vocoder technology. The EMGVox-GAN was crafted with a Scale- Adaptive-Frequency-Enhanced Discriminator (SAFE-Disc) composed of three individual sub- discriminators specializing in processing signals of varying frequency scales. Each sub- discriminator includes two downblocks, strengthening the discriminators in discriminating between real and fake audio (generated audio). The proposed EMGVox-GAN was validated on a speech dataset (LJSpeech) and three EMG datasets (Silent Speech, CSL-EMG-Array, and XVoice_Speech_EMG). We have significantly enhanced speech quality, achieving a Mean Opinion Score (MOS) of 4.14 on our largest dataset. Additionally, the Word Error Rate (WER) was notably reduced from 47 % to 36 %, as defined in the state-of-the-art work, underscoring the improved clarity in the synthesized speech. This breakthrough offers a transformative shift in speech synthesis by utilizing silent EMG signals to generate intelligible, high-quality speech. Beyond the advancement in speech quality, the EMGVox-GAN's successful integration of EMG signals opens new possibilities for applications in assistive technology, human-computer interaction, and other domains where clear and efficient speech synthesis is crucial.

키워드

EMGVox-GANElectromyographySpeech synthesisSilent speechMulti-dataset strategy
제목
EMGVox-GAN: A transformative approach to EMG-based speech synthesis, enhancing clarity, and efficiency via extensive dataset utilization
저자
Sualiheen, SaraKim, Deok-Hwan
DOI
10.1016/j.csl.2024.101754
발행일
2025-06
유형
Article
저널명
Computer Speech and Language
92