EMGVox-GAN: A transformative approach to EMG-based speech synthesis, enhancing clarity, and efficiency via extensive dataset utilization
- Sualiheen, Sara
- Kim, Deok-Hwan
Web of Science: 0
Scopus: 1
Abstract
This study introduces EMGVox-GAN, a synthesis approach that combines electromyography (EMG) signals with deep learning to produce speech, departing from conventional vocoder technology. EMGVox-GAN uses a Scale-Adaptive-Frequency-Enhanced Discriminator (SAFE-Disc) composed of three sub-discriminators, each specializing in signals at a different frequency scale. Each sub-discriminator includes two downblocks, which strengthen its ability to distinguish real audio from generated (fake) audio. The proposed EMGVox-GAN was validated on one speech dataset (LJSpeech) and three EMG datasets (Silent Speech, CSL-EMG-Array, and XVoice_Speech_EMG). Speech quality improved significantly, reaching a Mean Opinion Score (MOS) of 4.14 on our largest dataset. Additionally, the Word Error Rate (WER) was reduced from 47% to 36%, as defined in the state-of-the-art work, underscoring the improved clarity of the synthesized speech. These results show that silent EMG signals can be used to generate intelligible, high-quality speech. Beyond the gain in speech quality, EMGVox-GAN's successful integration of EMG signals opens new possibilities for assistive technology, human-computer interaction, and other domains where clear and efficient speech synthesis is crucial.
- Title
- EMGVox-GAN: A transformative approach to EMG-based speech synthesis, enhancing clarity, and efficiency via extensive dataset utilization
- Authors
- Sualiheen, Sara; Kim, Deok-Hwan
- Issue Date
- 2025-06
- Type
- Article
- Volume
- 92