EMGVox-GAN: A transformative approach to EMG-based speech synthesis, enhancing clarity, and efficiency via extensive dataset utilization

Sualiheen, Sara; Kim, Deok-Hwan

doi:10.1016/j.csl.2024.101754

Abstract

This study introduces EMGVox-GAN, a groundbreaking synthesis approach that combines electromyography (EMG) signals with advanced deep learning techniques to produce speech, departing from conventional vocoder technology. The EMGVox-GAN was crafted with a Scale-Adaptive-Frequency-Enhanced Discriminator (SAFE-Disc) composed of three individual sub-discriminators specializing in processing signals of varying frequency scales. Each sub-discriminator includes two downblocks, strengthening the discriminators in discriminating between real and fake audio (generated audio). The proposed EMGVox-GAN was validated on a speech dataset (LJSpeech) and three EMG datasets (Silent Speech, CSL-EMG-Array, and XVoice_Speech_EMG). We have significantly enhanced speech quality, achieving a Mean Opinion Score (MOS) of 4.14 on our largest dataset. Additionally, the Word Error Rate (WER) was notably reduced from 47% to 36%, as defined in the state-of-the-art work, underscoring the improved clarity in the synthesized speech. This breakthrough offers a transformative shift in speech synthesis by utilizing silent EMG signals to generate intelligible, high-quality speech. Beyond the advancement in speech quality, the EMGVox-GAN's successful integration of EMG signals opens new possibilities for applications in assistive technology, human-computer interaction, and other domains where clear and efficient speech synthesis is crucial.
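
For readers unfamiliar with the WER figure quoted above, the following is a minimal sketch of the standard word-level definition: the edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis transcript, divided by the number of reference words. The abstract does not describe how the authors obtained their transcripts or scored them, so the function name `word_error_rate` and the example sentences are illustrative assumptions, not the paper's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, the reported change from 47% to 36% is an 11-percentage-point drop, which is roughly a 23% relative reduction (11 / 47).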

Keywords

EMGVox-GAN; Electromyography; Speech synthesis; Silent speech; Multi-dataset strategy

Title: EMGVox-GAN: A transformative approach to EMG-based speech synthesis, enhancing clarity, and efficiency via extensive dataset utilization

Authors: Sualiheen, Sara; Kim, Deok-Hwan

DOI: 10.1016/j.csl.2024.101754

Publication date: 2025-06

Type: Article

Journal: Computer Speech and Language

Volume: 92