Glottal-Aware Speech-to-EMG Synthesis via Sparse-Adaptive Fusion of Speech Units

Ullah, Shan; Nasralla, Moustafa M.; Kim, Deok-Hwan

doi:10.1109/TIM.2026.3659639


Citations: Web of Science 1; Scopus 1

Abstract

Silent speech decoding uses electromyography (EMG) signals to generate speech without vocalization. However, acquiring EMG datasets for silent speech is challenging because of electrode configuration (unipolar versus bipolar), inconsistent electrode placement, intersession and interdevice variability, and subject discomfort. An alternative approach that addresses these challenges is to solve the inverse problem, speech-to-EMG synthesis, so that EMG signals can be generated from available speech data. In this article, we propose a novel multimodal fusion framework that integrates electroglottography (EGG) and speech signals to enhance the representation of soft speech units (SSUs). We introduce a glottal-aware EGG encoder based on a ConvNeXt-LSTM architecture, which extracts key glottal features such as the derivative EGG, glottal opening and closing points, and peak-to-peak amplitude differences. We further optimize EMG feature extraction with a ConvNeXt-inspired EMG encoder that replaces traditional ResNet blocks and transformer modules with FNet, which uses Fourier token mixing for improved feature learning. To enhance EMG synthesis, we introduce a sparse-fusion technique that iteratively fuses stronger neural nodes, allowing weaker nodes to adapt in later iterations. Synthesis is driven by a generative model based on speech-to-EMG generative adversarial networks (STE-GANs), which incorporates our glottal-aware fusion strategy, optimized EGG encoder, and enhanced EMG encoder to improve the quality and realism of the generated EMG signals. For evaluation, we created a new multimodal dataset of synchronized EGG, EMG, and speech signals to validate our methodology, and we compare the results with those obtained on publicly available EMG-voice datasets. Extensive experiments and comparative studies with state-of-the-art (SOTA) algorithms confirm that our method significantly improves EMG synthesis quality and speech feature transduction.
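The abstract names the glottal features the EGG encoder relies on (derivative EGG, glottal opening and closing points, peak-to-peak amplitude differences) and the Fourier token mixing used by FNet. The sketch below shows one conventional way to compute these quantities with NumPy; it is not the authors' ConvNeXt-LSTM encoder or their FNet-based EMG encoder. The function names `glottal_features` and `fourier_token_mixing`, the threshold-based peak picking, the EGG polarity convention (higher value means more vocal-fold contact), and the choice of per-cycle peak-to-peak differences are assumptions for illustration. The mixing function follows the published FNet formulation: the real part of a 2-D DFT over the sequence and hidden dimensions.

```python
import numpy as np


def glottal_features(egg: np.ndarray, fs: float, rel_threshold: float = 0.3) -> dict:
    """Hypothetical helper: DEGG-based glottal landmarks for one EGG segment.

    Assumes the common polarity where larger EGG values mean more vocal-fold
    contact, so closing instants appear as sharp positive DEGG peaks and
    opening instants as negative DEGG peaks. `rel_threshold` is illustrative.
    """
    egg = np.asarray(egg, dtype=float)
    # Derivative EGG (DEGG): first difference scaled to units per second.
    # degg[i] lies between egg[i] and egg[i + 1] (half-sample offset).
    degg = np.diff(egg) * fs

    mid, left, right = degg[1:-1], degg[:-2], degg[2:]
    close_thr = rel_threshold * degg.max()
    open_thr = rel_threshold * -degg.min()

    # Strict-left / non-strict-right comparisons keep one index per flat-topped peak.
    is_close = (mid > left) & (mid >= right) & (mid > close_thr)
    is_open = (mid < left) & (mid <= right) & (mid < -open_thr)
    gci = np.flatnonzero(is_close) + 1  # glottal closing instants (DEGG indices)
    goi = np.flatnonzero(is_open) + 1   # glottal opening instants (DEGG indices)

    # Peak-to-peak EGG amplitude per cycle (one closure to the next),
    # followed by the cycle-to-cycle difference of those amplitudes.
    cycle_ptp = np.array(
        [egg[a:b].max() - egg[a:b].min() for a, b in zip(gci[:-1], gci[1:])],
        dtype=float,
    )
    ptp_diff = np.diff(cycle_ptp)
    return {"degg": degg, "gci": gci, "goi": goi,
            "cycle_ptp": cycle_ptp, "ptp_diff": ptp_diff}


def fourier_token_mixing(x: np.ndarray) -> np.ndarray:
    """FNet-style token mixing: real part of a 2-D DFT over (sequence, hidden).

    x has shape (seq_len, hidden); the operation has no learned parameters.
    """
    return np.real(np.fft.fft2(x))
```

For example, `glottal_features(egg, fs=8000.0)` on a voiced EGG segment returns the landmark indices and per-cycle amplitudes, and `fourier_token_mixing(h)` mixes a `(seq_len, hidden)` feature matrix. A learned encoder such as the one described in the abstract would consume features like these rather than hand-thresholded landmarks; the thresholding here only makes the definitions concrete.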

Keywords

Electromyography; Feature extraction; Electrodes; Transformers; Speech synthesis; Production; Inverse problems; Decoding; Vocoders; Synchronization; Electroglottography (EGG); electromyography (EMG); multimodal fusion; speech units; speech-to-EMG

Published: 2026

Type: Article

Journal: IEEE Transactions on Instrumentation and Measurement

Volume: 75