Multiaccent EMG-to-Speech Optimized Transduction With PerFL and MAML Adaptations

Ullah, Shan; Kim, Deok-Hwan

doi:10.1109/TIM.2024.3449948

Citations: Web of Science 7; Scopus 7

Abstract

Silent speech voicing enables individuals with speech impairments to communicate solely through facial muscle movements, bypassing the need for vocalization. Typically, electromyography (EMG) signals are paired with voice signals from individuals with normal speech for training purposes. Existing studies target a single accent recorded with a single acquisition device and ignore the multiple accents of diverse ethnic backgrounds, which can pose challenges in developing generalized and adaptive solutions. To address this, we propose a comprehensive approach consisting of the following: 1) a multiaccent EMG-to-speech silent voicing dataset; 2) an optimized transduction model (EMG-to-speech features); 3) a model-agnostic meta-learning (MAML) approach to adapt across cross-accented data; and 4) a personalized federated learning (PerFL) solution that utilizes MAML initialization to enhance global model convergence. Our novel transduction model incorporates three key elements: 1) convolution layers with a Squeeze-and-Excitation network to enhance channel-wise interdependencies (feature recalibration); 2) a gating multilayer perceptron to enhance global context awareness by linear projections along channel dimensions; and 3) transformers that learn temporal features across the EMG time series. We validated our novel algorithm using publicly available and proprietary (from our research laboratory) datasets. To simulate real-world conditions, the proprietary dataset was generated using three different biosignal devices, yielding heterogeneous data with 1370 utterances from eight subjects with three distinct accents. Our proposed transduction model outperformed traditional methods, with 1.3%-3.5% improvements in the word error rate (WER) on the public dataset. Moreover, we studied two different MAML variants and their impact on PerFL initialization. Detailed results, encompassing various performance metrics such as confusability, accuracy, character error rate (CER), and WER, are presented for both public and proprietary datasets.
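
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how WER and CER are conventionally computed from a reference transcript and a recognized transcript, using the Levenshtein edit distance. It is not the authors' scoring pipeline: the abstract does not state the text normalization or the recognizer used to transcribe the synthesized speech, and the function names `edit_distance`, `wer`, and `cer` are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: ref item missing from hyp
                curr[j - 1] + 1,     # insertion: extra item in hyp
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the paper's datasets.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("silent", "silent"))
```

Both metrics can exceed 1.0 when the hypothesis contains many insertions, which is why they are reported as error rates and not as accuracies.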

Keywords

Electromyography (EMG); EMG-to-speech; meta-learning; personalized federated learning (PerFL); silent speech interface; transformers; voice synthesis; SILENT SPEECH

Title: Multiaccent EMG-to-Speech Optimized Transduction With PerFL and MAML Adaptations

Authors: Ullah, Shan; Kim, Deok-Hwan

DOI: 10.1109/TIM.2024.3449948

Publication date: 2024

Type: Article

Journal: IEEE Transactions on Instrumentation and Measurement

Volume: 73