Multiaccent EMG-to-Speech Optimized Transduction With PerFL and MAML Adaptations

Citations: Web of Science 7, Scopus 7

Abstract

Silent speech voicing enables individuals with speech impairments to communicate solely through facial muscle movements, without vocalizing. Typically, electromyography (EMG) signals are used together with voice signals from individuals with normal speech for training. Existing studies target a single accent recorded with a single acquisition device and ignore the multiple accents of speakers from diverse ethnic backgrounds, which can pose challenges for developing generalized and adaptive solutions. To address this, we propose a comprehensive approach consisting of: 1) a multiaccent EMG-to-speech silent voicing dataset; 2) an optimized transduction model (EMG to speech features); 3) a model-agnostic meta-learning (MAML) approach to adapt across cross-accented data; and 4) a personalized federated learning (PerFL) solution that uses MAML initialization to enhance global model convergence. Our novel transduction model incorporates three key elements: 1) convolution layers with a Squeeze-and-Excitation network that enhances channel-wise interdependencies (feature recalibration); 2) a gating multilayer perceptron that enhances global context awareness through linear projections along the channel dimension; and 3) transformers that learn temporal features across the EMG time series. We validated the proposed algorithm on publicly available and proprietary (collected in our research laboratory) datasets. To simulate real-world conditions, the proprietary dataset was collected with three different biosignal devices, yielding heterogeneous data comprising 1370 utterances from eight subjects with three distinct accents. The proposed transduction model outperformed traditional methods, with improvements of 1.3% to 3.5% in the word error rate (WER) on the public dataset. Moreover, we studied two MAML variants and their impact on PerFL initialization. Detailed results for various performance metrics, including confusability, accuracy, character error rate (CER), and WER, are presented for both the public and proprietary datasets.
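The Squeeze-and-Excitation (SE) recalibration named in the abstract can be illustrated with a minimal, dependency-free sketch. It follows the generic SE recipe (global average pooling per channel, a two-layer fully connected bottleneck, sigmoid gating, channel-wise rescaling) applied to convolutional features shaped as C channels by T time steps. The function name `squeeze_excite`, the reduction ratio `r`, the random weights, and the choice of 8 channels are illustrative assumptions, not the authors' layer configuration; biases are omitted for brevity.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def squeeze_excite(features, w1, w2):
    """Hypothetical sketch of a generic SE block (not the paper's exact layer).

    features: list of C channels, each a list of T time steps.
    w1: (C // r) x C bottleneck weights; w2: C x (C // r) expansion weights.
    Returns the recalibrated features and the per-channel scales.
    """
    # Squeeze: global average pooling over time gives one descriptor per channel
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation: FC + ReLU bottleneck, then FC + sigmoid gives weights in (0, 1)
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scales = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Recalibration: rescale every time step of each channel by its weight
    recalibrated = [[s * x for x in ch] for s, ch in zip(scales, features)]
    return recalibrated, scales


# Usage with assumed sizes: 8 channels, 100 time steps, reduction ratio 4
random.seed(0)
C, T, r = 8, 100, 4
emg = [[random.gauss(0.0, 1.0) for _ in range(T)] for _ in range(C)]
w1 = [[random.gauss(0.0, 0.1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.gauss(0.0, 0.1) for _ in range(C // r)] for _ in range(C)]
out, scales = squeeze_excite(emg, w1, w2)
print([round(s, 3) for s in scales])
```

The output keeps the input shape; only the relative emphasis of the channels changes, which is what "feature recalibration" refers to.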
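The idea of meta-learning an initialization that adapts quickly to each accent can be illustrated with a toy first-order MAML (FOMAML), a simplification that drops the second-order terms of full MAML. Each "accent" is stood in for by a hypothetical task: a scalar linear model `y = s * x` with a task-specific slope `s`. The meta-learner searches for an initialization `w` from which a single inner gradient step fits any task well. The task definition, learning rates, and function names are assumptions for illustration; the source does not name the two MAML variants it compares, and this sketch is not presented as either of them.

```python
import random


def grad_mse(w, xs, ys):
    # d/dw of mean((w * x - y) ** 2) for the scalar model y_hat = w * x
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def sample_task_data(slope, n, rng):
    # Hypothetical task: noiseless points on the line y = slope * x
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return xs, [slope * x for x in xs]


def fomaml(task_slopes, meta_steps=500, inner_lr=0.1, outer_lr=0.05, n=10, seed=0):
    rng = random.Random(seed)
    w = 0.0  # shared initialization being meta-learned
    for _ in range(meta_steps):
        meta_grad = 0.0
        for slope in task_slopes:
            xs_s, ys_s = sample_task_data(slope, n, rng)  # support set
            xs_q, ys_q = sample_task_data(slope, n, rng)  # query set
            # Inner loop: one gradient step from the shared initialization
            w_adapted = w - inner_lr * grad_mse(w, xs_s, ys_s)
            # First-order approximation: query gradient at the adapted weights,
            # ignoring derivatives through the inner step
            meta_grad += grad_mse(w_adapted, xs_q, ys_q)
        w -= outer_lr * meta_grad / len(task_slopes)
    return w


# Usage: meta-train on three hypothetical tasks, then adapt to an unseen one
w0 = fomaml([1.0, 2.0, 3.0])
rng = random.Random(1)
xs, ys = sample_task_data(2.5, 10, rng)
w_new = w0 - 0.1 * grad_mse(w0, xs, ys)
print(round(w0, 3), round(w_new, 3))
```

With symmetric slopes such as 1, 2, and 3, the learned initialization settles near their mean, the point from which one inner step helps every task equally in this linear toy. In a PerFL setting of the kind the abstract describes, an initialization like this would seed the shared global model before client-side personalization; that federated wiring is not shown here.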
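WER and CER, two of the reported metrics, are conventionally computed as the Levenshtein edit distance (substitutions + deletions + insertions) between the reference and hypothesis transcripts, divided by the reference length, counted in words for WER and in characters for CER. The following sketch uses that standard definition; it is not the authors' evaluation script, it counts spaces as characters for CER (conventions differ), and it does not cover the confusability metric.

```python
def edit_distance(ref, hyp):
    # Levenshtein distance between two sequences, using a rolling DP row
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    # Word error rate: word-level edits / number of reference words
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    # Character error rate: character-level edits / reference length
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("the cat sat on the mat", "the cat sit on mat"))  # → 0.3333333333333333
print(cer("kitten", "sitting"))  # → 0.5
```

In the first example, one substitution ("sat" to "sit") and one deletion ("the") over six reference words give 2/6. WER can exceed 1.0 when the hypothesis contains many insertions, and both functions assume a non-empty reference.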

Keywords

Electromyography (EMG); EMG-to-speech; meta-learning; personalized federated learning (PerFL); silent speech interface; transformers; voice synthesis; SILENT SPEECH
Authors
Ullah, Shan; Kim, Deok-Hwan
DOI
10.1109/TIM.2024.3449948
Publication year
2024
Type
Article
Journal
IEEE Transactions on Instrumentation and Measurement
Volume
73