Toward Few-Shot Multimodal Voice Pathology Classification on an Embedded Edge Device

Citations

Web of Science: 0
Scopus: 1
Abstract

Voice pathologies hinder effective communication, often leading to social isolation and reduced quality of life. Automatic voice pathology classification identifies vocal disorders by analyzing acoustic signals from affected individuals. However, existing methods rely primarily on machine learning approaches such as Support Vector Machines (SVMs), which often struggle to generalize to unseen pathologies. Our study introduces PathoEAI, a deep learning-based multimodal voice pathology classification framework that uses few-shot learning and is designed for edge deployment, addressing the growing demand for accessible digital healthcare solutions. PathoEAI classifies five pathological voice categories alongside the healthy voice, resulting in a six-class classification task. It takes voice and electroglottographic (EGG) signals as input and fuses them using a temporal cross-attention block. It further employs a metric-based few-shot learning approach using prototypical networks, which mitigates overfitting by learning a robust representation space where each class is represented by a prototype, enabling distance-based classification and improved generalization to unseen categories with only a few examples. Evaluations on the publicly available Saarbrücken Voice Database (SVD) under varying shot settings achieve 72.41% accuracy in a 2-way classification task with 10 shots. Deployment on the NVIDIA Jetson Xavier using TensorRT optimizations results in approximately 3.61 times faster inference compared to a standard server. The proposed framework supports personalized and accessible healthcare through real-time, non-invasive, and early diagnosis of voice pathologies, thereby improving overall quality of life.
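
The prototype-based classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `class_prototypes` and `classify`, the toy 2-D embeddings, the class labels, and the choice of Euclidean distance are all assumptions made for illustration. In PathoEAI the embeddings would come from the fused voice and EGG representation, which is not modeled here.

```python
import math
from collections import defaultdict


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vec, label in zip(support_embeddings, support_labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(query_embedding, prototypes):
    """Return the label whose prototype is nearest (Euclidean distance, assumed)."""
    return min(
        prototypes,
        key=lambda label: math.dist(query_embedding, prototypes[label]),
    )


# Toy 2-way, 2-shot episode with made-up 2-D embeddings (hypothetical data).
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
labels = ["healthy", "healthy", "pathological", "pathological"]

prototypes = class_prototypes(support, labels)
prediction = classify([0.9, 0.9], prototypes)
```

Because a new class needs only a handful of support embeddings to form its prototype, no classifier weights have to be retrained when the set of classes changes, which is the property the abstract relies on for generalization from a few examples.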

Keywords

Pathology; Real-time systems; Medical services; Few shot learning; Acoustics; Metalearning; Feature extraction; Electronic healthcare; Accuracy; Robustness; Voice pathology; multimodal fusion; classification; few-shot learning; edge computing; ehealth; HEALTH-CARE; INTELLIGENCE; INTERNET; THINGS
Title
Toward Few-Shot Multimodal Voice Pathology Classification on an Embedded Edge Device
Authors
Kanagachalam, Srinidhi; Kim, Deok-Hwan
DOI
10.1109/ACCESS.2025.3606943
Publication date
2025
Type
Article
Journal
IEEE Access
Volume
13
Pages
157442-157454