Toward Few-Shot Multimodal Voice Pathology Classification on an Embedded Edge Device

Kanagachalam, Srinidhi; Kim, Deok-Hwan

doi:10.1109/ACCESS.2025.3606943

Citations (Web of Science): 0

Citations (Scopus): 1

Abstract

Voice pathologies hinder effective communication, often leading to social isolation and reduced quality of life. Automatic voice pathology classification identifies vocal disorders by analyzing acoustic signals from affected individuals. However, existing methods rely primarily on classical machine learning approaches such as Support Vector Machines (SVMs), which often struggle to generalize to unseen pathologies. Our study introduces PathoEAI, a deep learning-based multimodal voice pathology classification framework that uses few-shot learning and is designed for edge deployment, addressing the growing demand for accessible digital healthcare solutions. PathoEAI classifies five pathological voice categories alongside the healthy voice, resulting in a six-class classification task. It takes voice and electroglottographic (EGG) signals as input and fuses them using a temporal cross-attention block. It further employs a metric-based few-shot learning approach using prototypical networks, which mitigates overfitting by learning a robust representation space where each class is represented by a prototype, enabling distance-based classification and improved generalization to unseen categories with only a few examples. Evaluations on the publicly available Saarbrücken Voice Database (SVD) under varying shot settings achieve 72.41% accuracy in a 2-way classification task with 10 shots. Deployment on the NVIDIA Jetson Xavier using TensorRT optimizations results in approximately 3.61 times faster inference compared to a standard server. The proposed framework supports personalized and accessible healthcare through real-time, non-invasive, and early diagnosis of voice pathologies, thereby improving overall quality of life.
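The prototype-based classification step described in the abstract can be illustrated with a minimal sketch. It is not the PathoEAI implementation: the abstract does not specify the encoder or the distance function, so the sketch assumes embeddings have already been computed and uses squared Euclidean distance, the usual choice for prototypical networks. The function names, labels, and 2-D embedding values are hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(support):
    """Average the support embeddings of each class.

    support: list of (embedding, label) pairs, where embedding is a list of floats.
    Returns a dict mapping each label to its mean embedding (the prototype).
    """
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance (an assumption; the abstract names no metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the nearest-prototype label and a softmax over negative distances."""
    logits = {
        label: -squared_distance(query, proto)
        for label, proto in prototypes.items()
    }
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    return max(probs, key=probs.get), probs


# Toy 2-way, 2-shot episode with made-up 2-D embeddings (hypothetical values).
support = [
    ([0.0, 0.0], "healthy"),
    ([0.2, 0.0], "healthy"),
    ([1.0, 1.0], "pathological"),
    ([1.0, 0.8], "pathological"),
]
prototypes = class_prototypes(support)
label, probs = classify([0.9, 0.7], prototypes)
print(label, probs)
```

In this toy episode the query lies much closer to the "pathological" prototype, so it receives that label. In a setup like the one the abstract describes, the embeddings would come from the fused voice and EGG representation rather than from hand-written vectors.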

Keywords

Pathology; Real-time systems; Medical services; Few shot learning; Acoustics; Metalearning; Feature extraction; Electronic healthcare; Accuracy; Robustness; Voice pathology; multimodal fusion; classification; few-shot learning; edge computing; ehealth; HEALTH-CARE; INTELLIGENCE; INTERNET; THINGS

Publication date: 2025

Type: Article

Journal: IEEE Access

Volume: 13

Pages: 157442-157454