Comparative Analysis of Automatic Speech Recognition Fine-Tuning Strategies for Speech From Cochlear Implant Users

Abstract

Although automatic speech recognition technology has become widespread, it still performs poorly on speech from cochlear implant (CI) users. This limitation is a barrier to CI users' access to digital technology. To address this issue, a comparative study of fine-tuning strategies was conducted to adapt Whisper, a general-purpose speech recognition model, to CI users' speech. Specifically, full fine-tuning, selective fine-tuning, adapters, and LoRA were evaluated on a dataset of Korean CI users' speech. The experimental results showed that all of the fine-tuning approaches improved recognition performance over the baseline Whisper model. Notably, the LoRA-encoder approach, which trains only 2.15% of the total parameters, achieved the best performance with a character error rate of 11.57%, making it both the most accurate and among the most parameter-efficient of the strategies compared. Furthermore, strategies that fine-tuned only the encoder consistently outperformed those that adjusted the decoder, indicating that the encoder plays a crucial role in modeling the unique acoustic characteristics of CI users' speech.
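
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a character error rate (CER) is commonly computed: the Levenshtein edit distance between the reference and the recognized text, divided by the reference length in characters. The function names and the choice to ignore spaces are assumptions made for illustration; the record does not state the exact text normalization used by the authors.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters.

    Assumption (not from the source): spaces are removed before scoring.
    """
    ref_chars = reference.replace(" ", "")
    hyp_chars = hypothesis.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One substituted syllable out of five reference characters.
    print(character_error_rate("안녕하세요", "안녕하세오"))
```

Under this definition, a CER of 11.57% corresponds to roughly 12 character-level edits per 100 reference characters.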

Keywords

Automatic speech recognition; cochlear implant; parameter-efficient learning; Whisper
Title
Comparative Analysis of Automatic Speech Recognition Fine-Tuning Strategies for Speech From Cochlear Implant Users
Authors
Yoon, Seojin; Kim, Hyunji; Kim, Kyusung; Lee, Sangmin
DOI
10.1109/LSP.2025.3640524
Publication date
2026
Type
Article
Journal
IEEE Signal Processing Letters
vol. 33
Pages
236-240