Towards Scalable and Robust Multilingual ASR for Indian Languages with MixLoRA-Whisper

Park, Yeseul; Lee, Bowon

doi:10.1109/ASRU65441.2025.11434786

Abstract

India exhibits extensive linguistic diversity, with many regional languages and dialects, yet current multilingual automatic speech recognition (ASR) models provide limited support, especially for low-income and rural populations who rely on spoken communication. We apply MixLoRA, a parameter-efficient fine-tuning method proposed for large language models, to Whisper to improve ASR performance. MixLoRA employs multiple LoRA experts and dynamically selects the most relevant experts per token, enabling better modeling of linguistic variation. Fine-tuning at most 25.03% of the parameters on the RESPIN dataset, which covers eight Indian languages with 33 dialects, the model achieves a 4.98% character error rate (CER) on read speech, a 7.09% relative CER reduction over the baseline. Performance improved for all eight languages on read speech and for five languages on spontaneous speech. These results demonstrate that MixLoRA can effectively enhance ASR for low-resource, dialect-rich languages. © 2025 IEEE.
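The abstract reports results as a character error rate (CER) and as a relative CER reduction over a baseline. As a reading aid, the sketch below shows how these two quantities are conventionally computed: CER as the total character-level edit distance divided by the total number of reference characters, and relative reduction as the drop in CER divided by the baseline CER. This is a minimal illustration under those standard definitions, not the authors' evaluation code; the function names `edit_distance`, `cer`, and `relative_reduction` are hypothetical, and any text normalization the paper may apply before scoring is not modeled here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction in percent, measured against the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - improved) / baseline
```

With made-up numbers, a baseline CER of 10.0% that drops to 9.0% is a 1.0 point absolute reduction but a 10.0% relative reduction; the 7.09% figure in the abstract is of the relative kind.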

Keywords

multilingual ASR; parameter-efficient fine-tuning; speech recognition

Publication date: 2025

Type: Conference paper

Venue: ASRU 2025 - 2025 IEEE Automatic Speech Recognition and Understanding Workshop