Towards Scalable and Robust Multilingual ASR for Indian Languages with MixLoRA-Whisper

Abstract

India exhibits extensive linguistic diversity, with many regional languages and dialects, yet current multilingual automatic speech recognition (ASR) models provide limited support for them, especially for low-income and rural populations who rely on spoken communication. We apply MixLoRA, a parameter-efficient fine-tuning method originally proposed for large language models, to Whisper to improve ASR performance. MixLoRA employs multiple LoRA experts and dynamically selects the most relevant experts per token, enabling better modeling of linguistic variation. Fine-tuning at most 25.03% of the parameters on the RESPIN dataset, which covers eight Indian languages with 33 dialects, the resulting model achieves a 4.98% character error rate (CER) on read speech, a 7.09% relative CER reduction over the baseline. Performance improved for all eight languages on read speech and for five languages on spontaneous speech. These results demonstrate that MixLoRA can effectively enhance ASR for low-resource, dialect-rich languages. © 2025 IEEE.
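
The per-token expert selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MixLoRALinear`, the helper functions, the expert count, rank, `top_k`, `alpha`, and the choice to apply a softmax over only the selected router logits are all assumptions made for illustration, and the record does not say which Whisper layers carry the experts. The sketch shows one frozen linear layer whose output is corrected by a gated sum of the low-rank updates from the top-k experts chosen for a single token.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian matrix, standing in for learned or pretrained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MixLoRALinear:
    """Hypothetical layer: frozen linear map plus a top-k routed mixture of LoRA experts."""

    def __init__(self, d_in, d_out, num_experts=4, rank=2, top_k=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.scaling = alpha / rank                          # usual LoRA scaling factor
        self.base = random_matrix(d_out, d_in, rng)          # frozen pretrained weight
        self.router = random_matrix(num_experts, d_in, rng)  # trainable router
        # Each expert is a low-rank pair (A: rank x d_in, B: d_out x rank).
        # B starts at zero, as in standard LoRA, so the layer initially
        # reproduces the frozen base layer exactly.
        self.experts = [
            (random_matrix(rank, d_in, rng), [[0.0] * rank for _ in range(d_out)])
            for _ in range(num_experts)
        ]

    def route(self, x):
        """Return [(expert_index, gate_weight)] for the top-k experts of one token."""
        logits = matvec(self.router, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = order[: self.top_k]
        peak = max(logits[i] for i in chosen)                # for numerical stability
        exps = [math.exp(logits[i] - peak) for i in chosen]
        total = sum(exps)
        return [(i, e / total) for i, e in zip(chosen, exps)]

    def forward(self, x):
        """Base output plus the gated low-rank updates of the selected experts."""
        out = matvec(self.base, x)
        for index, gate in self.route(x):
            a, b = self.experts[index]
            update = matvec(b, matvec(a, x))                 # B (A x)
            out = [o + gate * self.scaling * u for o, u in zip(out, update)]
        return out


# Usage: route one 8-dimensional token representation through the layer.
layer = MixLoRALinear(d_in=8, d_out=8)
token = [0.5] * 8
gates = layer.route(token)
output = layer.forward(token)
```

Only the router and the expert matrices would be trained in such a setup, while the base weight stays frozen; this is what keeps the trainable share of parameters well below full fine-tuning.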

Keywords

multilingual ASR; parameter-efficient finetuning; speech recognition
Title
Towards Scalable and Robust Multilingual ASR for Indian Languages with MixLoRA-Whisper
Authors
Park, Yeseul; Lee, Bowon
DOI
10.1109/ASRU65441.2025.11434786
Publication date
2025
Type
Conference paper
Journal name
ASRU 2025 - 2025 IEEE Automatic Speech Recognition and Understanding Workshop