Cohort-Sensitive Labeling: An Effective Approach for Enhancing ASR Performance

  • Na, Jonghwan
  • Hasegawa-Johnson, Mark
  • Lee, Bowon

Abstract

This paper proposes cohort-sensitive labeling (CSL) for automatic speech recognition (ASR). CSL distinguishes data labels by cohort, allowing models to learn cohort-specific information. For evaluation, we applied CSL using gender information in the training data of the LibriSpeech dataset. Experimental results demonstrate that the CSL-based approach outperforms methods without CSL, given sufficient training data. Specifically, our method achieved an average word error rate reduction (WERR) of 1.81% on the LibriSpeech test-clean set and 5.76% on the test-other set when more than 100 hours of data were used for training. Moreover, on the TIMIT and Common Voice test sets, it achieved WERR of up to 11.52% and 2.91%, respectively, demonstrating its robustness and generalizability to unseen data. Additionally, the proposed method reached up to 97.21% accuracy in classifying the gender cohort, suggesting that ASR models trained with CSL effectively leverage the cohort information.
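
The abstract reports its gains as word error rate reduction (WERR) but does not spell out the formula. The sketch below assumes the conventional relative definition, (baseline WER - proposed WER) / baseline WER, expressed in percent. The function name `werr` and the input values are hypothetical and for illustration only; they are not taken from the paper.

```python
def werr(baseline_wer: float, proposed_wer: float) -> float:
    """Relative word error rate reduction, in percent.

    Assumes the common relative definition; the paper's exact
    formula is not given in the abstract.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - proposed_wer) / baseline_wer * 100.0


# Hypothetical WER values (in percent), not results from the paper.
example = werr(baseline_wer=10.0, proposed_wer=9.0)
print(f"WERR: {example:.2f}%")
```

Under this definition, a positive WERR means the proposed system makes fewer word errors than the baseline, and the value is relative to the baseline error rate rather than an absolute difference in WER.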

Keywords

cohort-sensitive labeling; automatic speech recognition
Title
Cohort-Sensitive Labeling: An Effective Approach for Enhancing ASR Performance
Authors
Na, Jonghwan; Hasegawa-Johnson, Mark; Lee, Bowon
DOI
10.1109/ICASSP49660.2025.10890600
Publication date
2025
Type
Proceedings Paper
Journal
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings