발화 속도와 휴지 구간 길이를 사용한 방언 분류

Dialect classification based on the speed and the pause of speech utterances

Abstract

We propose an approach to dialect classification that uses the speech rate and pauses of utterances together with speaker metadata, namely age and gender. Dialect classification is an important technique in speech analysis; for example, an accurate dialect classification model can potentially improve the performance of speaker recognition or speech recognition. In previous studies, deep learning with Mel-Frequency Cepstral Coefficient (MFCC) features has been the dominant approach. We focus on the acoustic differences between regions and classify dialects using features derived from those differences. Specifically, we extract additional, underexplored features: the speech rate and the pauses of each utterance, along with the age and gender of the speaker. Experimental results show that the proposed approach achieves higher accuracy than a method using only MFCC features, particularly when the speech rate feature is included. Incorporating all of the proposed features improves accuracy from 91.02% to 97.02% over the previous MFCC-only method.
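
The abstract does not say how the speech rate and pause features are computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a pause is a run of low-energy frames lasting at least a minimum duration, and that speech rate is the number of syllables per second over the whole utterance. The function names, the energy threshold, the 20 ms frame size, and the 200 ms minimum pause length are all hypothetical choices made for illustration.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def pause_and_rate_features(samples, sample_rate, n_syllables,
                            frame_ms=20, energy_threshold=1e-4,
                            min_pause_ms=200):
    """Return (speech_rate, total_pause_sec, pause_count).

    Assumptions (not taken from the paper): a frame is silent when its
    mean energy is below energy_threshold, and only silent runs of at
    least min_pause_ms count as pauses. Speech rate is syllables per
    second over the full utterance, pauses included.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    min_frames = math.ceil(min_pause_ms / frame_ms)
    silent = [e < energy_threshold
              for e in frame_energies(samples, frame_len)]

    pause_frames, pause_count, run = 0, 0, 0
    for is_silent in silent + [False]:  # sentinel flushes a trailing run
        if is_silent:
            run += 1
        else:
            if run >= min_frames:
                pause_frames += run
                pause_count += 1
            run = 0

    duration = len(samples) / sample_rate
    total_pause = pause_frames * frame_ms / 1000
    speech_rate = n_syllables / duration if duration > 0 else 0.0
    return speech_rate, total_pause, pause_count
```

For a synthetic 16 kHz signal made of one second of tone, half a second of silence, and another second of tone, with an assumed count of 10 syllables, this sketch returns a speech rate of 4.0 syllables per second, 0.5 seconds of total pause, and one pause. In a classifier, such scalar values could be concatenated with MFCC-derived features and the age and gender metadata; how the paper actually combines them is not stated in the abstract.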

Keywords

dialect classification, feature extraction, low resource conditions
Authors
나종환, 이보원
DOI
10.13064/KSSS.2023.15.2.043
Publication date
2023-06
Type
Y
Journal
말소리와 음성과학 (Phonetics and Speech Sciences)
Volume
15
Issue
2
Pages
43-51