Abstract
This paper presents a comparative evaluation of Korean dialect-to-standard speech translation using a cascade pipeline that combines Whisper-based speech-to-text (STT), KoBART-based neural machine translation (NMT), and text-to-speech (TTS). For four Korean dialects (Gangwon, Gyeongsang, Jeju, and Jeolla), we fine-tuned region-specific STT and NMT models on the AI Hub corpus and conducted module-wise and pipeline-level analyses. The STT module achieved a word error rate (WER) below 10% and a character error rate (CER) below 5% for all regions. An oracle–pipeline comparison, in which the oracle condition feeds reference transcripts rather than STT outputs to the NMT module, showed an average degradation of 9.20 BLEU points when STT outputs replaced the reference transcripts, with the largest drop for Jeju. These results suggest that front-end recognition errors are an important factor in downstream translation performance. Sentence-length analysis further showed that short utterances with concentrated dialectal endings were particularly challenging, while error-type analysis revealed distinct regional patterns, including high substitution rates for Jeju and high near-match rates for Jeolla. For speech synthesis, edge-tts outperformed the offline pyttsx3 baseline on both non-intrusive objective metrics and a small-scale listening test. Overall, the results demonstrate the feasibility of Korean dialect-to-standard speech translation and provide a practical diagnostic framework for locating bottlenecks in low-resource dialect speech translation.
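The WER and CER figures reported above are standard edit-distance metrics: (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions on an optimal alignment and N is the number of reference units. The sketch below shows one way to compute them and to split the errors by type, in the spirit of the error-type analysis. It is not the authors' evaluation code. Tokenizing WER on whitespace-separated words (eojeol) and computing CER over characters with spaces removed are assumptions, since the abstract does not state the text normalization used; the helper names `align_counts`, `wer`, and `cer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EditCounts:
    substitutions: int
    deletions: int
    insertions: int
    reference_length: int

    @property
    def error_rate(self) -> float:
        # (S + D + I) / N, with N the number of reference units
        if self.reference_length == 0:
            raise ValueError("reference must not be empty")
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.reference_length


def align_counts(ref: list[str], hyp: list[str]) -> EditCounts:
    """Levenshtein alignment of ref -> hyp, returning S/D/I counts."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # deletion (ref unit missing)
                dp[i][j - 1] + 1,         # insertion (extra hyp unit)
            )
    # Backtrace one optimal path to split the total into S, D, and I
    i, j = n, m
    subs = dels = ins = 0
    while i > 0 or j > 0:
        diag_cost = 1 if i > 0 and j > 0 and ref[i - 1] != hyp[j - 1] else 0
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + diag_cost:
            subs += diag_cost
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return EditCounts(subs, dels, ins, n)


def wer(reference: str, hypothesis: str) -> float:
    # Assumption: words are whitespace-separated tokens (eojeol)
    return align_counts(reference.split(), hypothesis.split()).error_rate


def cer(reference: str, hypothesis: str) -> float:
    # Assumption: characters are compared with all spaces removed
    ref_chars = list("".join(reference.split()))
    hyp_chars = list("".join(hypothesis.split()))
    return align_counts(ref_chars, hyp_chars).error_rate
```

For instance, with the illustrative pair `"나는 학교에 간다"` (reference) and `"나는 학교 간다"` (hypothesis), the word-level alignment finds one substitution (WER = 1/3), while the character-level alignment finds a single deletion of 에 (CER = 1/7). This illustrates why CER can stay low when a model drops or alters a single ending, even though the corresponding word counts as fully wrong in WER.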
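- Title
- Design and Performance Analysis of a Korean Dialect-to-Standard Speech Translation System Using an STT-NMT-TTS Pipeline (Korean original: STT-NMT-TTS 파이프라인을 활용한 한국어 방언-표준어 음성 번역 시스템 설계 및 성능 분석)
- Title (English)
- Design and Performance Evaluation of a Korean Dialect-to-Standard Speech Translator Using an STT-NMT-TTS Pipeline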
- Authors
- 곽대혁; 한영신; 이종식
- Publication date
- 2026-04
- Type
- Y
- Journal
- Journal of Korea Multimedia Society (멀티미디어학회논문지)
- Volume
- 29
- Issue
- 4
- Pages
- 720–735