Automated Korean-English Bilingual Highlighted Text Extraction Using HSV Segmentation

  • Lee, Seungjun
  • Kim, Yeongjin
  • Baek, Minhyuk
  • Jang, Jaehyeok
  • Kim, Ajin
  • and 5 others

Abstract

This paper presents an automated system for extracting highlighted text from Korean-English documents using HSV-based color segmentation and OCR. The system integrates HSV segmentation with linguistic correction methods. Evaluated on a 600-image dataset, it achieved an mIoU of 0.8222 for highlight detection and 95.30% character-level accuracy (4.70% CER) for OCR. These results demonstrate that combining HSV segmentation with language-specific post-processing enables accurate and robust recovery of color-highlighted text for document digitization and analysis. © 2026 IEEE.
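
The abstract names three quantities that are easy to mix up: the HSV mask, the mIoU score for highlight detection, and the CER for OCR (character-level accuracy is 1 minus CER, so 4.70% CER corresponds to 95.30% accuracy). The following is a minimal sketch of how each could be computed. It is not the authors' implementation. The function names, the yellow hue range, and the saturation and value thresholds are all assumptions chosen for illustration, and only the Python standard library is used in place of an image-processing library.

```python
import colorsys

def highlight_mask(rgb_image, hue_range=(0.10, 0.20), min_sat=0.4, min_val=0.5):
    """Binary mask of pixels whose HSV values fall inside the given range.

    rgb_image is a list of rows of (r, g, b) tuples in 0..255.
    The default thresholds roughly target yellow and are assumptions,
    not values taken from the paper.
    """
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            in_range = hue_range[0] <= h <= hue_range[1]
            mask_row.append(in_range and s >= min_sat and v >= min_val)
        mask.append(mask_row)
    return mask

def iou(pred, truth):
    """Intersection over union of two boolean masks of equal shape."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += int(p and t)
            union += int(p or t)
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0

def mean_iou(pairs):
    """Mean IoU over an iterable of (predicted_mask, truth_mask) pairs."""
    scores = [iou(p, t) for p, t in pairs]
    return sum(scores) / len(scores)

def cer(reference, hypothesis):
    """Character error rate: Levenshtein distance / reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, rc in enumerate(reference, 1):
        cur = [i]
        for j, hc in enumerate(hypothesis, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (rc != hc)))  # substitution
        prev = cur
    return prev[-1] / len(reference) if reference else 0.0

# Tiny 2x2 example: two yellow pixels, one white, one black.
image = [[(255, 255, 0), (255, 255, 255)],
         [(255, 255, 0), (0, 0, 0)]]
predicted = highlight_mask(image)
```

Because the error rate is computed per character, the same `cer` function applies unchanged to Hangul syllables and Latin letters, which Python strings treat as single code points.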

Keywords

Bilingual; Highlighted Text; HSV; Multi-PSM; OCR; Tesseract
Title
Automated Korean-English Bilingual Highlighted Text Extraction Using HSV Segmentation
Authors
Lee, Seungjun; Kim, Yeongjin; Baek, Minhyuk; Jang, Jaehyeok; Kim, Ajin; Kim, Yerin; Park, Seonghun; Choi, Seungyun; Kim, Namjoon; Lee, Hyukjae
DOI
10.1109/ICEIC69189.2026.11386274
Publication date
2026
Type
Conference paper
Publication name
2026 International Conference on Electronics, Information, and Communication, ICEIC 2026