Automatic Pruning and Quality Assurance of Object Detection Datasets for Autonomous Driving

Citations

Web of Science: 2
Scopus: 4
Abstract

Large amounts of high-quality data are required to train artificial intelligence (AI) models; however, curating such data through human intervention remains cumbersome, time-consuming, and error-prone. In particular, erroneous annotations and statistical imbalances in object detection datasets can significantly degrade model performance in real-world autonomous driving scenarios. This study proposes an automated pruning framework and quality assurance strategy for 2D object detection datasets to address these issues. The framework is composed of two stages: (1) noisy label identification and deletion based on labeling scores derived from the inference results of multiple object detection models, and (2) statistical distribution whitening based on class and bounding box size diversity metrics. The proposed method was designed in accordance with the ISO/IEC 25012 data quality standards to ensure data consistency, accuracy, and completeness. Experiments were conducted on widely used autonomous driving datasets, including KITTI, Waymo, nuScenes, and large-scale publicly available datasets from South Korea. An automated data pruning process was employed to eliminate anomalous and redundant samples, resulting in a more reliable and compact dataset for model training. The results demonstrate that the proposed method substantially reduces the amount of training data required, while enhancing the detection performance and minimizing manual inspection efforts.
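
The abstract does not say how the labeling score or the diversity metrics are computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the labeling score of an annotation is the fraction of detectors that predict a same-class box overlapping it, and it uses normalized Shannon entropy as a stand-in for a class diversity metric. All function names, the IoU threshold, and the score threshold are hypothetical.

```python
import math
from collections import Counter


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def labeling_score(label, detections_per_model, iou_thr=0.5):
    """Assumed score: fraction of models with a same-class box overlapping the label."""
    cls, box = label
    hits = sum(
        1
        for dets in detections_per_model
        if any(c == cls and iou(box, b) >= iou_thr for c, b in dets)
    )
    return hits / len(detections_per_model)


def prune_noisy(labels, detections_per_model, min_score=0.5):
    """Stage 1 (sketch): drop annotations that too few models agree with."""
    return [
        lab for lab in labels
        if labeling_score(lab, detections_per_model) >= min_score
    ]


def class_entropy(labels):
    """Stage 2 stand-in: normalized Shannon entropy of the class distribution."""
    counts = Counter(c for c, _ in labels)
    if len(counts) < 2:
        return 0.0
    n = sum(counts.values())
    h = -sum((k / n) * math.log(k / n) for k in counts.values())
    return h / math.log(len(counts))


# Toy data: two annotations and the outputs of three hypothetical detectors.
labels = [("car", (0, 0, 10, 10)), ("person", (50, 50, 60, 60))]
detections = [
    [("car", (1, 1, 10, 10))],
    [("car", (0, 0, 9, 9))],
    [("truck", (50, 50, 60, 60))],
]
kept = prune_noisy(labels, detections)
```

In this toy input, two of the three detectors agree with the car annotation and none agrees with the person annotation, so only the car annotation passes the assumed threshold. A real pipeline would also need to choose how to weight detector confidence and how to balance box sizes, neither of which the abstract specifies.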

Keywords

dataset pruning; dataset whitening; quality assurance; noisy labels; training data; ARTIFICIAL-INTELLIGENCE; VEHICLES
Title
Automatic Pruning and Quality Assurance of Object Detection Datasets for Autonomous Driving
Authors
Kim, Kana; Kakani, Vijay; Kim, Hakil
DOI
10.3390/electronics14091882
Publication date
2025-05
Type
Article
Journal
Electronics (Basel)
Volume 14
Issue 9