An introduction to and empirical review of synthpop, a synthetic data generation tool (English title: A comprehensive and empirical review of the synthpop for synthetic data generation)

Citations (Web of Science): 1

Abstract

This paper reviews synthpop, a versatile R package for generating synthetic data. synthpop fits a sequence of models to the original data, one variable at a time, with each variable modeled conditionally on the variables that precede it; the synthetic data are then drawn from the estimated models. The package offers several choices of base model, including linear regression, classification and regression trees (CART), and random forests. Several of these methods can reproduce values that appear in the original data, even for continuous variables, because they rely on a quantile transformation or sample original data points directly. We present a numerical example that illustrates the resulting disclosure risk. Our real-data example shows that the synthesis models in synthpop should be chosen carefully when the original data contain sensitive categorical or continuous variables.
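To make the sequential scheme and the disclosure concern above concrete, here is a minimal Python sketch with two continuous variables. It is not synthpop's implementation, and the function names `synthesize_sequential` and `share_reproduced` are hypothetical. As simplifying assumptions, the first variable is synthesized by resampling its observed values (similar in spirit to synthpop's `sample` method) and the second is drawn from a normal linear regression fitted by least squares (similar in spirit to the `norm` method), conditioning on the synthetic first variable. The final check counts how many synthetic values exactly equal some original value.

```python
import numpy as np


def synthesize_sequential(x1, x2, rng):
    """Hypothetical two-variable sequential synthesis sketch (not synthpop's code).

    Step 1: synthesize x1 by resampling its observed values.
    Step 2: fit x2 = b0 + b1 * x1 + e on the original data, then draw new
            x2 values from that fitted model given the *synthetic* x1.
    """
    n = len(x1)
    # Step 1: resample the first variable's observed values with replacement.
    syn_x1 = rng.choice(x1, size=n, replace=True)

    # Step 2: least-squares fit of the normal linear model on the original data.
    X = np.column_stack([np.ones(n), x1])
    beta, *_ = np.linalg.lstsq(X, x2, rcond=None)
    resid = x2 - X @ beta
    sigma = np.sqrt(resid @ resid / (n - X.shape[1]))  # residual standard deviation

    # Draw synthetic x2 from the fitted model, conditioning on synthetic x1.
    X_syn = np.column_stack([np.ones(n), syn_x1])
    syn_x2 = X_syn @ beta + rng.normal(0.0, sigma, size=n)
    return syn_x1, syn_x2


def share_reproduced(original, synthetic):
    """Fraction of synthetic values that exactly match some original value."""
    return float(np.isin(synthetic, original).mean())


# Simulated (not real) data for illustration only.
rng = np.random.default_rng(0)
x1 = rng.normal(50.0, 10.0, size=200)
x2 = 2.0 + 0.5 * x1 + rng.normal(0.0, 3.0, size=200)

syn_x1, syn_x2 = synthesize_sequential(x1, x2, rng)
print(share_reproduced(x1, syn_x1))  # 1.0: every value is a resampled original
print(share_reproduced(x2, syn_x2))  # fresh continuous draws almost never match
```

The resampled variable reproduces original values exactly even though it is continuous, while the regression draws are new values. Any method that generates values by sampling original data points behaves like the first variable under this check, which is the kind of disclosure risk the abstract describes.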

Keywords

disclosure risk; data privacy; sequential models; synthetic data generation; synthpop; utility
Authors
Lim, Johan; Kim, Seungkyu; Yu, Donghyeon
DOI
10.5351/KJAS.2025.38.2.299
Publication date
2025-04
Type
Article
Journal
응용통계연구 (The Korean Journal of Applied Statistics)
Volume
38
Issue
2
Pages
299–308