A comprehensive and empirical review of the synthpop for synthetic data generation (합성데이터 생성 도구 synthpop에 대한 소개 및 실증적 고찰)
- Lim, Johan;
- Kim, Seungkyu;
- Yu, Donghyeon
Abstract
This paper reviews synthpop, a versatile R library for synthetic data generation. To generate synthetic data, synthpop fits a sequence of models to the original data and draws the synthetic data from the estimated models. It offers several choices of baseline model, including linear regression, classification and regression trees (CART), and random forests. Several of these models can reproduce original observed values, even for continuous variables, either through the quantile transformation or by sampling original data points. We provide a numerical example illustrating the potential disclosure risk of using synthpop. Our real-data example shows that the models available in synthpop should be selected carefully when the original data contain sensitive categorical or continuous variables.
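The disclosure mechanism described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use or reproduce synthpop's code: the function `synthesize`, the variable names, and the toy records are all hypothetical. It assumes a two-variable sequential scheme in which the first variable is drawn from its observed values and the second is drawn, conditional on the first, by resampling original data points, a crude stand-in for donor sampling within a tree leaf.

```python
import random

def synthesize(records, n, seed=0):
    """Sequentially synthesize (group, income) pairs from `records`.

    Hypothetical illustration only; this is not synthpop's algorithm.
    """
    rng = random.Random(seed)
    groups = [g for g, _ in records]
    # Donor pools: the original incomes observed within each group.
    donors = {}
    for g, income in records:
        donors.setdefault(g, []).append(income)
    synthetic = []
    for _ in range(n):
        # Step 1: draw the first variable from its observed values.
        g = rng.choice(groups)
        # Step 2: draw the second variable conditional on the first by
        # resampling an original value from the matching donor pool.
        income = rng.choice(donors[g])
        synthetic.append((g, income))
    return synthetic

# Toy "original" data (made up): one unusually large income in group B.
original = [
    ("A", 1520.75), ("A", 1610.10),
    ("B", 980.40), ("B", 20450.00), ("B", 1015.30),
]
synthetic = synthesize(original, n=10, seed=1)

# Count synthetic continuous values that coincide exactly with original ones.
original_incomes = {income for _, income in original}
reproduced = sum(income in original_incomes for _, income in synthetic)
print(f"{reproduced} of {len(synthetic)} synthetic incomes are exact original values")
```

Because step 2 only ever resamples observed values, every synthetic income in this sketch is an exact copy of an original income, so a rare value such as 20450.00 can appear unchanged in the released data. This is the kind of risk the abstract attributes to models that sample original data points.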
- Published: 2025-04
- Type: Article
- Journal: 응용통계연구 (The Korean Journal of Applied Statistics)
- Volume: 38
- Issue: 2
- Pages: 299-308