Weighted support vector machine for extremely imbalanced data

Citations
Web of Science: 6
Scopus: 6

Abstract

Based on an asymptotically optimal weighted support vector machine (SVM) that introduces label shift, a systematic procedure is derived for applying oversampling and a weighted SVM to extremely imbalanced datasets with a cluster-structured positive class. This method formalizes three intuitions: (i) oversampling should reflect the structure of the positive class; (ii) weights should account for both the imbalance and oversampling ratios; (iii) synthetic samples should carry less weight than the original samples. The proposed method generates synthetic samples from the estimated positive class distribution using a Gaussian mixture model. To prevent overfitting to excessive synthetic samples, different misclassification penalties are assigned to the original positive samples, the synthetic positive samples, and the negative samples. The proposed method is validated numerically through simulations and an analysis of Republic of Korea Army artillery training data.
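The pipeline described in the abstract (fit a Gaussian mixture to the positive class, sample synthetic positives from it, then train a weighted SVM with separate penalties for original positives, synthetic positives, and negatives) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it swaps in scikit-learn's `GaussianMixture` and `SVC`, chooses the number of mixture components by BIC, and uses a hypothetical weighting rule (negatives weight 1, original positives weight `w`, synthetic positives weight `lam * w` with `lam < 1`, and `w` set so total positive weight equals total negative weight). The paper's asymptotically optimal weights derived from label shift are not reproduced here; `fit_gmm_by_bic`, `oversample_and_weight`, and `lam` are illustrative names.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC


def fit_gmm_by_bic(X_pos, max_components=4, seed=0):
    """Fit mixtures with 1..max_components components to the positive class
    and keep the one with the lowest BIC (model selection is an assumption)."""
    best, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X_pos)
        bic = gmm.bic(X_pos)
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best


def oversample_and_weight(X, y, n_synthetic, lam=0.5, seed=0):
    """Return augmented data, labels in {-1, +1}, and per-sample weights.

    Hypothetical weights: negatives 1, original positives w, synthetic
    positives lam * w (lam < 1), where w balances total class weights.
    """
    X_pos, X_neg = X[y == 1], X[y == -1]
    gmm = fit_gmm_by_bic(X_pos, max_components=min(4, len(X_pos)), seed=seed)
    X_syn, _ = gmm.sample(n_synthetic)  # draws from the estimated positive density
    n_pos, n_neg = len(X_pos), len(X_neg)
    w = n_neg / (n_pos + lam * n_synthetic)  # depends on imbalance and oversampling ratios
    X_aug = np.vstack([X_pos, X_syn, X_neg])
    y_aug = np.concatenate([np.ones(n_pos + n_synthetic), -np.ones(n_neg)])
    weights = np.concatenate([
        np.full(n_pos, w),              # original positives
        np.full(n_synthetic, lam * w),  # synthetic positives: down-weighted
        np.ones(n_neg),                 # negatives
    ])
    return X_aug, y_aug, weights


# Toy data: the positive class forms two small clusters; negatives are abundant.
rng = np.random.default_rng(0)
X_pos = np.vstack([rng.normal([2.0, 2.0], 0.3, (10, 2)),
                   rng.normal([-2.0, 2.0], 0.3, (10, 2))])
X_neg = rng.normal([0.0, -1.0], 1.0, (1000, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(20), -np.ones(1000)])

X_aug, y_aug, weights = oversample_and_weight(X, y, n_synthetic=200, lam=0.5)
# SVC rescales C per sample by sample_weight, giving three distinct penalties.
clf = SVC(kernel="rbf", C=1.0).fit(X_aug, y_aug, sample_weight=weights)
```

Passing the weights through `sample_weight` rather than `class_weight` is what allows original and synthetic positives to receive different penalties even though they share the same label.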

Keywords

Bayes rule; Cost-sensitive learning; Gaussian mixture; Imbalanced classification; Label shift; Oversampling; Weighted support vector machine; MODELS; EM; CLASSIFICATION; GUARANTEES; CLUSTERS; NUMBER
Title
Weighted support vector machine for extremely imbalanced data
Authors
Mun, Jongmin; Bang, Sungwan; Kim, Jaeoh
DOI
10.1016/j.csda.2024.108078
Publication date
2025-03
Type
Article
Journal
Computational Statistics and Data Analysis
Volume
203