Weighted support vector machine for extremely imbalanced data

Mun, Jongmin; Bang, Sungwan; Kim, Jaeoh

doi:10.1016/j.csda.2024.108078

Citations (Web of Science): 6

Citations (Scopus): 6

Abstract

Based on an asymptotically optimal weighted support vector machine (SVM) that introduces label shift, a systematic procedure is derived for applying oversampling and weighted SVM to extremely imbalanced datasets with a cluster-structured positive class. This method formalizes three intuitions: (i) oversampling should reflect the structure of the positive class; (ii) weights should account for both the imbalance and oversampling ratios; (iii) synthetic samples should carry less weight than the original samples. The proposed method generates synthetic samples from the estimated positive class distribution using a Gaussian mixture model. To prevent overfitting to excessive synthetic samples, different misclassification penalties are assigned to the original positive class, synthetic positive class, and negative class. The proposed method is numerically validated through simulations and an analysis of Republic of Korea Army artillery training data.
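The pipeline described in the abstract (fit a Gaussian mixture to the positive class, draw synthetic positives from it, then train an SVM with three distinct misclassification penalties) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the function name `fit_gmm_weighted_svm`, its parameters, and especially the default weights are assumptions chosen only to respect the ordering "synthetic positives weigh less than original positives". The record does not state the asymptotically optimal weights derived in the paper, so they are not reproduced here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC


def fit_gmm_weighted_svm(X, y, n_components=2, n_synthetic=200,
                         w_neg=1.0, w_pos=None, w_syn=None, seed=0):
    """Oversample positives with a GMM, then fit an SVM with three penalty levels.

    Hypothetical helper: y is assumed to hold 1 for the positive (minority)
    class and 0 for the negative class.
    """
    X_pos, X_neg = X[y == 1], X[y == 0]

    # Estimate the cluster-structured positive class distribution.
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X_pos)

    # Draw synthetic positives from the fitted mixture.
    X_syn, _ = gmm.sample(n_synthetic)

    # Placeholder weights (an assumption, NOT the paper's derived weights):
    # positives are up-weighted by the post-oversampling imbalance ratio,
    # and synthetic positives get half the weight of original positives.
    if w_pos is None:
        w_pos = len(X_neg) / (len(X_pos) + n_synthetic)
    if w_syn is None:
        w_syn = 0.5 * w_pos

    X_aug = np.vstack([X_neg, X_pos, X_syn])
    y_aug = np.concatenate([
        np.zeros(len(X_neg), dtype=int),
        np.ones(len(X_pos) + n_synthetic, dtype=int),
    ])
    weights = np.concatenate([
        np.full(len(X_neg), w_neg),     # negative class
        np.full(len(X_pos), w_pos),     # original positive class
        np.full(n_synthetic, w_syn),    # synthetic positive class
    ])

    # Per-sample weights rescale the penalty C for each training point.
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X_aug, y_aug, sample_weight=weights)
    return clf, weights
```

In practice the number of mixture components and the three weights would be chosen by the procedure in the paper or by validation. The defaults above only make the sketch runnable.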

Keywords

Bayes rule; Cost-sensitive learning; Gaussian mixture; Imbalanced classification; Label shift; Oversampling; Weighted support vector machine; MODELS; EM; CLASSIFICATION; GUARANTEES; CLUSTERS; NUMBER

Title: Weighted support vector machine for extremely imbalanced data

Authors: Mun, Jongmin; Bang, Sungwan; Kim, Jaeoh

DOI: 10.1016/j.csda.2024.108078

Publication date: 2025-03

Type: Article

Journal: Computational Statistics and Data Analysis

Volume: 203