A divide-oversampling and conquer algorithm based support vector machine for massive and highly imbalanced data

Bang, Sungwan; Kim, Jaeoh

doi:10.5351/KJAS.2022.35.2.177

Abstract

The support vector machine (SVM) has been applied successfully to a wide range of classification problems, where it achieves high classification accuracy. However, the SVM is infeasible for massive data because of its heavy computational burden. Furthermore, when the data are imbalanced, that is, when the class sizes differ, the accuracy of the SVM on the minority class may drop significantly because the classifier can be biased toward the majority class. To overcome these problems, we propose the DOC-SVM method, which uses a divide-oversampling and conquer technique. The proposed DOC-SVM divides the majority class into a few subsets and applies an oversampling technique to the minority class to produce balanced subsets. It then obtains the final classifier by aggregating the SVM classifiers fitted to the balanced subsets. Simulation studies are presented to demonstrate the satisfactory performance of the proposed method.
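
The three steps named in the abstract (divide, oversample, conquer) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which oversampling technique, SVM solver, or aggregation rule is used, so the sketch makes these assumptions: random oversampling with replacement, a linear SVM fitted by Pegasos-style stochastic subgradient descent, and aggregation by averaging the decision values. All function names (`train_linear_svm`, `doc_svm_fit`, `doc_svm_predict`) and parameter values are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=30, seed=0):
    """Pegasos-style SGD for a linear SVM (hinge loss, L2 penalty).

    Labels must be -1 or +1. A constant 1.0 is appended to every row,
    so the last weight plays the role of the intercept.
    """
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 penalty)
            if margin < 1.0:  # hinge loss is active for this row
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w

def decision(w, x):
    """Signed decision value of one linear model at point x."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

def doc_svm_fit(X_major, X_minor, n_subsets=5, seed=0):
    rng = random.Random(seed)
    shuffled = list(X_major)
    rng.shuffle(shuffled)
    # Divide: split the majority class (label -1) into disjoint subsets.
    chunks = [shuffled[k::n_subsets] for k in range(n_subsets)]
    models = []
    for k, chunk in enumerate(chunks):
        # Oversample (assumption: random duplication, not SMOTE): draw
        # minority rows (label +1) with replacement until classes match.
        boosted = [rng.choice(X_minor) for _ in range(len(chunk))]
        X = chunk + boosted
        y = [-1] * len(chunk) + [1] * len(boosted)
        models.append(train_linear_svm(X, y, seed=seed + k))
    return models

def doc_svm_predict(models, x):
    # Conquer (assumption): average the decision values of all models.
    score = sum(decision(w, x) for w in models) / len(models)
    return 1 if score >= 0 else -1

# Toy data: 1000 majority points and 20 minority points in two dimensions.
data_rng = random.Random(42)
X_major = [(data_rng.gauss(-2, 1), data_rng.gauss(-2, 1)) for _ in range(1000)]
X_minor = [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(20)]

models = doc_svm_fit(X_major, X_minor, n_subsets=5)
print(doc_svm_predict(models, (3.0, 3.0)), doc_svm_predict(models, (-3.0, -3.0)))
```

Each of the five models sees only 200 majority rows plus 200 resampled minority rows, so no single fit touches the full data set, and the fits are independent of one another. A majority vote over the individual predictions would be an equally plausible aggregation rule under the abstract's wording.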

Keywords

divide and conquer; imbalanced data; massive data; oversampling; support vector machine; quantile regression; SMOTE; classification

Publication date: 2022-04

Type: Article

Journal: 응용통계연구 (The Korean Journal of Applied Statistics)

Volume: 35

Issue: 2

Pages: 177-188