A divide-oversampling and conquer algorithm based support vector machine for massive and highly imbalanced data

Abstract

The support vector machine (SVM) has been successfully applied to a wide range of classification problems and achieves high classification accuracy. However, the SVM is infeasible for analyzing massive data because of its heavy computational cost. Furthermore, when the data are imbalanced, that is, when the class sizes differ substantially, the SVM classifier can be biased toward the majority class, so its classification accuracy on the minority class may drop significantly. To overcome these problems, we propose the DOC-SVM method, which uses divide-oversampling and conquer techniques. DOC-SVM divides the majority class into a few subsets and applies an oversampling technique to the minority class to produce balanced subsets. It then obtains the final classifier by aggregating the SVM classifiers fitted to the balanced subsets. Simulation studies demonstrate the satisfactory performance of the proposed method.
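
The divide-oversample-aggregate procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the oversampling technique, so SMOTE-style interpolation between nearest minority neighbours is assumed (SMOTE appears among the record's keywords). Each subset classifier is a linear SVM trained with the Pegasos stochastic subgradient method, swapped in for a standard SVM solver. The aggregation rule (averaging the subset classifiers' decision values) is also an assumption, because the abstract only says the classifiers are aggregated. The function names `smote_oversample`, `fit_linear_svm`, `doc_svm_fit`, and `doc_svm_predict` and the parameters `n_subsets`, `k`, and `lam` are hypothetical. Only NumPy is required.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, rng=None):
    """Assumed oversampling method (SMOTE-style): each synthetic point lies on the
    segment between a random minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    if n_new <= 0:
        return np.empty((0, X_min.shape[1]))
    k = min(k, len(X_min) - 1)  # needs at least two minority points
    # Pairwise squared Euclidean distances among minority points (fine for small classes).
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def fit_linear_svm(X, y, lam=0.01, epochs=10, rng=None):
    """Linear soft-margin SVM trained with Pegasos (stochastic subgradient descent on
    the L2-regularized hinge loss). y must be in {-1, +1}. A constant column is
    appended, so the bias is learned (and lightly regularized) like any other weight."""
    rng = np.random.default_rng(rng)
    Xa = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (Xa[i] @ w) < 1  # margin test uses the current w
            w *= 1.0 - eta * lam  # shrinkage from the L2 penalty
            if violated:
                w += eta * y[i] * Xa[i]  # hinge-loss subgradient step
    return w


def doc_svm_fit(X, y, n_subsets=5, k=5, seed=0):
    """Divide: split the majority class into n_subsets chunks. Oversample: pad the
    minority class up to each chunk's size. Fit one linear SVM per balanced subset."""
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    minority, majority = labels[np.argmin(counts)], labels[np.argmax(counts)]
    X_min, X_maj = X[y == minority], X[y == majority]
    models = []
    for chunk in np.array_split(rng.permutation(len(X_maj)), n_subsets):
        synth = smote_oversample(X_min, len(chunk) - len(X_min), k=k, rng=rng)
        Xs = np.vstack([X_maj[chunk], X_min, synth])
        ys = np.concatenate([-np.ones(len(chunk)), np.ones(len(X_min) + len(synth))])
        models.append(fit_linear_svm(Xs, ys, rng=rng))
    return minority, majority, models


def doc_svm_predict(model, X):
    """Conquer: average the subset SVMs' decision values (an assumed aggregation rule)."""
    minority, majority, models = model
    Xa = np.hstack([X, np.ones((len(X), 1))])
    score = np.mean([Xa @ w for w in models], axis=0)
    return np.where(score > 0, minority, majority)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic 2-D data: 2000 majority points vs 40 minority points (50:1 imbalance).
    X = np.vstack([rng.normal(-2.0, 1.0, (2000, 2)), rng.normal(2.0, 1.0, (40, 2))])
    y = np.concatenate([np.zeros(2000, dtype=int), np.ones(40, dtype=int)])
    pred = doc_svm_predict(doc_svm_fit(X, y, n_subsets=5), X)
    print("minority recall:", np.mean(pred[y == 1] == 1))
    print("majority recall:", np.mean(pred[y == 0] == 0))
```

Every balanced subset reuses all of the real minority points plus fresh synthetic ones, so minority information reaches every subset classifier. Each SVM, however, sees only about 1/`n_subsets` of the majority class, which is where the divide-and-conquer saving on massive data comes from. Averaging decision values rather than taking a majority vote is one of several reasonable aggregation choices. A kernel SVM could replace `fit_linear_svm`, with `doc_svm_predict` then averaging that model's decision function instead of `Xa @ w`.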

Keywords

divide and conquer; imbalanced data; massive data; oversampling; support vector machine; quantile regression; SMOTE; classification
Authors
Bang, Sungwan; Kim, Jaeoh
DOI
10.5351/KJAS.2022.35.2.177
Publication date
2022-04
Type
Article
Journal
응용통계연구 (The Korean Journal of Applied Statistics)
Volume
35
Issue
2
Pages
177–188