Abstract
The increasing volume and speed of internet traffic create unprecedented opportunities for malicious attackers. This, in turn, poses challenges for network intrusion detection systems (NIDSs), whose job is to detect intrusive (i.e., malicious) network traffic. The majority of current solutions rely on flow records, which summarize each flow (e.g., number of packets, average inter-arrival time). Because a flow record is tabular, most NIDS solutions use tree-based ML models such as Decision Tree and Random Forest. Recently, however, Gradient Boosting Machine methods such as CatBoost have shown superior performance over traditional tree-based solutions on tabular datasets, for example in Kaggle competitions. In this work, we explore the applicability of CatBoost to the network intrusion detection task. We further demonstrate the performance gain achieved by addressing data imbalance. Our experimental comparisons show that addressing data imbalance with a simple over-sampling technique significantly boosts CatBoost's accuracy, from 88.84% to 92.41%. The results also suggest that the CatBoost classifier (92.41%) outperforms Decision Tree and Random Forest (88.34% and 89.88%, respectively) in terms of balanced accuracy on the CIC-IDS-2018 dataset.

© 2021, Korean Institute of Communications and Information Sciences. All rights reserved.
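The abstract names two ingredients without spelling them out: "simple over-sampling" to handle class imbalance, and balanced accuracy as the evaluation metric. The sketch below illustrates both under stated assumptions. It assumes plain random over-sampling with replacement up to the size of the largest class (the record does not say which variant the authors used), and it computes balanced accuracy as the mean per-class recall. The function names `random_oversample` and `balanced_accuracy` and the toy data are hypothetical; no CatBoost model is trained here.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows (sampling with replacement)
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Keep every original row, then top up with random duplicates.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: every class contributes equally,
    however many samples it has."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / support[c] for c in support) / len(support)


# Toy data (hypothetical): four benign flows, one attack flow.
X = [[1], [2], [3], [4], [5]]
y = ["benign", "benign", "benign", "benign", "attack"]

X_bal, y_bal = random_oversample(X, y)
print(sorted(Counter(y_bal).items()))  # [('attack', 4), ('benign', 4)]

# A classifier that always predicts "benign" scores 4/5 = 0.8 on plain
# accuracy, but balanced accuracy exposes the completely missed attack class.
always_benign = ["benign"] * len(y)
print(balanced_accuracy(y, always_benign))  # 0.5
```

In common practice, over-sampling is applied only to the training split so that the test split keeps the real class distribution. Balanced accuracy suits a heavily skewed dataset such as CIC-IDS-2018 because a model cannot score well by simply predicting the majority class.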
- Title: CatBoost-Based Network Intrusion Detection on Imbalanced CIC-IDS-2018 Dataset
- Authors: Jumabek, Alikhanov; Yang, Seungsam; Noh, Youngtae
- Published: 2021
- Type: Article
- Journal: 한국통신학회논문지 (The Journal of Korean Institute of Communications and Information Sciences)
- Volume: 46
- Issue: 12
- Pages: 2191–2197