dc.description.abstract | As the Internet of Things (IoT) develops rapidly, people face a growing number of information security threats. As a result, machine learning has been widely applied to network intrusion detection to combat these threats effectively. However, the massive volume of data used for intrusion detection also gives rise to problems such as data imbalance and feature redundancy. These issues cause classifiers to overfit during training, which in turn degrades the efficiency and accuracy of the model. This study proposes a novel two-stage classifier model capable of both binary and multi-class classification. The model incorporates machine learning and ensemble learning methods, combined with feature selection and data balancing techniques, to handle large-scale network traffic. In the first stage, we compare the accuracy and time efficiency of six machine learning methods, ultimately selecting the Decision Tree as the classifier to distinguish between normal and attack data. In the second stage, the data predicted as attacks in the first stage are classified into attack categories. Information Gain is used to select significant features, and three ensemble learning methods and two data balancing methods are compared. Experimental results indicate that the Extra Trees and ADASYN+TomekLinks methods provide high detection performance and time efficiency, outperforming the SMOTE balancing method. This study validates the detection performance of the two-stage classifier model on two different datasets, CIC-IDS2017 and UNSW-NB15, with F1-scores reaching 99.65% and 79.70%, respectively, and total training times of 171.47 seconds and 11.95 seconds, respectively. Compared to other research, the model in this study exhibits superior performance in both detection performance and time efficiency. | en_US |