基於二階段分類器之惡意流量偵測;Two-stage Classifier For Malicious Traffic Detection

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/92775

Title:	基於二階段分類器之惡意流量偵測;Two-stage Classifier For Malicious Traffic Detection
Authors:	林聖富;Lin, Sheng-Fu
Contributors:	Executive Master Program, Department of Information Management
Keywords:	Network Intrusion Detection (NIDS);Information Gain;Extra Trees;ADASYN;TomekLinks
Date:	2023-07-25
Issue Date:	2023-10-04 16:10:26 (UTC+8)
Publisher:	National Central University
Abstract:	As the Internet of Things (IoT) develops rapidly, information security threats are increasing. To counter these threats, machine learning has been widely applied to network intrusion detection (NIDS). However, the large volumes of intrusion detection data often suffer from class imbalance and feature redundancy, which make classifiers prone to overfitting during training and in turn degrade model performance and accuracy. This study proposes a novel two-stage classifier model that supports both binary and multi-class classification. The model combines machine learning and ensemble learning methods with feature selection and data balancing techniques to handle large-scale network traffic. In the first stage, the accuracy and time efficiency of six machine learning methods are compared, and the Decision Tree is selected as the classifier that distinguishes normal from attack data. In the second stage, the data predicted as attacks in the first stage are classified into attack categories: Information Gain is used to select important features, and three ensemble learning methods and two data balancing methods are compared. Experimental results indicate that Extra Trees combined with ADASYN+TomekLinks delivers high model performance and time efficiency and outperforms the SMOTE balancing method. The two-stage classifier model is validated on two datasets, CIC-IDS2017 and UNSW-NB15, reaching F1-Scores of 99.65% and 79.70% with total training times of 171.47 seconds and 11.95 seconds, respectively. Compared with other studies, the proposed model performs better in both detection performance and time efficiency.
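The two-stage flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: it assumes scikit-learn, uses synthetic data in place of CIC-IDS2017 or UNSW-NB15, uses mutual information as a stand-in for Information Gain ranking, and leaves out the ADASYN+TomekLinks rebalancing step (which would need the separate imbalanced-learn package). The helper names, the label encoding (0 = normal, 1 to 3 = attack categories), and all hyperparameters are hypothetical.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

NORMAL = 0  # assumed label for benign traffic; 1..3 are attack categories


def fit_two_stage(X, y, k_features=10, seed=0):
    """Fit a binary normal/attack tree, then an attack-category model."""
    # Stage 1: Decision Tree separating normal traffic from any attack.
    stage1 = DecisionTreeClassifier(random_state=seed)
    stage1.fit(X, (y != NORMAL).astype(int))

    # Stage 2: trained on attack rows only. Features are ranked by mutual
    # information (stand-in for Information Gain), then fed to Extra Trees.
    # Rebalancing of the attack classes is omitted in this sketch.
    attack = y != NORMAL
    scorer = partial(mutual_info_classif, random_state=seed)
    stage2 = make_pipeline(
        SelectKBest(score_func=scorer, k=k_features),
        ExtraTreesClassifier(n_estimators=100, random_state=seed),
    )
    stage2.fit(X[attack], y[attack])
    return stage1, stage2


def predict_two_stage(stage1, stage2, X):
    """Route only the rows that stage 1 flags as attacks to stage 2."""
    pred = np.full(X.shape[0], NORMAL)
    flagged = stage1.predict(X) == 1
    if flagged.any():
        pred[flagged] = stage2.predict(X[flagged])
    return pred


# Synthetic, imbalanced stand-in for a labelled traffic dataset.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, n_classes=4,
    weights=[0.7, 0.15, 0.1, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

stage1, stage2 = fit_two_stage(X_train, y_train)
pred = predict_two_stage(stage1, stage2, X_test)
print("macro F1 on synthetic data:", f1_score(y_test, pred, average="macro"))
```

Because stage 2 only ever sees attack rows, a false alarm from stage 1 is always assigned some attack category, so the quality of the first-stage classifier bounds the overall result. The score printed here comes from synthetic data and is not comparable to the F1-Scores reported in the abstract.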
Appears in Collections:	[Executive Master of Information Management] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML
