A Study of Darknet Traffic Detection Based on PSO-XGBoost Model

Name

Yu-Ping Chang (張育彬)

Department

Department of Computer Science and Information Engineering

Thesis title

A Study of Darknet Traffic Detection Based on PSO-XGBoost Model
(使用PSO優化XGBoost模型於暗網流量偵測之研究)

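Related theses

★ A Study of Stable QoS Routing Mechanisms in Mobile Ad Hoc Networks
★ A Network Management System Using Multiple Mobile Agents
★ A Cooperative Network Defense System Using Mobile Agents
★ A Study of QoS Routing under Uncertain Link-State Information
★ Improving Path Setup Performance in Optical Burst Switching through Traffic Observation
★ A Study of Sensor Networks and Game Theory Applied to Comfort Air Conditioning
★ A Search-Tree-Based Routing Algorithm for Wireless Sensor Networks
★ A Lightweight Positioning System for Mobile Devices Based on Wireless Sensor Networks
★ A Multimedia Guide Toy Car
★ A Smart Floor-Based Guide Toy Car
★ A Mobile Social Network Service Management System: Application to Families of Children with Developmental Delays
★ A Location-Aware Wearable Mobile Advertising System
★ Adaptive Vehicular Broadcasting
★ A Vehicle Collision Avoidance Mechanism with Early Warning Capability in Vehicular Networks
★ A Cooperative Traffic Information Dissemination Mechanism in Wireless Vehicular Networks to Reduce Vehicle Congestion
★ Adaptive Virtual Traffic Signals Using Vehicular Networks to Reduce Congestion in Smart Cities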

Files

Full text: never open to the public

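Abstract (translated from Chinese)

In recent years, rapid advances in network technology have woven the Internet deeply into daily life. At the same time, technologies such as The Onion Router (Tor) and Virtual Private Networks (VPNs) have given rise to the darknet. To defend against darknet activity, intrusion detection systems (IDS) apply machine learning or deep learning to detect malicious traffic. Because a model's hyperparameter space is complex and the input data has many features, tuning hyperparameters and selecting features by hand is costly in trial and error. Efficiently finding a good hyperparameter configuration and a suitable feature subset is therefore a challenge.
To find a better hyperparameter configuration and reduce the number of input features, this thesis proposes the Particle Swarm Optimization - eXtreme Gradient Boosting (PSO-XGB) method for building a darknet traffic classification model. The method uses particle swarm optimization to search for the best hyperparameter configuration of an XGBoost model, improving its accuracy, and to perform feature selection on the input data: the particles converge toward a better feature subset, which shortens prediction time. In experiments on the CIC-Darknet2020 dataset, PSO-XGB achieves an F1-score of 98.42% on Layer-1 classification of darknet traffic types, reducing the number of features from 81 to 43 and the prediction time by 23.52%. On Layer-2 classification of service application types, it achieves an F1-score of 91.28%, reducing the number of features from 81 to 43 and the prediction time by 9.27%. On both layers PSO-XGB is more accurate than Bagging, Random Forest, and CNN, and it improves the F1-score over an XGBoost model with manually set hyperparameters by 5.2% and 6.78%, respectively. Although the proposed method effectively improves accuracy, it requires more time in the training phase, so accuracy and training time must be traded off according to requirements.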
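
For readers unfamiliar with the metric reported above, the usual definition of the F1-score is sketched below, where TP, FP, and FN are the true positives, false positives, and false negatives for one class. The record does not say how per-class scores are averaged for the multi-class Layer-1 and Layer-2 tasks, so the macro average over K classes in the last line is an assumption, shown only as one common choice.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN},\\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},\\
F_1^{\text{macro}} &= \frac{1}{K} \sum_{k=1}^{K} F_{1,k}
\end{aligned}
```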

Abstract (English)

In recent years, with the rapid development of Internet technology, the Internet has become deeply integrated into our lives. However, with the emergence of technologies such as The Onion Router (Tor) and Virtual Private Networks (VPNs), the darknet has gradually formed. To counter darknet activity effectively, machine learning or deep learning techniques are used in intrusion detection systems (IDS) to detect malicious traffic. Because the model's hyperparameter space is complex and the input data has a large number of features, manually adjusting hyperparameters and performing feature selection leads to high trial costs. Efficiently finding a better set of hyperparameters and a suitable feature subset for the model is therefore a challenge.
We propose the Particle Swarm Optimization - eXtreme Gradient Boosting (PSO-XGB) method for building a darknet traffic classification model, aiming to find a better hyperparameter configuration for the model and to reduce the number of input features. The method uses the Particle Swarm Optimization algorithm to search for the optimal hyperparameter configuration of the XGBoost model, thereby improving the model's accuracy. It also performs feature selection on the input data, allowing the particles to converge on a better feature subset and so reduce the model's prediction time. In experiments on the CIC-Darknet2020 dataset, the method achieves an F1-score of 98.42% for the classification of Layer-1 darknet traffic types, reducing the number of features from 81 to 43 and decreasing the prediction time by 23.52%. For the classification of Layer-2 traffic service applications, it obtains an F1-score of 91.28%, reducing the number of features from 81 to 43 and decreasing the prediction time by 9.27%. PSO-XGB outperforms Bagging, Random Forest, and CNN in accuracy on both layers. Compared with an XGBoost model whose hyperparameters are set manually, PSO-XGB improves the F1-score by 5.2% and 5.78%, respectively. Although the proposed method effectively improves the model's accuracy, it requires more time during the training phase. The trade-off between model accuracy and training time therefore needs to be weighed against specific requirements.
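
The abstract describes particles that search over hyperparameters and a feature subset at the same time. The following is a minimal sketch of that idea, not the thesis's implementation. Everything specific in it is an assumption made for illustration: each particle is a vector in [0, 1], the first `n_hyper` coordinates stand for normalised hyperparameters, the remaining coordinates are thresholded at 0.5 to form a feature mask, and the coefficients `w`, `c1`, `c2` are typical textbook values. The hypothetical `toy_fitness` stands in for "train XGBoost with these settings and return the validation F1-score", which the sketch does not perform.

```python
import random


def decode(position, n_hyper):
    """Split a particle into hyperparameter values and a boolean feature mask."""
    hyper = position[:n_hyper]
    mask = [x > 0.5 for x in position[n_hyper:]]  # assumed 0.5 threshold
    return hyper, mask


def pso(fitness, n_hyper, n_features, n_particles=20, n_iter=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise fitness(hyper, mask) with a basic global-best PSO on [0, 1]^dim."""
    rng = random.Random(seed)
    dim = n_hyper + n_features
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(*decode(p, n_hyper)) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so every coordinate stays inside the search box.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(*decode(pos[i], n_hyper))
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest, n_hyper), gbest_fit


def toy_fitness(hyper, mask):
    """Hypothetical stand-in for a validation F1-score (no model is trained).

    The score peaks at hyper = (0.3, 0.6); the first four features help,
    and every other selected feature costs a small penalty.
    """
    score = 1.0 - sum((h - t) ** 2 for h, t in zip(hyper, (0.3, 0.6)))
    return score + 0.05 * sum(mask[:4]) - 0.02 * sum(mask[4:])


if __name__ == "__main__":
    (best_hyper, best_mask), best_fit = pso(toy_fitness, n_hyper=2, n_features=10)
    print("hyperparameters:", [round(h, 3) for h in best_hyper])
    print("selected features:", [i for i, keep in enumerate(best_mask) if keep])
    print("fitness:", round(best_fit, 4))
```

In a real pipeline the fitness function would map the normalised hyperparameters to actual ranges, train a classifier on the masked feature columns, and return a validation score. Each fitness call then costs one full training run, which is consistent with the longer training time the abstract reports.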

Keywords

★ Particle Swarm Optimization
★ Intrusion Detection System
★ Traffic Classification
★ Model Optimization
★ Darknet

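Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Overview
1.2. Research Motivation
1.3. Research Objectives
1.4. Thesis Organization
Chapter 2 Background and Related Work
2.1. Darknet
2.1.1. The Onion Router
2.1.2. Virtual Private Network
2.2. Intrusion Detection System
2.2.1. Packet-based Intrusion Detection
2.2.2. Flow-based Intrusion Detection
2.3. eXtreme Gradient Boosting
2.4. Particle Swarm Optimization
2.5. Related Work
Chapter 3 Methodology
3.1. System Architecture and Design
3.2. System Workflow and Implementation
3.2.1. Darknet Traffic Collection
3.2.2. Data Preprocessing
3.2.3. Particle Swarm Optimization
3.2.4. Model Training
3.3. System Environment
Chapter 4 Experiments and Discussion
4.1. Scenario 1: XGBoost Darknet Traffic Classification Performance and Comparison of Data Preprocessing
4.1.1. Experiment 1: Classification Performance of XGBoost on Darknet Layer-1
4.1.2. Experiment 2: Classification Performance of XGBoost on Darknet Layer-2
4.1.3. Experiment 3: Effect of Data Preprocessing on Classification Performance
4.1.4. Experiment 4: Selection of Model Evaluation Metrics
4.2. Scenario 2: PSO-XGB Classification on Layer-1 and Comparison with Other Models
4.2.1. Experiment 5: Classification Performance of Layer-1 Hyperparameter Optimization
4.2.2. Experiment 6: Comparison of Layer-1 Hyperparameter Optimization Methods
4.2.3. Experiment 7: Classification Performance of Layer-1 Hyperparameter Optimization with Feature Selection
4.2.4. Experiment 8: Comparison of Layer-1 Model Prediction Time
4.3. Scenario 3: PSO-XGB Classification on Layer-2 and Comparison with Other Models
4.3.1. Experiment 9: Classification Performance of Layer-2 Hyperparameter Optimization
4.3.2. Experiment 10: Comparison of Layer-2 Hyperparameter Optimization Methods
4.3.3. Experiment 11: Classification Performance of Layer-2 Hyperparameter Optimization with Feature Selection
4.3.4. Experiment 12: Comparison of Layer-2 Model Prediction Time
Chapter 5 Conclusions and Future Work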
5.1. Conclusions
5.2. Research Limitations
5.3. Future Work
References
Appendix

Advisor

Li-Der Chou (周立德)

Approval date

2023-08-14
