A Study on Anomaly Detection Based on Clustering and Deep Learning


Name

李嘉峻 (CHIA-CHUN Li)

Department

Department of Information Management, In-Service Master's Program

Thesis Title

A Study on Anomaly Detection Based on Clustering and Deep Learning
(結合分群法與深度學習之網路行為異常偵測)

Files

View the thesis in the system (open after 2028-7-1)

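Abstract (Chinese)

Abstract
According to National Police Agency statistics on fraud cases and the resulting financial losses, fake-investment fraud ranks first in both case count and losses, followed by installment-cancellation fraud in second place, which makes it clearly one of the most frequent forms of online fraud. Installment-cancellation fraud is inseparably tied to personal data leaks: hackers break into the order systems of shopping websites and of sellers on auction platforms over the network, steal buyers' personal data and transaction details, and then pose as customer service staff to carry out the ATM installment-cancellation scam. Personal data leaks therefore lead to frequent fraud incidents. To curb installment-cancellation fraud at its source, the hackers' attack connections must be identified, which raises the question of how to analyze firewall logs effectively and find suspicious anomalous connections in them. This study aims to identify the sources of anomalous intrusion connections by analyzing firewall system logs, and to raise alerts on anomalous network connections so as to guard against possible intrusion attacks.
The research method uses K-means clustering: the best value of K is selected, the data are partitioned into clusters, and the mean of each cluster is computed and treated as that cluster's center. Anomalous connections are then detected from the distance between each data point and its center. To identify anomalous connections correctly within the clustering approach, this thesis compares three distance measures: Euclidean distance, Manhattan distance, and Chebyshev distance. In addition, a deep neural network (DNN) is used to model the complex firewall data; a test dataset is fed to the model for evaluation, and model performance is assessed with a confusion matrix, the ROC curve, and metrics such as precision and recall.
Case analysis: the research method is applied to real case data. The experiments show that outliers can indeed be used to detect anomalous network connections, and that Manhattan distance gives more accurate results on firewall log data. In a comparison between the DNN model and stacking-classifier prediction, the DNN model is better at detecting anomalous connections.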
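The clustering step described in this abstract (cluster means as centers, then distance from each point to its center under three metrics) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the thesis: the toy 2-D points, the caller-supplied initial centers, the use of Euclidean distance for cluster assignment, and the "mean + 2 standard deviations" cutoff for flagging a point as an outlier.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def kmeans(points, init_centroids, max_iters=100):
    """Plain Lloyd's algorithm; returns (centroids, labels).

    Initial centroids are passed in so the demo is deterministic
    (an assumption; real runs usually use random restarts or k-means++).
    """
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = []
    for _ in range(max_iters):
        # Assignment step: nearest centroid (Euclidean, assumed here).
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's center
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def flag_outliers(points, centroids, labels, metric, n_sigmas=2.0):
    """Indices of points farther from their own centroid than
    mean + n_sigmas * std of all such distances (hypothetical cutoff)."""
    dists = [metric(p, centroids[lab]) for p, lab in zip(points, labels)]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    threshold = mean + n_sigmas * std
    return [i for i, d in enumerate(dists) if d > threshold]

# Toy data: two tight groups plus one distant point (index 10).
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8),
    (8.0, 8.0), (8.2, 7.9), (7.8, 8.1), (8.1, 8.2), (7.9, 7.8),
    (8.0, 14.0),
]
centroids, labels = kmeans(points, init_centroids=[points[0], points[5]])
for name, metric in [("euclidean", euclidean),
                     ("manhattan", manhattan),
                     ("chebyshev", chebyshev)]:
    print(name, flag_outliers(points, centroids, labels, metric))
```

On this toy data all three metrics flag only the distant point at index 10. On real firewall features the metrics can disagree, which is the kind of comparison the abstract reports.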

Abstract (English)

Abstract
According to the data on fraud cases and financial losses provided by the Criminal Investigation Bureau, investment fraud has the highest occurrence and financial loss, followed by installment payment fraud, which ranks second. It is evident that these types of fraud are prevalent in online scams. Installment payment fraud is closely related to the leakage of personal data. Hackers exploit vulnerabilities in the order systems of shopping websites and auction platforms to steal buyers' personal information and transaction details. They then impersonate customer service representatives to carry out ATM installment payment fraud. Therefore, the leakage of personal information leads to frequent occurrences of fraudulent activities. To mitigate installment payment fraud, a proactive approach is to identify the attack connections made by hackers and effectively analyze firewall logs to identify suspicious and abnormal connections. This study aims to analyze firewall system log records to detect anomalous intrusion connections and provide warnings for network anomalies, thereby preventing potential security breaches.
The research methodology employed in this study involves the use of the K-means clustering method to determine the optimal number of clusters. The data is then divided into clusters, and the average values of each cluster are calculated, considering them as the cluster centroids. The distances between each data point and the centroids are then calculated to detect abnormal connections. To obtain accurate identification of abnormal connections using the clustering approach, three distance calculation methods are compared: Euclidean Distance, Manhattan Distance, and Chebyshev Distance. Additionally, a deep neural network (DNN) is utilized to model the complex data from the firewall. The test dataset is used to evaluate the performance of the model, utilizing metrics such as confusion matrix, ROC curve, accuracy, and recall.
Case analysis demonstrates the practical application of the research methodology using real-world data. The experiments verify that outliers can indeed be used to detect abnormal network connections, with the Manhattan Distance yielding more accurate results for analyzing firewall log data. Furthermore, the DNN model outperforms the stacked classification prediction model in detecting abnormal connections.
Keywords: K-means, Euclidean Distance, Manhattan Distance, Chebyshev Distance, deep learning, outliers
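
The evaluation metrics named in the abstract (confusion matrix, ROC curve, recall and related ratios) can be sketched in a few lines. This is an illustrative sketch only: the labels, scores, and the 0.5 decision threshold below are hypothetical values, not results from the thesis, and the scores stand in for whatever a trained DNN would output.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = anomalous."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_accuracy(tp, fp, fn, tn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

def roc_points(y_true, scores):
    """Sweep the threshold from high to low; return (FPR, TPR) pairs.

    Assumes y_true contains at least one positive and one negative.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    curve = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        curve.append((fp / neg, tp / pos))
    return curve

def auc(curve):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))

# Hypothetical ground truth and model scores for eight connections.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, accuracy = precision_recall_accuracy(tp, fp, fn, tn)
print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("precision, recall, accuracy:", precision, recall, accuracy)
print("AUC:", auc(roc_points(y_true, scores)))
```

With these made-up values the confusion counts are (2, 1, 1, 4), so precision and recall are both 2/3 and accuracy is 0.75. Keeping precision and accuracy as separate quantities matters for imbalanced log data, where accuracy alone can look high even when few anomalies are caught.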

Keywords (Chinese)

★ K-means clustering
★ Euclidean distance
★ Manhattan distance
★ Chebyshev distance
★ deep learning
★ outliers

Keywords (English)

★ K-means
★ Euclidean Distance
★ Manhattan Distance
★ Chebyshev Distance
★ deep learning
★ outliers

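Table of Contents

List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Methods
1.4 Research Objectives
Chapter 2 Literature Review
2.1 Firewall Log Analysis Methods
2.2 K-means Clustering Based on Anomalous Connections
2.3 Discussion of the Anomaly Detection Models Used in This Study
Chapter 3 Research Methods
3.1 Feature Selection
3.2 Data Preprocessing
3.3 K-means Clustering
3.4 Building the Cluster-DNN Model
3.5 Analyzing Suspicious Connections
3.6 Setting the Search Index Value
Chapter 4 Analysis of Empirical Results
4.1 Data Collection and Preprocessing
4.2 Dataset Feature Selection
4.3 Empirical Feature Selection and K-means Performance Evaluation
4.4 Anomalous Connections
4.5 Comparison of Data Models
4.6 Model Performance Comparison on the Internet Firewall Data Set
4.7 Case Analysis
Chapter 5 Conclusion and Outlook
5.1 Summary of Research Results
5.2 Research Contributions
5.3 Suggestions for Future Research
References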

Advisor

陳以錚

Approval Date

2023-7-5
