Name 李嘉峻 (Chia-Chun Li)   Department Department of Information Management, In-service Master's Program
Thesis Title 結合分群法與深度學習之網路行為異常偵測
(A Study on Anomaly Detection Based on Clustering and Deep Learning)
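Abstract (Chinese)
The research method uses K-means clustering: the optimal value of K is determined, the data are divided into clusters, and the mean of each cluster is computed and taken as that cluster's center point. The distance between each data point and its center point is then used to detect abnormal connections. To identify abnormal connections correctly within the clustering approach, three distance measures are compared: Euclidean distance, Manhattan distance, and Chebyshev distance. In addition, a deep neural network (DNN) is used to model the complex firewall data. A test dataset is supplied to the model for evaluation, and model performance is assessed with the confusion matrix, the ROC curve, and metrics such as accuracy and recall.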
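The distance-to-center idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy two-dimensional points, the choice of the first `k` points as initial centers, the fixed `threshold`, and the helper names `kmeans` and `flag_outliers` are all hypothetical. The abstract does not say how the outlier cutoff was chosen.

```python
from statistics import mean

def euclidean(p, q):
    # Straight-line distance: square root of the summed squared differences.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    # Largest absolute coordinate difference.
    return max(abs(a - b) for a, b in zip(p, q))

def centroid(points):
    # The cluster center is the coordinate-wise mean of its members.
    return tuple(mean(col) for col in zip(*points))

def kmeans(points, k, iters=20):
    # Plain Lloyd iterations. Using the first k points as initial centers
    # is an illustrative assumption, not a recommended initialization.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: euclidean(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return centers, labels

def flag_outliers(points, centers, labels, dist, threshold):
    # A point is flagged when it lies farther than `threshold` from the
    # center of its own cluster (the threshold value is an assumption).
    return [i for i, (p, lab) in enumerate(zip(points, labels))
            if dist(p, centers[lab]) > threshold]

# Toy data: two tight groups plus one point far from both (index 6).
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.8, 1.1),
          (8.2, 7.9), (7.9, 8.3), (1.0, 5.0)]
centers, labels = kmeans(points, k=2)

for name, dist in [("euclidean", euclidean),
                   ("manhattan", manhattan),
                   ("chebyshev", chebyshev)]:
    print(name, flag_outliers(points, centers, labels, dist, threshold=2.0))
```

On real firewall features the three measures can disagree about which points exceed a cutoff, which is why comparing them is meaningful. On this toy data they happen to agree.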
Abstract (English)
According to fraud case and financial loss data from the Criminal Investigation Bureau, investment fraud accounts for the most cases and the largest financial losses, and installment payment fraud ranks second. These types of fraud are evidently prevalent in online scams. Installment payment fraud is closely related to the leakage of personal data. Hackers exploit vulnerabilities in the order systems of shopping websites and auction platforms to steal buyers' personal information and transaction details. They then impersonate customer service representatives to carry out ATM installment payment fraud. Leaked personal information therefore leads to frequent fraud. A proactive way to mitigate installment payment fraud is to identify the attack connections made by hackers by analyzing firewall logs for suspicious and abnormal connections. This study aims to analyze firewall system log records to detect anomalous intrusion connections and provide warnings for network anomalies, thereby preventing potential security breaches.
The research method uses K-means clustering. The optimal number of clusters is determined first, the data are divided into clusters, and the mean of each cluster is computed and taken as the cluster centroid. The distance between each data point and its centroid is then used to detect abnormal connections. To identify abnormal connections accurately with the clustering approach, three distance measures are compared: Euclidean Distance, Manhattan Distance, and Chebyshev Distance. Additionally, a deep neural network (DNN) is used to model the complex firewall data. The model is evaluated on a test dataset using the confusion matrix, the ROC curve, and metrics such as accuracy and recall.
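The evaluation measures named above can be computed from a classifier's outputs as in the following minimal sketch. The labels and scores are made-up toy values, not results from the thesis. The convention that 1 means an abnormal connection, the 0.5 decision cutoff, and the helper names are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred):
    # Count true positives, false positives, true negatives, false negatives.
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    # Share of all connections that were classified correctly.
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp, fp, tn, fn):
    # Share of truly abnormal connections that the model caught.
    return tp / (tp + fn) if (tp + fn) else 0.0

def roc_points(y_true, scores):
    # Sweep the decision threshold over every distinct score, from high
    # to low, and record (false positive rate, true positive rate).
    pos = sum(y_true)
    neg = len(y_true) - pos
    curve = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        pred = [1 if s >= thr else 0 for s in scores]
        tp, fp, tn, fn = confusion_counts(y_true, pred)
        curve.append((fp / neg, tp / pos))
    return curve

# Toy example: 1 = abnormal connection, 0 = normal (assumed convention).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

counts = confusion_counts(y_true, y_pred)
print("TP, FP, TN, FN:", counts)
print("accuracy:", accuracy(*counts))
print("recall:", recall(*counts))
print("ROC points:", roc_points(y_true, scores))
```

Recall matters most when missed intrusions are costly, while the ROC curve shows how the detection rate trades off against false alarms as the cutoff moves.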
Case analysis demonstrates the practical application of the research methodology using real-world data. The experiments verify that outliers can indeed be used to detect abnormal network connections, with the Manhattan Distance yielding more accurate results for analyzing firewall log data. Furthermore, the DNN model outperforms the stacked classification prediction model in detecting abnormal connections.
Keywords: K-means, Euclidean Distance, Manhattan Distance, Chebyshev Distance, deep learning, outliers
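Keywords (Chinese) ★ K-means分群 (K-means clustering)
★ 歐基里得距離 (Euclidean distance)
★ 曼哈頓距離 (Manhattan distance)
★ 切比雪夫距離 (Chebyshev distance)
★ 深度學習 (deep learning)
★ 離群值 (outliers)
Keywords (English) ★ K-means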
★ Euclidean Distance
★ Manhattan Distance
★ Chebyshev Distance
★ deep learning
★ outliers
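Table of Contents List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Method
1.4 Research Objectives
Chapter 2 Literature Review
2.1 Firewall Log Analysis Methods
2.2 K-means Clustering Based on Abnormal Connections
2.3 Discussion of the Anomaly Detection Models Used in This Study
Chapter 3 Research Method
3.1 Feature Selection
3.2 Data Preprocessing
3.3 K-means Clustering
3.4 Building the Cluster-DNN Model
3.5 Analyzing Suspicious Connections
3.6 Setting the Search Index Value
Chapter 4 Analysis of Empirical Results
4.1 Data Collection and Preprocessing
4.2 Dataset Feature Selection
4.3 Empirical Feature Selection and K-means Performance Evaluation
4.4 Abnormal Connections
4.5 Data Model Comparison
4.6 Model Performance Comparison on the Internet Firewall Data Set
4.7 Case Analysis
Chapter 5 Conclusion and Outlook
5.1 Summary of Research Results
5.2 Research Contributions
5.3 Suggestions for Future Research
References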
Advisor 陳以錚   Approval Date 2023-7-5
