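Abstract: | Abstract According to National Police Agency statistics on fraud case counts and financial losses, fake-investment fraud accounts for the most cases and the largest losses, followed by installment cancellation fraud, which ranks second and is clearly among the most common online scams. Installment cancellation fraud stems directly from leaks of personal data: hackers break into the order systems of shopping websites and of sellers on auction platforms, steal buyers' personal data and transaction details, and then pose as customer service staff to talk victims into "cancelling an installment plan" at an ATM. Leaked personal data therefore drives a steady stream of fraud cases. To curb this fraud at its source, the attackers' connections must be identified, which requires analyzing firewall logs effectively to find suspicious, anomalous connections. This study analyzes firewall system logs to trace the sources of anomalous intrusion connections and to raise alerts on anomalous network connections, reducing the likelihood of a successful intrusion. The method uses K-means clustering: the optimal K is selected, the data are partitioned into clusters, and the mean of each cluster is taken as its center. The distance from each data point to its center is then used to detect anomalous connections. To identify anomalous connections correctly, three distance measures are compared: Euclidean distance, Manhattan distance, and Chebyshev distance. In addition, a deep neural network (DNN) models the complex firewall data; the model is evaluated on a test dataset using a confusion matrix, the ROC curve, precision, recall, and related metrics. Case analysis: the method is applied to real case data. The experiments show that outliers can indeed be used to detect anomalous network connections, that Manhattan distance is the more accurate measure for firewall log data, and that the DNN model detects anomalous connections better than stacking-classifier prediction.;Abstract According to National Police Agency data on fraud cases and financial losses, investment fraud has the highest case count and the largest financial loss, followed by installment cancellation fraud, which ranks second and is evidently among the most prevalent online scams. Installment cancellation fraud is closely tied to leaks of personal data. Hackers break into the order systems of shopping websites and of sellers on auction platforms to steal buyers' personal information and transaction details, then impersonate customer service representatives to carry out the ATM installment cancellation scam. Leaked personal information thus leads to frequent fraud. To curb this fraud at its root, the attackers' connections must be identified by analyzing firewall logs effectively for suspicious, anomalous connections. This study analyzes firewall system log records to trace the sources of anomalous intrusion connections and to issue warnings on network anomalies, thereby helping to prevent intrusions. The method uses K-means clustering and first determines the optimal value of K. 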
The data are then partitioned into clusters, and the mean of each cluster is computed and taken as the cluster centroid. The distance between each data point and its centroid is then used to detect anomalous connections. To identify anomalous connections accurately, three distance measures are compared: Euclidean distance, Manhattan distance, and Chebyshev distance. In addition, a deep neural network (DNN) is used to model the complex firewall data. The model is evaluated on a test dataset using a confusion matrix, the ROC curve, precision, recall, and related metrics. Case analysis: the method is applied to real case data. The experiments show that outliers can indeed be used to detect anomalous network connections and that Manhattan distance gives the more accurate results on firewall log data. Furthermore, the DNN model detects anomalous connections better than the stacking classifier. Keywords: K-means, Euclidean Distance, Manhattan Distance, Chebyshev Distance, deep learning, outliers |
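The distance-to-centroid check described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the study's implementation. Several details are assumptions made only for illustration: the function names (`kmeans`, `flag_outliers`), the fixed initial centroids, the Euclidean assignment step, the cutoff of mean plus three standard deviations, and the synthetic two-feature data standing in for firewall log features. The abstract does not state how K, the features, or the outlier threshold were chosen.

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def kmeans(points, init_centroids, iters=50):
    """Lloyd's algorithm from given starting centroids (hypothetical helper).

    Points are assigned to the nearest centroid by Euclidean distance, and
    each centroid is then moved to the mean of its assigned points.
    """
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                  for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels

def flag_outliers(points, centroids, labels, dist, z=3.0):
    """Return indices whose distance to their own centroid exceeds
    mean + z * std of all such distances (the cutoff rule is an assumption)."""
    d = [dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    mean = sum(d) / len(d)
    std = (sum((x - mean) ** 2 for x in d) / len(d)) ** 0.5
    cutoff = mean + z * std
    return [i for i, x in enumerate(d) if x > cutoff]

# Synthetic stand-in data: two tight groups plus one stray point (index 40).
group_a = [(0.2 * x, 0.2 * y) for x in range(5) for y in range(4)]
group_b = [(10 + 0.2 * x, 10 + 0.2 * y) for x in range(5) for y in range(4)]
points = group_a + group_b + [(10.4, 17.0)]

centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
for name, dist in [("euclidean", euclidean),
                   ("manhattan", manhattan),
                   ("chebyshev", chebyshev)]:
    print(name, flag_outliers(points, centroids, labels, dist))
```

On this toy data the three metrics flag the same point. On real firewall logs with many features the metrics can rank points differently, which is the comparison the abstract reports. Note also that a point lying very far from every cluster can capture a centroid of its own and end up with distance zero, so the choice of K and of the starting centroids affects what this check can detect.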