dc.description.abstract | Abstract
According to statistics on fraud cases and financial losses published by the Criminal Investigation Bureau, investment fraud ranks first in both number of cases and financial losses, followed by installment payment fraud in second place. These figures show that both types of fraud are prevalent among online scams. Installment payment fraud is closely tied to the leakage of personal data: hackers exploit vulnerabilities in the order systems of shopping websites and auction platforms to steal buyers' personal information and transaction details, and then impersonate customer service representatives to carry out ATM installment payment fraud. Leaked personal information therefore directly contributes to the high frequency of fraud. A proactive way to mitigate installment payment fraud is to identify the connections hackers use in their attacks by analyzing firewall logs for suspicious and abnormal connections. This study analyzes firewall system logs to detect anomalous intrusion connections and issue warnings about network anomalies, thereby helping to prevent security breaches.
The study first determines the optimal number of clusters for K-means, then uses K-means clustering to partition the data into clusters and takes the mean of each cluster as its centroid. The distance between each data point and the cluster centroids is then calculated, and points lying far from the centroids are treated as outliers, that is, candidate abnormal connections. To identify abnormal connections accurately with this clustering approach, three distance measures are compared: Euclidean Distance, Manhattan Distance, and Chebyshev Distance. In addition, a deep neural network (DNN) is used to model the complex firewall data. The models are evaluated on a test dataset using a confusion matrix, the ROC curve, accuracy, and recall.
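A minimal sketch of the centroid-distance outlier step, in Python using only the standard library. It assumes the firewall log records have already been converted into normalised numeric feature vectors (the nine two-dimensional points below are made up) and that k has already been chosen (how to choose it is not shown). A connection is flagged when its distance to its cluster centroid exceeds the mean plus two standard deviations of all such distances; that threshold rule, the farthest-first initialisation, and all function names are illustrative assumptions, not necessarily the study's choices.

```python
import math
from statistics import mean, pstdev

# The three distance measures compared in the study.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def assign(points, centroids):
    # Index of the nearest centroid (Euclidean, as in standard K-means).
    return [min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=100):
    # Deterministic farthest-first initialisation (an assumption chosen for
    # reproducibility; k-means++ with restarts is the more common choice).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = assign(points, centroids)
        # New centroid = per-feature mean of the cluster's members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([mean(col) for col in zip(*members)] if members else centroids[c])
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids, assign(points, centroids)

def flag_outliers(points, centroids, labels, dist, z=2.0):
    # Distance of each point to its own centroid under `dist`; flag it when
    # above mean + z * std of all distances (hypothetical threshold rule).
    d = [dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cut = mean(d) + z * pstdev(d)
    return [x > cut for x in d]

# Hypothetical, already-normalised 2-feature vectors for nine connections.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.0),
          (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.0, 4.9),
          (9.0, 0.5)]
centroids, labels = kmeans(points, k=2)
for name, dist in [("Euclidean", euclidean), ("Manhattan", manhattan),
                   ("Chebyshev", chebyshev)]:
    flags = flag_outliers(points, centroids, labels, dist)
    print(name, [i for i, f in enumerate(flags) if f])
```

Standard K-means assigns points by Euclidean distance, so the sketch keeps that for clustering and swaps only the measure used to score each point's distance from its centroid.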
A case study applies the methodology to real-world data. The experiments confirm that outliers can be used to detect abnormal network connections, and that Manhattan Distance gives more accurate results on firewall log data than Euclidean or Chebyshev Distance. Furthermore, the DNN model outperforms the stacked classification prediction model in detecting abnormal connections.
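Comparisons like these rest on the evaluation metrics listed in the methodology. The sketch below computes a binary confusion matrix, accuracy, recall, and the area under the ROC curve (a one-number summary of the curve) from ground-truth labels and anomaly scores. The labels, the distance scores, and the cut-off of 1.0 are invented for illustration and are not the study's data or results; running it once per distance measure, or once per model, would give figures that can be compared side by side.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    """Binary confusion matrix; positive class = abnormal connection."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def recall(self):
        # Share of truly abnormal connections that were flagged.
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

def confusion(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t and p),
        fp=sum(1 for t, p in pairs if not t and p),
        fn=sum(1 for t, p in pairs if t and not p),
        tn=sum(1 for t, p in pairs if not t and not p),
    )

def roc_auc(y_true, scores):
    # Area under the ROC curve, computed as the probability that a random
    # abnormal connection scores higher than a random normal one (ties = 1/2).
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test set: 1 = abnormal, scores = distance to centroid.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.2, 0.4, 0.3, 1.1, 2.5, 1.9, 0.5, 0.9]
y_pred = [int(s > 1.0) for s in scores]  # hypothetical cut-off

cm = confusion(y_true, y_pred)
print(cm, cm.accuracy, cm.recall, roc_auc(y_true, scores))
# tp=2, fp=1, fn=1, tn=4 -> accuracy 0.75, recall 2/3, AUC 14/15
```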
Keywords: K-means, Euclidean Distance, Manhattan Distance, Chebyshev Distance, deep learning, outliers | en_US |