基於半監督式學習的網路流量分類;Network traffic classification via semi-supervised learning


Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93180

Title:	基於半監督式學習的網路流量分類;Network traffic classification via semi-supervised learning
Authors:	薛德豪;Hsueh, Te-Hao
Contributors:	Executive Master of Information Management
Keywords:	Network traffic analysis; Machine learning; Semi-supervised learning; Data mining; Wireshark; Label Propagation; Label Spreading
Date:	2023-07-15
Issue Date:	2024-09-19 16:46:01 (UTC+8)
Publisher:	National Central University
Abstract:	Over the past decade or so, the rise of the Internet of Things (IoT) and artificial intelligence (AI) has deepened human dependence on networks, and the spread of networking has brought cybersecurity concerns with it. Network traffic classification has therefore become an important network security issue. For an enterprise, understanding the traffic generated by the various applications on its network is essential: with further analysis, the enterprise can track network flows, sources, and destinations across the whole company more accurately. This study used Wireshark to collect network traffic from a case company, Company P, as the dataset. After feature selection, two semi-supervised learning algorithms, the Label Propagation Algorithm (LPA) and the Label Spreading Algorithm (LSA), were used to predict pseudo labels for a training set in which only a small number of samples were labeled. The pseudo-labeled training set was then used to build models with four machine learning classifiers: Decision Tree, Random Forest, Support Vector Machine (SVM), and Naïve Bayes. The resulting models were evaluated on a test set carrying correct labels. The experimental results show that combining LPA with the SVM classifier gives the best classification performance.
Appears in Collections:	[Executive Master of Information Management] Electronic Thesis & Dissertation
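
The pipeline summarized in the abstract (pseudo-label a partly labeled training set, train a classifier on the result, then evaluate on a correctly labeled test set) can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: it uses a small pure-Python label propagation with an RBF kernel and a Gaussian Naive Bayes classifier (one of the four classifier families the abstract names), and it runs on synthetic two-feature data standing in for the Company P capture. All function names, the `gamma` value, and the toy features are assumptions made for this example.

```python
import math
import random


def rbf(a, b, gamma):
    """RBF similarity between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def label_propagation(X, y, n_classes, gamma=1.0, iters=200):
    """Return a pseudo label for every row; y uses -1 for unlabeled rows."""
    n = len(X)
    # Row-normalised transition matrix built from pairwise RBF similarities.
    W = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    T = [[w / sum(row) for w in row] for row in W]
    # Label distributions: one-hot for labeled rows, uniform otherwise.
    F = [[1.0 if y[i] == c else 0.0 for c in range(n_classes)] if y[i] >= 0
         else [1.0 / n_classes] * n_classes for i in range(n)]
    for _ in range(iters):
        new_F = [[sum(T[i][j] * F[j][c] for j in range(n))
                  for c in range(n_classes)] for i in range(n)]
        # Clamp: rows with known labels keep their original distribution.
        F = [F[i] if y[i] >= 0 else new_F[i] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


def fit_gaussian_nb(X, y, n_classes):
    """Per-class prior, feature means, and feature variances.

    Assumes every class has at least one row in y.
    """
    model = []
    for c in range(n_classes):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model.append((prior, means, variances))
    return model


def predict_gaussian_nb(model, x):
    """Pick the class with the highest log posterior for one sample."""
    def log_posterior(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(range(len(model)), key=lambda c: log_posterior(model[c]))


# Synthetic stand-in for two traffic classes (hypothetical scaled features).
rng = random.Random(0)


def make_flows(center, n):
    return [[rng.gauss(center, 0.5), rng.gauss(center, 0.5)] for _ in range(n)]


X_train = make_flows(0.0, 30) + make_flows(3.0, 30)
true_train = [0] * 30 + [1] * 30
labeled_idx = (0, 1, 2, 30, 31, 32)   # only six of sixty rows carry labels
y_train = [-1] * 60
for i in labeled_idx:
    y_train[i] = true_train[i]

# Step 1: predict pseudo labels. Step 2: train on the pseudo-labeled set.
pseudo = label_propagation(X_train, y_train, n_classes=2)
model = fit_gaussian_nb(X_train, pseudo, n_classes=2)

# Step 3: evaluate on a test set that carries correct labels.
X_test = make_flows(0.0, 10) + make_flows(3.0, 10)
y_test = [0] * 10 + [1] * 10
predictions = [predict_gaussian_nb(model, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

With real traffic features, the same three steps would more likely be run with a library implementation of the propagation step and of each classifier, and the comparison across LPA/LSA and the four classifiers would be repeated per combination.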

