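In this study, we used Wireshark to collect the network traffic of a case company, P, as the dataset. After feature selection, we applied two semi-supervised learning algorithms, the Label Propagation Algorithm (LPA) and the Label Spreading Algorithm (LSA), to predict pseudo labels for a training dataset in which only a small number of samples were labeled. The pseudo-labeled training dataset was then used to build models with four machine learning classifiers: decision tree, random forest, SVM, and naive Bayes. Each model was then used to predict a correctly labeled test dataset. The experimental results show that combining LPA with the SVM classifier achieves the best classification performance.
Over the past few decades, with the rapid development of the Internet of Things (IoT) and artificial intelligence (AI), human dependence on networks has become increasingly common, bringing cybersecurity threats with it. Therefore, network traffic classification has become a crucial issue in network security. For an enterprise, it is important to understand the traffic generated by the various applications on its network. Through further analysis and research, an enterprise can gain a better understanding of the network flows, sources, and destinations across the entire company.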
In this paper, we collected data from a private enterprise to create a proprietary dataset. After feature selection, the dataset was processed with the Label Propagation (LPA) and Label Spreading (LSA) algorithms to build models. We then used these models to predict pseudo labels for the dataset, only a small portion of which was labeled. Next, we trained four classifiers, namely Decision Tree, Random Forest, Support Vector Machine (SVM), and Naïve Bayes, on the pseudo-labeled dataset to build models. Finally, we used these models to predict the test dataset. The experimental results demonstrate that combining LPA with the SVM classifier achieves the best classification performance.
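The two-stage procedure described above (pseudo-labeling with LPA or LSA, then training a supervised classifier on the pseudo-labeled data) can be illustrated with a minimal sketch in Python using scikit-learn. This is not the implementation used in the study: the proprietary traffic dataset is replaced by synthetic data from `make_classification`, and the 10% labeled fraction, the `knn` kernel, and all hyperparameters are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import LabelPropagation, LabelSpreading
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the selected traffic features (assumption: the
# real dataset is proprietary and is not reproduced here).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Scale features using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Keep the true label for about 10% of the training rows (assumed fraction);
# scikit-learn marks unlabeled samples with -1.
rng = np.random.default_rng(0)
labeled_idx = rng.choice(len(y_train), size=len(y_train) // 10, replace=False)
y_partial = np.full_like(y_train, -1)
y_partial[labeled_idx] = y_train[labeled_idx]

propagators = {
    "LPA": LabelPropagation(kernel="knn", n_neighbors=7, max_iter=1000),
    "LSA": LabelSpreading(kernel="knn", n_neighbors=7, max_iter=1000),
}
classifiers = {
    "DecisionTree": lambda: DecisionTreeClassifier(random_state=0),
    "RandomForest": lambda: RandomForestClassifier(random_state=0),
    "SVM": lambda: SVC(),
    "NaiveBayes": lambda: GaussianNB(),
}

pseudo_labels = {}
results = {}
for p_name, propagator in propagators.items():
    # Stage 1: infer a pseudo label for every training sample.
    propagator.fit(X_train, y_partial)
    pseudo_labels[p_name] = propagator.transduction_
    for c_name, make_clf in classifiers.items():
        # Stage 2: train on pseudo labels, evaluate on true test labels.
        clf = make_clf().fit(X_train, pseudo_labels[p_name])
        results[(p_name, c_name)] = accuracy_score(y_test, clf.predict(X_test))

for (p_name, c_name), acc in sorted(results.items()):
    print(f"{p_name} + {c_name}: test accuracy = {acc:.3f}")
```

The accuracies printed on synthetic data say nothing about the results reported in this work; the sketch only shows how the eight algorithm and classifier combinations can be compared on a held-out test set with correct labels.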