Network Traffic Classification via Semi-Supervised Learning


Name

薛德豪 (Te-Hao Hsueh)

Department

Department of Information Management, In-service Master's Program

Thesis title

基於半監督式學習的網路流量分類
(Network Traffic Classification via Semi-Supervised Learning)

Related theses

★ 多重標籤文本分類之實證研究 : word embedding 與傳統技術之比較	★ 基於圖神經網路之網路協定關聯分析
★ 學習模態間及模態內之共用表示式	★ Hierarchical Classification and Regression with Feature Selection
★ 病徵應用於病患自撰日誌之情緒分析	★ 基於注意力機制的開放式對話系統
★ 針對特定領域任務—基於常識的BERT模型之應用	★ 基於社群媒體使用者之硬體設備差異分析文本情緒強烈程度
★ 機器學習與特徵工程用於虛擬貨幣異常交易監控之成效討論	★ 捷運轉轍器應用長短期記憶網路與機器學習實現最佳維保時間提醒
★ ERP日誌分析-以A公司為例	★ 企業資訊安全防護：網路封包蒐集分析與網路行為之探索性研究
★ 資料探勘技術在顧客關係管理之應用─以C銀行數位存款為例	★ 人臉圖片生成與增益之可用性與效率探討分析
★ 人工合成文本之資料增益於不平衡文字分類問題	★ 探討使用多面向方法在文字不平衡資料集之分類問題影響

Files


Full text can be browsed in the system (open from 2024-8-1 onward)

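Abstract (translated from Chinese)

Over the past decade or so, with the rise of the Internet of Things and artificial
intelligence, people have come to depend ever more heavily on networks. The spread of
networking has also raised security concerns, making network traffic classification an
important network security topic. For an enterprise, it is very important to understand
the traffic generated by the various applications on its network. Through further
analysis and study, the enterprise can grasp more precisely the direction, sources, and
destinations of traffic across the whole company.

This study used Wireshark to collect the network traffic of a case company, P, as the
dataset. After feature selection, the semi-supervised learning algorithms Label
Propagation Algorithm (LPA) and Label Spreading Algorithm (LSA) were used to predict
pseudo labels for a training set carrying only a small number of labels. The
pseudo-labelled training set was then used to build models with four machine learning
classifiers: decision tree, random forest, SVM, and naive Bayes. Once the models were
built, predictions were made on a test set carrying correct labels. The experimental
results show that building the model with LPA combined with the SVM classifier achieves
the best classification performance.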
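To illustrate how label propagation turns a few known labels into pseudo labels, here is a minimal pure-Python sketch of the basic iteration: build a row-normalised affinity matrix, repeatedly average each point's label distribution over its neighbours, and reset the labelled points to their known classes after every step. The RBF kernel, the `gamma` value, the function names, and the toy 2-D points are all assumptions made for illustration. The abstract does not state the kernel or hyperparameters used, and the real features came from Wireshark captures.

```python
import math


def rbf(a, b, gamma):
    """RBF affinity between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def label_propagation(X, y, n_classes, gamma=1.0, max_iter=1000, tol=1e-6):
    """Return one predicted class per row of X.

    y[i] is a class index for labelled rows and -1 for unlabelled rows.
    """
    n = len(X)

    # Row-normalised affinity matrix: T[i][j] is the share of label mass
    # that point i takes from point j in each iteration.
    W = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    T = []
    for row in W:
        total = sum(row)
        T.append([w / total for w in row])

    # One-hot rows for labelled points, all zeros for unlabelled points.
    Y0 = [[1.0 if y[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y0]

    for _ in range(max_iter):
        F_new = [
            [sum(T[i][j] * F[j][c] for j in range(n)) for c in range(n_classes)]
            for i in range(n)
        ]
        # Hard clamping: labelled rows are reset to their known labels.
        for i in range(n):
            if y[i] != -1:
                F_new[i] = Y0[i][:]
        delta = max(
            abs(F_new[i][c] - F[i][c]) for i in range(n) for c in range(n_classes)
        )
        F = F_new
        if delta < tol:
            break

    # The pseudo label is the class with the largest propagated mass.
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Hypothetical toy data: two well-separated clusters, one labelled point each.
X_toy = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (-0.1, 0.2),
         (3.0, 3.0), (3.1, 2.8), (2.9, 3.2), (3.2, 3.1)]
y_toy = [0, -1, -1, -1, 1, -1, -1, -1]

print(label_propagation(X_toy, y_toy, n_classes=2))
```

Label spreading follows the same idea but uses a symmetrically normalised affinity matrix and soft clamping, so the initial labels can be partly overridden by their neighbourhood instead of being reset at every step.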

Abstract (English)

Over the past few decades, with the rapid rise of the Internet of Things (IoT) and
artificial intelligence (AI), human dependence on networks has grown steadily, bringing
cybersecurity threats with it. Network traffic classification has therefore become a
crucial issue in network security. For an enterprise, it is important to understand the
traffic generated by the various applications on its network. Through further analysis and
research, an enterprise can gain a better understanding of traffic flows, sources, and
destinations across the entire company.

In this study, we collected data from a private enterprise to create a proprietary dataset.
After feature selection, we built models with the Label Propagation (LPA) and Label
Spreading (LSA) algorithms and used them to assign pseudo labels to a training set in which
only a small amount of data was labelled. We then trained four classifiers (Decision Tree,
Random Forest, Support Vector Machine (SVM), and Naïve Bayes) on the pseudo-labelled
dataset. Finally, we used the resulting models to predict the test dataset. The
experimental results show that combining LPA with the SVM classifier achieves the best
classification performance.
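The workflow summarised in the abstract (pseudo-label a sparsely labelled training set, train supervised classifiers on the pseudo labels, then score them on a correctly labelled test set) can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data is synthetic (`make_classification`) rather than the proprietary Wireshark dataset, the kNN kernel, `n_neighbors`, `alpha`, the 10% labelled fraction, and the use of `GaussianNB` as the naive Bayes variant are all choices made here, and `run_pipeline` is a hypothetical helper name. It does not reproduce the thesis results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import LabelPropagation, LabelSpreading
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def run_pipeline(propagator, labelled_fraction=0.1, seed=0):
    """Pseudo-label a mostly unlabelled training set, then score four classifiers."""
    # Synthetic stand-in for the flow features (assumption, not the thesis data).
    X, y = make_classification(
        n_samples=600, n_features=8, n_informative=5,
        n_classes=3, n_clusters_per_class=1, random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Standardise using statistics from the training split only.
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # Keep roughly `labelled_fraction` of the labels; -1 marks unlabelled rows.
    rng = np.random.default_rng(seed)
    keep = rng.random(len(y_train)) < labelled_fraction
    y_partial = np.where(keep, y_train, -1)

    # Semi-supervised step: transduction_ holds a label for every training row.
    pseudo_labels = propagator.fit(X_train, y_partial).transduction_

    # Supervised step: train on pseudo labels, evaluate on true test labels.
    classifiers = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "SVM": SVC(),
        "Naive Bayes": GaussianNB(),  # assumed variant
    }
    scores = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, pseudo_labels)
        scores[name] = accuracy_score(y_test, clf.predict(X_test))
    return scores


propagators = {
    "LPA": LabelPropagation(kernel="knn", n_neighbors=7, max_iter=1000),
    "LSA": LabelSpreading(kernel="knn", n_neighbors=7, alpha=0.2),
}
for name, propagator in propagators.items():
    print(name, run_pipeline(propagator))
```

Scoring against the true test labels rather than pseudo labels mirrors the evaluation described in the abstract. Which combination scores highest on synthetic data says nothing about the ranking reported for the real traffic.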

Keywords (translated from Chinese)

★ Network traffic analysis
★ Machine learning
★ Semi-supervised learning
★ Data mining

Keywords (English)

★ Wireshark
★ Label Propagation
★ Label Spreading

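Table of contents

Abstract (Chinese)
ABSTRACT
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Structure
Chapter 2 Literature Review
2.1 Related Work on Machine Learning for Network Traffic Classification
2.1.1 Network Traffic Anomaly Detection Analysis: The Case of TWAREN
2.1.2 A Hierarchical Multi-Classifier for Imbalanced Data in Network Intrusion Detection
2.1.3 A Hybrid Intrusion Detection System for Identifying Unknown Attacks
2.2 Network Traffic Tools
2.3 Machine Learning Classification Techniques
2.3.1 Semi-Supervised Learning
2.3.2 Supervised Learning
Chapter 3 Research Method
3.1 Research Framework
3.2 Data Collection
3.3 Data Preprocessing
3.3.1 Converting Pcap Capture Files to CSV
3.3.2 Field Names and Descriptions
3.4 Experimental Design
3.4.1 Defining Labels
3.4.2 Assigning Labels
3.4.3 Model Development Environment
3.4.4 Standardization and Feature Data Types
3.4.5 Model Training
3.5 Evaluation Metrics
Chapter 4 Results and Discussion
4.1 Comparison of LPA and LSA: Results Analysis and Discussion
4.2 LPA and LSA Combined with Supervised Classifiers: Results Analysis and Discussion
4.2.1 LPA Combined with Supervised Learning Classifiers
4.2.2 LSA Combined with Supervised Learning Classifiers
4.2.3 Discussion of Results
Chapter 5 Conclusion
5.1 Conclusion
5.2 Research Contributions
5.2.1 Building a Private Dataset Combined with Semi-Supervised Learning Research
5.2.2 Classification with Semi-Supervised Learning Combined with Supervised Learning Models
5.3 Research Limitations
5.4 Future Research Directions and Suggestions
References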

References

English references
[1] D. Marchette, "A statistical method for profiling network traffic," 1999.
[2] S. Zander, T. Nguyen, and G. Armitage, "Automated traffic classification and
application identification using machine learning," in The IEEE Conference on Local
Computer Networks 30th Anniversary (LCN'05), 17 Nov. 2005, pp. 250-257,
doi: 10.1109/LCN.2005.35.
[3] J. Zhang, Y. Xiang, Y. Wang, W. Zhou, Y. Xiang, and Y. Guan, "Network Traffic
Classification Using Correlation Information," IEEE Transactions on Parallel and
Distributed Systems, vol. 24, no. 1, pp. 104-117, 2013, doi: 10.1109/TPDS.2012.98.
[4] M. Lotfollahi, M. Jafari Siavoshani, R. Shirali Hossein Zade, and M. Saberian, "Deep
packet: A novel approach for encrypted traffic classification using deep learning," Soft
Computing, vol. 24, no. 3, pp. 1999-2012, 2020.
[5] M. Soysal and E. G. Schmidt, "Machine learning algorithms for accurate flow-based
network traffic classification: Evaluation and comparison," Performance Evaluation,
vol. 67, no. 6, pp. 451-467, 2010.
[6] J. Zhang, X. Chen, Y. Xiang, W. Zhou, and J. Wu, "Robust Network Traffic
Classification," IEEE/ACM Transactions on Networking, vol. 23, no. 4,
pp. 1257-1270, 2015, doi: 10.1109/TNET.2014.2320577.
[7] M. R. Parsaei, M. J. Sobouti, S. R. Khayami, and R. Javidan, "Network Traffic
Classification using Machine Learning Techniques over Software Defined Networks,"
International Journal of Advanced Computer Science and Applications (IJACSA),
vol. 8, no. 7, 2017, doi: 10.14569/IJACSA.2017.080729.
[8] S. Ezennaya-Gomez, S. Kiltz, C. Kraetzer, and J. Dittmann, "A Semi-Automated
HTTP Traffic Analysis for Online Payments for Empowering Security, Forensics and
Privacy Analysis," presented at the Proceedings of the 16th International Conference
on Availability, Reliability and Security, Vienna, Austria, 2021. [Online]. Available:
https://doi.org/10.1145/3465481.3470114
[9] A. Kaur and M. Saluja, "Investigating TCP/IP, HTTP, ARP, ICMP Packets Using
Wireshark," 2014.
[10] A. G. D'Sa, I. Illina, D. Fohr, D. Klakow, and D. Ruiter, "Label Propagation-Based
Semi-Supervised Learning for Hate Speech Classification," in Proceedings of the First
Workshop on Insights from Negative Results in NLP, Online, Nov. 2020, pp. 54-59,
Association for Computational Linguistics, doi: 10.18653/v1/2020.insights-1.8.
[Online]. Available: https://aclanthology.org/2020.insights-1.8
[11] A. Azab, M. Khasawneh, S. Alrabaee, K.-K. R. Choo, and M. Sarsour, "Network
traffic classification: Techniques, datasets, and challenges," Digital Communications
and Networks, 2022, doi: 10.1016/j.dcan.2022.09.009.

Chinese references
[1] 陳品瑄, 陳俊傑, and 梁明章, "網路流量異常偵測分析－以 TWAREN 為例,"
2019, no. 2019: 國立金門大學, pp. 174-178, doi: 10.6927/ncs.201911.0035.
[2] 張智傑, "適用於網路入侵偵測不平衡資料之階層式多重分類器," 碩士, 電機工
程學研究所, 國立臺灣大學, 台北市, 2015. [Online]. Available:
https://hdl.handle.net/11296/59h7u3
[3] 蔡秉任, "針對未知攻擊辨識之混合式入侵偵測系統," 碩士, 資訊科學與工程研
究所, 國立交通大學, 新竹市, 2014. [Online]. Available:
https://hdl.handle.net/11296/54f85v
[4] 陳建智, 蔡雨龍, and 周國森, "開放網路架構異常流量之檢測技術," (in 繁體中
文), 電工通訊季刊, no. 2021第4季, pp. 81-92, 2021, doi:
10.6328/ciee.202112_(4).0007.
[5] 蕭漢威, 曾金山, 魏志平, and 楊竹星, "以網際網路流量進行網路服務分類預測
之研究," (in 繁體中文), 網際網路技術學刊, vol. 5, no. 1, pp. 49-55, 2004, doi:
10.6138/jit.2004.5.1.07.
[6] 連崇翰, "基於二元搜尋法上的封包分類演算法," 碩士, 資訊工程學系所, 國立中
興大學, 台中市, 2014. [Online]. Available:
https://hdl.handle.net/11296/pnz6kh
[7] 張瑜倫, "基於長短期記憶模型之異常網路流量偵測," 碩士, 資訊工程研究所,
國立中正大學, 嘉義縣, 2019. [Online]. Available:
https://hdl.handle.net/11296/y42zp6
[8] 高子棋, "一個偵測HTTP服務新型態異常的新穎方法," 碩士, 資訊工程學系所,
國立中興大學, 台中市, 2020. [Online]. Available:
https://hdl.handle.net/11296/2fnug3
[9] 李陳洋, "以知識蒸餾實現網路內學習之流量分類," 碩士, 網路工程研究所, 國立
交通大學, 新竹市, 2020. [Online]. Available:
https://hdl.handle.net/11296/7k593u
[10] 王澤宇, "機器學習於入侵偵測之資安成效研究：封包流量、系統日誌與系統資
源統計之比較," 碩士, 資訊科學與工程研究所, 國立交通大學, 新竹市, 2021.
[Online]. Available:
https://hdl.handle.net/11296/5x276g
[11] 許嘉榮, "一個有效的半監督式學習方法應用於入侵偵測系統," 碩士, 資訊工程
學系研究所, 國立中山大學, 高雄市, 2021. [Online]. Available:
https://hdl.handle.net/11296/vka9j2
[12] 林冠宏, "使用少量標記資料以半監督式學習建立砂輪表面異常檢測模型," 碩士,
工業與資訊管理學系碩士在職專班, 國立成功大學, 台南市, 2021. [Online].
Available:
https://hdl.handle.net/11296/6e6p2v

Advisor

柯士文(Shih-Wen Ke)

Approval date

2023-7-15
