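Anomaly Detection and Classification Based on Denoising Autoencoder and XGBoost for Imbalanced Network Traffic Data in Industrial Control Systems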

Name

Yan-Ting Chen (陳沿廷)

Department

Department of Computer Science and Information Engineering

Thesis Title

Anomaly Detection and Classification Based on Denoising Autoencoder and XGBoost for Imbalanced Network Traffic Data in Industrial Control Systems
(Original title: 用於工控系統非均衡網路流量資料之降噪自動編碼器極限梯度提升異常的偵測與分類)

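Access rights for this electronic thesis: open access, effective immediately.
The open-access electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.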

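Abstract (translated from Chinese)

Industrial control systems (ICS), which integrate information technology (IT) and operational technology (OT), have been a popular research topic in the industrial field in recent years. ICS are widely used to control and manage important machines and equipment connected through networks. If an ICS suffers a network attack from an unknown source, equipment may malfunction, causing huge economic losses and even endangering personnel. Research on ICS network security is therefore critical and necessary.
This thesis proposes an anomaly detection and classification method for ICS network security. It detects whether network traffic data carried over the industrial protocols Modbus and S7 Comm (S7 Communication) are anomalous, and it classifies the anomalous data. The method comprises three main steps intended to maximize detection and classification performance. First, a denoising autoencoder (DAE) removes potential noise from the data. Second, to handle imbalanced data containing anomalous behavior, SMOTE (Synthetic Minority Oversampling Technique) and Tomek links (T-Link) are combined as an oversampling and undersampling method that makes the features of particular samples more representative. Finally, extreme gradient boosting (eXtreme Gradient Boosting, XGBoost) is used to build the anomaly detection and classification models.
The Electra dataset, taken from a real railway-industry ICS, is used to evaluate the proposed method and to compare it with other related methods. The experimental results show that the proposed method achieves better precision, recall, and F1-score than the other anomaly detection methods.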
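The resampling step named in the abstract (SMOTE oversampling followed by Tomek link cleaning) can be illustrated with a minimal, standard-library sketch. This is not the thesis code: the function names `smote`, `tomek_links`, and `smote_tomek`, the parameter `k`, and the toy 2-D points are all assumptions made for illustration, and dropping both members of each Tomek link is one common variant rather than a setting stated in the abstract.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different labels."""
    def nearest(i):
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    links = []
    for i in range(len(points)):
        j = nearest(i)
        if i < j and nearest(j) == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


def smote_tomek(points, labels, minority_label, n_new, seed=0):
    """Oversample the minority class, then drop both members of every
    Tomek link (assumed variant; other variants drop only one member)."""
    minority = [p for p, y in zip(points, labels) if y == minority_label]
    new_points = points + smote(minority, n_new, seed=seed)
    new_labels = labels + [minority_label] * n_new
    drop = {i for pair in tomek_links(new_points, new_labels) for i in pair}
    kept = [
        (p, y)
        for i, (p, y) in enumerate(zip(new_points, new_labels))
        if i not in drop
    ]
    return [p for p, _ in kept], [y for _, y in kept]


# Hypothetical toy data: label 0 = normal traffic, label 1 = attack traffic.
normal = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (1.0, 1.0)]
attack = [(1.1, 1.0), (2.0, 2.0), (2.2, 2.1)]
points, labels = normal + attack, [0] * len(normal) + [1] * len(attack)

print("Tomek links before resampling:", tomek_links(points, labels))
resampled_points, resampled_labels = smote_tomek(points, labels, 1, n_new=2)
print("samples after SMOTE + Tomek cleaning:", len(resampled_points))
```

In the toy data, the normal point (1.0, 1.0) and the attack point (1.1, 1.0) are each other's nearest neighbour, so they form a Tomek link: the kind of borderline pair the cleaning step targets.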

Abstract (English)

The industrial control system (ICS), which integrates information technology (IT) and operational technology (OT), has been a hot research topic in the industrial field in recent years. ICS are widely used to control and manage important machines and devices connected through networks. If an ICS suffers a network attack, machines and devices may operate abnormally, causing huge economic losses and even endangering personnel. Therefore, research on ICS network security is critical and necessary.
This thesis proposes an anomaly detection and classification method for ICS network security that detects and classifies anomalies in network traffic data of industrial protocols such as Modbus and S7 Communication (S7 Comm). The proposed method consists of three major steps. First, a denoising autoencoder (DAE) removes potential noise from the data. Second, to cope with the imbalance between normal and anomalous data, the synthetic minority oversampling technique (SMOTE) and the Tomek link (T-Link) mechanism are used to oversample and undersample the data so that particular samples become more representative. Finally, extreme gradient boosting (XGBoost) is used to build the anomaly detection and classification models.
The real-life railway industry ICS dataset Electra is used to evaluate the effectiveness of the proposed method, and the results are compared with those of other related methods. The proposed method is shown to have better precision, recall, and F1-score than the others in both anomaly detection and anomaly classification.
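Precision, recall, and F1-score are reported because plain accuracy can look good on imbalanced data even when most attacks are missed. A minimal sketch of per-class computation follows. The function name `per_class_metrics` and the labels "normal" and "attack" are hypothetical, chosen only for illustration, and this is not the evaluation code used in the thesis.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall, f1)} computed from two label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = (precision, recall, f1)
    return report


# Hypothetical imbalanced example: 8 normal records, 2 attack records,
# with one of the two attacks misclassified as normal.
y_true = ["normal"] * 8 + ["attack"] * 2
y_pred = ["normal"] * 8 + ["normal", "attack"]

for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case the overall accuracy is 90%, yet recall for the attack class is only 0.5, which is why per-class precision, recall, and F1-score are more informative than accuracy for imbalanced traffic.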

Keywords

★ Anomaly Classification
★ Anomaly Detection
★ Autoencoder
★ Data Imbalance
★ F1-score
★ Industrial Control System
★ Precision
★ Recall
★ XGBoost

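Table of Contents

Chinese Abstract
Abstract
Acknowledgments
List of Figures
List of Tables
1. Introduction
1.1 Research Background and Motivation
1.2 Research Objectives and Methods
1.3 Thesis Organization
2. Background
2.1 Anomaly Detection
2.2 Machine Learning
2.2.1 Introduction to Machine Learning
2.2.2 Supervised Learning
2.2.3 Unsupervised Learning
2.2.4 Semi-supervised Learning
2.2.5 Reinforcement Learning
2.3 Deep Learning
2.3.1 Introduction to Deep Learning
2.3.2 Multilayer Perceptron
2.3.3 Activation Functions
2.3.4 Backpropagation Algorithm
2.4 Autoencoders
2.4.1 Introduction to Autoencoders
2.4.2 Regularized Autoencoders
2.5 Oversampling and Undersampling
2.5.1 Imbalanced Data
2.5.2 Random Oversampling and Random Undersampling
2.5.3 Synthetic Minority Oversampling Technique
2.5.4 Tomek Links
2.6 Ensemble Learning
2.6.1 Introduction to Ensemble Learning
2.6.2 Bootstrap Aggregating (Bagging)
2.6.3 Adaptive Boosting (AdaBoost)
2.6.4 Gradient Boosting
2.7 Related Work
3. Problem Definition
3.1 Problem Definition
3.2 Label Definition
4. Methodology
4.1 Data Preprocessing
4.2 Model Architecture
4.3 Evaluation Criteria
5. Experiments and Analysis
5.1 Experimental Environment
5.2 Experimental Results and Analysis
5.2.1 Performance Comparison of Anomaly Detection on the Electra Modbus Dataset
5.2.2 Performance Comparison of Anomaly Classification on the Electra Modbus Dataset
5.2.3 Performance Comparison of Anomaly Detection on the Electra S7Comm Dataset
5.2.4 Performance Comparison of Anomaly Classification on the Electra S7Comm Dataset
6. Conclusions and Future Work
References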

Advisor

Jehn-Ruey Jiang (江振瑞)

Approval Date

2021-07-16
