Abstract: | The rapid development of internet technologies has driven a sharp increase in the number of smart devices and Internet of Things (IoT) devices, and with it the importance of cybersecurity. To defend against cyberattacks, Artificial Intelligence (AI) models are now used to implement Intrusion Detection Systems (IDS) that detect malicious network traffic. Because AI models have a complex hyperparameter space, tuning them by hand alone is costly and makes it difficult to find the optimal hyperparameter configuration. To address this problem, this paper proposes the Bayesian Optimization - Light Gradient Boosting Machine (BO-LGBM) mechanism for building a malicious network traffic classification model. The mechanism uses Bayesian Optimization (BO) to find the best hyperparameter configuration for the Light Gradient Boosting Machine (LightGBM) model, thereby improving the model's accuracy in traffic classification. The IoT20 dataset is used as the model input. 
Experimental results show that BO-LGBM achieves an F1-score of 98.89% in malicious network traffic classification, a 5.33% improvement over a LightGBM model with manually set hyperparameters. BO-LGBM also achieves higher accuracy than Random Forest, Bagging, CatBoost, and CNN, with a smaller model size and a shorter prediction time. This paper further applies eXplainable Artificial Intelligence (XAI) techniques to analyze the model's input features and obtain the feature importance for each attack category. The XAI results are then used to reduce the model's input dimensionality and thus the load on the model. Feature removal on the LightGBM model reduces prediction time by 10.5% and increases throughput by 11.8% with almost no effect on accuracy; with a 17.18% reduction in prediction time and a 20.43% increase in throughput, the model still maintains an F1-score of 96.18%. |
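The abstract names Bayesian Optimization as the search strategy but does not give the surrogate model, acquisition function, search space, or library used. The following is only a minimal sketch of the general idea, assuming a Gaussian-process surrogate with an RBF kernel and the expected-improvement acquisition over a single hyperparameter normalized to [0, 1]. The function `toy_f1` is a hypothetical, synthetic stand-in for "train LightGBM with this setting and return the validation F1-score"; it is not data from the paper, and every name in the block is illustrative.

```python
import math

def toy_f1(x):
    """Hypothetical stand-in for a validation F1-score; synthetic, not from the paper."""
    return 0.99 - (x - 0.3) ** 2

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular factor L of a small symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def expected_improvement(mean, std, best):
    """Expected improvement over `best` (maximization) for a Gaussian posterior."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, candidates, init_points, n_iter=12, jitter=1e-4):
    """Evaluate init_points, then repeatedly evaluate the candidate with the highest EI."""
    xs = list(init_points)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        offset = sum(ys) / len(ys)  # centre the targets for a zero-mean GP
        K = [[rbf(a, b) + (jitter if i == j else 0.0)
              for j, b in enumerate(xs)] for i, a in enumerate(xs)]
        L = cholesky(K)
        alpha = backward_sub(L, forward_sub(L, [y - offset for y in ys]))
        best, best_ei, next_x = max(ys), -1.0, None
        for c in candidates:
            if c in xs:  # skip settings that were already evaluated
                continue
            k_star = [rbf(a, c) for a in xs]
            mean = offset + sum(k * a for k, a in zip(k_star, alpha))
            v = forward_sub(L, k_star)
            std = math.sqrt(max(1.0 - sum(t * t for t in v), 1e-12))
            ei = expected_improvement(mean, std, best)
            if ei > best_ei:
                best_ei, next_x = ei, c
        if next_x is None:  # every candidate has been tried
            break
        xs.append(next_x)
        ys.append(objective(next_x))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], list(zip(xs, ys))

if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]
    start = [grid[0], grid[50], grid[100]]
    best_x, best_score, history = bayes_opt(toy_f1, grid, start)
    print(f"best setting {best_x:.2f}, score {best_score:.4f}, {len(history)} evaluations")
```

In a real setting the objective would wrap model training and validation, and the search would typically cover several LightGBM hyperparameters at once. The point of the sketch is the loop itself: each new trial is chosen from a surrogate fitted to all previous trials, which is what distinguishes this approach from manual tuning or blind random search.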