Thesis 111453001: Full Metadata Record

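DC Field | Value | Language
DC.contributor | 資訊管理學系在職專班 (Department of Information Management, In-service Master Program) | zh_TW
DC.creator | 曾令騰 | zh_TW
DC.creator | LING-TENG TSENG | en_US
dc.date.accessioned | 2024-5-14T07:39:07Z
dc.date.available | 2024-5-14T07:39:07Z
dc.date.issued | 2024
dc.identifier.uri | http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=111453001
dc.contributor.department | 資訊管理學系在職專班 (Department of Information Management, In-service Master Program) | zh_TW
DC.description | 國立中央大學 (National Central University) | zh_TW
DC.description | National Central University | en_US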
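dc.description.abstract | This study addresses class imbalance in information security through binary and five-class machine learning experiments. It analyzes how different classifiers (ANN, KNN, RF, SVM) perform on data with different numbers of classes and explores several data processing techniques: oversampling (Random Oversampling, SMOTE, Borderline SMOTE, ADASYN), undersampling (ENN, Tomek Links), and hybrid methods (SMOTE-ENN, SMOTE-Tomek Links). When working with class-imbalanced datasets, choosing a suitable model and data processing strategy is essential for lowering the Type II error rate. Fewer Type II errors mean better recognition of the minority class, which is critical in applications such as medical diagnosis and information security. The binary experiments use information security logs from case Company A, with log records labeled as either "harmful" or "harmless". Under class imbalance, the most important goal for security risk is reducing Type II errors, that is, cases where a real security risk is judged to be no risk. ANN + Random Oversampling gave the lowest Type II error rate, 9.09%, far below the rates on the original data (ANN: 81%, KNN: 54%, RF: 24%, SVM: 45%). The five-class experiments use the well-known KDD99 network intrusion detection dataset, first preprocessing the 22 attack types into four major attack categories. For the extremely imbalanced classes (class 4 (R2L) and class 5 (U2R)), results differ markedly across classifiers. In particular, prediction performance for class 5 improves markedly after oversampling, with the ANN + SMOTE-ENN combination giving the clearest gain for class 5. The analysis also shows that lowering the Type II error rate for minority classes may raise the error rate for majority classes, which illustrates the complexity of the class imbalance problem and underlines the importance of choosing a suitable data processing strategy. | zh_TW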
dc.description.abstract | This study focuses on the issue of class imbalance within the field of information security, emphasizing experiments in binary and five-class machine learning classification. By analyzing the performance of different classifiers (including ANN, KNN, RF, SVM) in handling various categories of data, a range of data processing techniques was explored, including oversampling (Random Oversampling, SMOTE, Borderline SMOTE, ADASYN), undersampling (ENN, Tomek Links), and hybrid methods (SMOTE-ENN, SMOTE-Tomek Links). Selecting appropriate models and data processing strategies is crucial for reducing Type II error rates when dealing with imbalanced datasets. For binary classification, the study used information security logs from Company A, and it categorized the log data into 'harmful' and 'harmless'. In scenarios of class imbalance, reducing Type II errors, which misclassify actual security risks as non-threatening, is of utmost importance. The experimental results showed that ANN + Random Oversampling achieved the lowest Type II error rate of 9.09%, a significant reduction compared to the original data's Type II error rates (ANN: 81%, KNN: 54%, RF: 24%, SVM: 45%). For the five-class classification, the study used the renowned KDD99 dataset, initially preprocessing 22 types of attacks into four major categories. In this extremely imbalanced dataset (especially for categories 4 (R2L) and 5 (U2R)), significant differences in performance were observed among the classifiers. Notably, the predictive performance for category 5 significantly improved after applying oversampling techniques, with the ANN + SMOTE-ENN combination showing the most pronounced improvement for category 5. Furthermore, the analysis indicated that reducing the Type II error rate for minority classes might increase the error rate for majority classes, highlighting the complexity of addressing class imbalance issues and underscoring the importance of selecting suitable data processing strategies. | en_US
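DC.subject | Information Security | zh_TW
DC.subject | Class Imbalance | zh_TW
DC.subject | Binary Classification | zh_TW
DC.subject | Multi-class Classification | zh_TW
DC.subject | Data Resampling Techniques | zh_TW
DC.subject | Information Security | en_US
DC.subject | Class Imbalance | en_US
DC.subject | Binary | en_US
DC.subject | Five-class | en_US
DC.subject | Data Resampling | en_US
DC.title | Class Imbalance in Information Security: A Comparative Study of Undersampling, Oversampling, and Hybrid Methods | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Addressing Class Imbalance in Information Security: Comparative Analysis of Undersampling, Oversampling, and Hybrid Approaches | en_US
DC.type | Master's/Doctoral Thesis | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US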
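
The abstract's two central ideas, the Type II error rate and Random Oversampling, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the record does not say which libraries were used, the function names type_ii_error_rate and random_oversample are hypothetical, the toy data is invented for illustration, and the Type II error rate is assumed here to mean FN / (FN + TP) with "harmful" as the positive class (label 1).

```python
import random
from collections import Counter


def type_ii_error_rate(y_true, y_pred, positive=1):
    """Share of truly positive samples predicted as another class: FN / (FN + TP)."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_for_positives:
        raise ValueError("y_true contains no positive samples")
    false_negatives = sum(1 for p in predictions_for_positives if p != positive)
    return false_negatives / len(predictions_for_positives)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class is as large as the largest one (plain random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data (invented): eight "harmless" (0) and two "harmful" (1) records.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)  # both classes now have eight samples

# Two of the four truly harmful records are missed, so the rate is 0.5.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
rate = type_ii_error_rate(y_true, y_pred)
```

In practice, oversampling of this kind is normally applied to the training split only, so that duplicated minority samples do not leak into the data used to measure the Type II error rate.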
