The Effectiveness Evaluation of Outlier Detection in Improving the Predictions of Imbalanced Classes


Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/95592

Title:	The Effectiveness Evaluation of Outlier Detection in Improving the Predictions of Imbalanced Classes
Author:	Yang, Po-Han (楊博翰)
Contributor:	Department of Information Management
Keywords:	Machine learning; Class imbalance; Outlier detection; Over-sampling; SMOTE
Date:	2024-07-29
Upload time:	2024-10-09 17:04:54 (UTC+8)
Publisher:	National Central University
Abstract:	This study examines outlier detection as a preprocessing step for class-imbalanced datasets and evaluates how combining it with over-sampling affects model predictive performance. Outliers are detected and removed from the minority class and from the majority class separately, after which SMOTE (Synthetic Minority Over-sampling TEchnique) is applied to balance the number of samples in the two classes. The experiments compare outlier removal followed by over-sampling against over-sampling alone. Seven binary class-imbalanced datasets from the KEEL-Dataset Repository (Knowledge Extraction based on Evolutionary Learning-Dataset Repository) serve as the experimental data. Four outlier detection methods, each representing a different type of approach, are tested: LOF (Local Outlier Factor), iForest (Isolation Forest), MCD (Minimum Covariance Determinant), and OCSVM (One-Class Support Vector Machine). Three classifiers are used: SVM (Support Vector Machine), Random Forest, and LightGBM.
The experimental results show that removing outliers from the minority class does not improve predictive performance and instead degrades it. In contrast, removing outliers from the majority class has a positive effect, with LOF yielding the largest improvement. These findings suggest that, for class-imbalanced problems, detecting and removing outliers from the majority class and then applying SMOTE over-sampling is an effective strategy for improving model predictive performance.
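The pipeline the abstract describes (remove majority-class outliers with LOF, then over-sample the minority class with SMOTE until the classes are balanced) can be sketched roughly as follows. This is a simplified NumPy illustration, not the thesis's implementation: the function names, the `contamination` fraction of 0.1, the neighbor count `k=5`, and the synthetic data are all assumptions made for the example, and ties in neighbor distances are ignored. In practice, library implementations such as scikit-learn's `LocalOutlierFactor` and imbalanced-learn's `SMOTE` would normally be used instead.

```python
import numpy as np


def lof_scores(X, k=5):
    """Local Outlier Factor per row of X (higher means more outlying)."""
    n = len(X)
    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1)[:, :k]           # k nearest neighbors of each point
    k_dist = dist[np.arange(n), nn[:, -1]]         # distance to the k-th neighbor
    # reach_dist(p, o) = max(k_dist(o), d(p, o)) for every neighbor o of p
    reach = np.maximum(k_dist[nn], dist[np.arange(n)[:, None], nn])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)       # local reachability density
    return lrd[nn].mean(axis=1) / lrd              # mean neighbor density / own density


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority points by neighbor interpolation."""
    rng = np.random.default_rng(rng)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    k = min(k, len(X_min) - 1)
    nn = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)      # random minority seed points
    neigh = nn[base, rng.integers(0, k, size=n_new)]    # one random neighbor per seed
    gap = rng.random((n_new, 1))                        # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def clean_majority_then_smote(X, y, majority=0, minority=1,
                              contamination=0.1, k=5, seed=0):
    """Drop high-LOF majority points, then balance the classes with SMOTE."""
    X_maj, X_min = X[y == majority], X[y == minority]
    scores = lof_scores(X_maj, k=k)
    # Assumption: treat the top `contamination` fraction of LOF scores as outliers.
    X_maj = X_maj[scores <= np.quantile(scores, 1.0 - contamination)]
    n_new = max(len(X_maj) - len(X_min), 0)
    X_syn = smote(X_min, n_new, k=k, rng=seed)
    X_out = np.vstack([X_maj, X_min, X_syn])
    y_out = np.concatenate([np.full(len(X_maj), majority),
                            np.full(len(X_min) + len(X_syn), minority)])
    return X_out, y_out


# Synthetic two-class data, purely for illustration (200 majority, 20 minority).
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([demo_rng.normal(0.0, 1.0, size=(200, 2)),
                    demo_rng.normal(3.0, 1.0, size=(20, 2))])
y_demo = np.array([0] * 200 + [1] * 20)

X_bal, y_bal = clean_majority_then_smote(X_demo, y_demo)
print("majority:", int((y_bal == 0).sum()), "minority:", int((y_bal == 1).sum()))
```

Any of the classifiers named in the abstract could then be trained on the balanced data. To mirror the comparison in the study, the same classifier would also be trained on data balanced by SMOTE alone, with resampling applied only to the training split.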
Appears in Collections:	[Graduate Institute of Information Management] Master's and Doctoral Theses