The Effectiveness Evaluation of Outlier Detection in Improving the Predictions of Imbalanced Classes

DC Field	Value	Language
DC.contributor	資訊管理學系	zh_TW
DC.creator	楊博翰	zh_TW
DC.creator	Po-Han Yang	en_US
dc.date.accessioned	2024-07-29T07:39:07Z
dc.date.available	2024-07-29T07:39:07Z
dc.date.issued	2024
dc.identifier.uri	http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=109423007
dc.contributor.department	資訊管理學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	本研究探討異常值檢測技術在處理類別不平衡資料集當中的應用，並評估其結合過採樣技術對模型預測性能的影響。研究分別針對少數類別和多數類別的異常值進行偵測並刪除，然後使用SMOTE (Synthetic Minority Over-sampling TEchnique)過採樣方法進行過採樣，以平衡兩類別的樣本數量。藉由實驗分析，本研究比較了經過異常值處理和直接過採樣的效果，並分析異常值偵測對模型預測性能的影響。在實驗設計上，本研究選用了收錄於KEEL-Dataset Repository (Knowledge Extraction based on Evolutionary Learning-Dataset Repository)中的七個二元類別不平衡資料集作為實驗資料集，並挑選了四種不同類型的異常值偵測代表方法進行實驗，分別是LOF (Local Outlier Factor)、iForest(Isolation Forest)、MCD (Minimum Covariance Determinant)及OCSVM (One-Class Support Vector Machine)。實驗中使用了三種分類器：SVM (Support Vector Machine)、Random Forest及LightGBM，觀察分別移除少數類別及多數類別當中的異常值之後，再以SMOTE過採樣方法將資料集類別數量過採樣至平衡，會如何對模型預測性能造成影響。根據實驗結果顯示，以異常值偵測移除少數類別的異常值不僅無法對模型預測性能有正面的影響，反而導致模型性能下降；另一方面，移除多數類別的異常值可以對模型預測性能有正面的影響，其中，使用LOF移除多數類別中的異常值，對模型性能有最佳的提升效果。這些發現表明，在處理類別不平衡問題時，針對多數類別進行異常值偵測並移除異常值，結合SMOTE過採樣技術，是提高模型預測性能的一種有效策略。	zh_TW
dc.description.abstract	This study explores the application of outlier detection techniques to class-imbalanced datasets and evaluates how combining them with over-sampling affects model prediction performance. Outliers are detected and removed from the minority class and from the majority class separately, after which SMOTE (Synthetic Minority Over-sampling TEchnique) is applied to balance the number of samples in the two classes. Through experimental analysis, this study compares outlier removal followed by over-sampling against direct over-sampling and analyzes the impact of outlier detection on model prediction performance. Seven binary imbalanced datasets from the KEEL-Dataset Repository (Knowledge Extraction based on Evolutionary Learning-Dataset Repository) were selected for the experiments. Four representative outlier detection methods of different types were tested: LOF (Local Outlier Factor), iForest (Isolation Forest), MCD (Minimum Covariance Determinant), and OCSVM (One-Class Support Vector Machine). Three classifiers were used: SVM (Support Vector Machine), Random Forest, and LightGBM. The study observed how model performance changed when outliers were removed from either the minority or the majority class and SMOTE was then used to balance the datasets. The experimental results showed that removing outliers from the minority class did not improve model performance and in fact caused it to decline. In contrast, removing outliers from the majority class had a positive impact, with LOF providing the largest improvement. These findings suggest that, when addressing class imbalance, detecting and removing outliers from the majority class and then applying SMOTE over-sampling is an effective strategy for improving model prediction performance.	en_US
DC.subject	機器學習	zh_TW
DC.subject	類別不平衡	zh_TW
DC.subject	異常值偵測	zh_TW
DC.subject	過採樣	zh_TW
DC.subject	SMOTE	zh_TW
DC.subject	Machine learning	en_US
DC.subject	Class imbalance	en_US
DC.subject	Outlier detection	en_US
DC.subject	Over-sampling	en_US
DC.subject	SMOTE	en_US
DC.title	異常值偵測對增進類別不平衡預測的效能評估	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	The Effectiveness Evaluation of Outlier Detection in Improving the Predictions of Imbalanced Classes	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 109423007: complete metadata record
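
The strategy the abstract recommends (remove majority-class outliers with LOF, then over-sample the minority class with SMOTE) can be illustrated with a minimal from-scratch sketch. This is not the thesis's implementation: the data are synthetic, the function names (`knn`, `lof_scores`, `remove_outliers`, `smote`) and all parameter values (`k`, `fraction`) are assumptions chosen for illustration, tie handling in the k-neighbourhood is simplified, and no classifier is trained. In practice one would more likely use `LocalOutlierFactor` from scikit-learn and `SMOTE` from imbalanced-learn.

```python
import math
import random


def knn(points, i, k):
    """Return [(distance, index)] for the k nearest neighbours of points[i]."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    return sorted(dists)[:k]


def lof_scores(points, k=3):
    """Local Outlier Factor per point; values well above 1 suggest an outlier."""
    neigh = [knn(points, i, k) for i in range(len(points))]
    k_dist = [n[-1][0] for n in neigh]  # distance to the k-th neighbour
    lrd = []  # local reachability density = 1 / mean reachability distance
    for n in neigh:
        reach = [max(k_dist[j], d) for d, j in n]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    # LOF = mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for _, j in n) / (len(n) * lrd[i])
            for i, n in enumerate(neigh)]


def remove_outliers(points, k=3, fraction=0.1):
    """Drop the `fraction` of points with the highest LOF score."""
    scores = lof_scores(points, k)
    n_keep = len(points) - int(len(points) * fraction)
    keep = sorted(range(len(points)), key=lambda i: scores[i])[:n_keep]
    return [points[i] for i in sorted(keep)]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating towards a random k-NN."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        _, j = rng.choice(knn(minority, i, k))
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


# Hypothetical toy data: 42 majority points (two planted far away), 8 minority.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
majority += [(8.0, 8.0), (-9.0, 7.5)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(8)]

# Step 1: remove outliers from the majority class only.
cleaned_majority = remove_outliers(majority, k=5, fraction=0.05)
# Step 2: over-sample the minority class until both classes are the same size.
synthetic = smote(minority, len(cleaned_majority) - len(minority), k=3, rng=rng)
balanced_minority = minority + synthetic

print(len(majority), len(cleaned_majority), len(balanced_minority))
```

The outlier step is applied before over-sampling so that SMOTE balances against the cleaned majority class. Applying `remove_outliers` to `minority` instead would correspond to the variant that the abstract reports as harmful.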