Thesis/Dissertation 109423007: Full Metadata Record

DC field | Value | Language
DC.contributor | 資訊管理學系 (Department of Information Management) | zh_TW
DC.creator | 楊博翰 | zh_TW
DC.creator | Po-Han Yang | en_US
dc.date.accessioned | 2024-07-29T07:39:07Z |
dc.date.available | 2024-07-29T07:39:07Z |
dc.date.issued | 2024 |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=109423007 |
dc.contributor.department | 資訊管理學系 (Department of Information Management) | zh_TW
DC.description | 國立中央大學 | zh_TW
DC.description | National Central University | en_US
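dc.description.abstract | This study examines the application of outlier detection techniques to class-imbalanced datasets and evaluates how combining them with over-sampling affects model predictive performance. Outliers are detected and removed from the minority class and from the majority class separately, and SMOTE (Synthetic Minority Over-sampling TEchnique) is then applied to balance the number of samples in the two classes. Through experimental analysis, the study compares outlier removal followed by over-sampling against direct over-sampling, and analyzes the effect of outlier detection on predictive performance. For the experimental design, seven binary class-imbalanced datasets from the KEEL-Dataset Repository (Knowledge Extraction based on Evolutionary Learning-Dataset Repository) were used, and four representative outlier detection methods of different types were selected: LOF (Local Outlier Factor), iForest (Isolation Forest), MCD (Minimum Covariance Determinant), and OCSVM (One-Class Support Vector Machine). Three classifiers were used: SVM (Support Vector Machine), Random Forest, and LightGBM. The experiments observe how removing outliers from the minority class or from the majority class, and then over-sampling the dataset to class balance with SMOTE, affects predictive performance. The results show that removing minority-class outliers not only fails to improve predictive performance but actually degrades it, whereas removing majority-class outliers has a positive effect; among the methods tested, using LOF to remove majority-class outliers yields the largest improvement. These findings indicate that, when handling class imbalance, detecting and removing outliers in the majority class and then applying SMOTE over-sampling is an effective strategy for improving predictive performance. | zh_TW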
dc.description.abstract | This study explores the application of outlier detection techniques to imbalanced datasets and evaluates the impact of combining these techniques with over-sampling on model classification performance. Outliers are detected and removed from the minority class and from the majority class separately, and SMOTE (Synthetic Minority Over-sampling TEchnique) is then used to balance the class sizes. Through experimental analysis, the study compares outlier removal followed by over-sampling against direct over-sampling, and analyzes the impact of outlier detection on model classification performance. Seven binary imbalanced datasets from the KEEL-Dataset Repository were selected for the experiments. Four outlier detection methods were tested: LOF (Local Outlier Factor), iForest (Isolation Forest), MCD (Minimum Covariance Determinant), and OCSVM (One-Class Support Vector Machine). Three classifiers were used: SVM (Support Vector Machine), Random Forest, and LightGBM. The experiments measured model performance after removing outliers from the majority class or the minority class and then using SMOTE to balance the datasets. The results showed that removing outliers from the minority class did not improve model performance and in fact reduced it. In contrast, removing outliers from the majority class had a positive impact, with LOF providing the largest improvement. These findings suggest that, for addressing class imbalance, detecting and removing outliers from the majority class and then applying SMOTE over-sampling is an effective strategy for improving model classification performance. | en_US
DC.subject機器學習zh_TW
DC.subject類別不平衡zh_TW
DC.subject異常值偵測zh_TW
DC.subject過採樣zh_TW
DC.subjectSMOTEzh_TW
DC.subjectMachine learningen_US
DC.subjectClass imbalanceen_US
DC.subjectOutlier detectionen_US
DC.subjectOver-samplingen_US
DC.subjectSMOTEen_US
DC.title異常值偵測對增進類別不平衡預測的效能評估zh_TW
dc.language.isozh-TWzh-TW
DC.titleThe Effectiveness Evaluation of Outlier Detection in Improving the Predictions of Imbalanced Classesen_US
DC.type博碩士論文zh_TW
DC.typethesisen_US
DC.publisherNational Central Universityen_US
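
The strategy the abstract reports as most effective (remove majority-class outliers with LOF, then balance the classes with SMOTE) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the thesis's implementation: the function names, the toy data, the neighbour count k=3, and the LOF cut-off of 1.5 are all hypothetical choices made for this example, and ties at the k-th neighbour distance are ignored for brevity.

```python
import math
import random


def knn(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of points[i]."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k=3):
    """Local Outlier Factor per point; scores well above 1 suggest outliers."""
    neigh = [knn(points, i, k) for i in range(len(points))]
    k_dist = [n[-1][0] for n in neigh]  # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for n in neigh:
        reach = [max(d, k_dist[j]) for d, j in n]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for _, j in n) / (len(n) * lrd[i]) for i, n in enumerate(neigh)
    ]


def remove_outliers(points, k=3, threshold=1.5):
    """Keep only points whose LOF score does not exceed the (assumed) threshold."""
    scores = lof_scores(points, k)
    return [p for p, s in zip(points, scores) if s <= threshold]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating towards minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        _, j = rng.choice(knn(minority, i, k))
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def clean_then_balance(majority, minority, k=3, threshold=1.5, seed=0):
    """Drop majority-class outliers, then over-sample the minority to match."""
    majority = remove_outliers(majority, k, threshold)
    n_new = max(len(majority) - len(minority), 0)
    return majority, minority + smote(minority, n_new, k, seed)


# Toy data (hypothetical): a 4x3 grid of majority points plus one far outlier.
majority = [(x, y) for x in range(4) for y in range(3)] + [(30.0, 30.0)]
minority = [(10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.5, 10.5)]

clean_majority, balanced_minority = clean_then_balance(majority, minority)
print(len(clean_majority), len(balanced_minority))  # 12 12
```

In practice the same steps would more likely be built from existing library implementations of LOF and SMOTE, followed by one of the classifiers named in the abstract; the sketch only makes the order of operations concrete.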
