

    Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/99406


    Title: 資料前處理與分類器建構之集成學習技術於類別不平衡資料之研究 (A Study on Ensemble Learning Techniques for Data Preprocessing and Classifier Construction in Imbalanced Data)
    Author: 高奕筠 (KAO, YI-YUN)
    Contributors: Department of Information Management (資訊管理學系)
    Keywords: imbalanced data; re-sampling; instance selection; ensemble learning; dynamic selection
    Date: 2026-02-11
    Upload time: 2026-03-06 18:55:11 (UTC+8)
    Publisher: National Central University
    Abstract: Imbalanced data is common in real-world applications such as equipment failure prediction and medical diagnosis. Because traditional machine learning models tend to favor the majority class, improving a classifier's ability to recognize the minority class has become a key challenge. Existing strategies for the class imbalance problem fall into three broad categories: data-level, algorithm-level, and hybrid approaches. At the data level, however, the current literature lacks an in-depth exploration of how ensemble learning can be incorporated into data preprocessing, and few studies have applied ensemble learning to the selection of multiple classifiers.
    To address these gaps, this study investigates the impact of ensemble learning on classification performance in both data preprocessing and classifier construction. A total of 42 imbalanced datasets from the KEEL repository were used, and two sets of experiments were designed: (1) twelve distinct data preprocessing workflows were built from three resampling algorithms (SMOTE, Cluster Centroids, and SMOTEENN) and four instance selection algorithms (ENN, DROP3, IPF, and CVCF); these workflows, covering both single and ensemble-based preprocessing, were compared to identify the most effective preprocessing strategy for handling imbalanced data; (2) six dynamic selection algorithms (OLA, MLA, MCB, DES-KNN, KNORA-U, and DES-P) were integrated for multiple-classifier construction to evaluate the synergistic effects of combining data-level and classifier-level ensembles.
    Experimental results show that a multi-intersection resampling approach enhances the diversity and quality of the training data, thereby improving classification performance, and Random Forest achieved the best overall performance among the classifiers. Regarding integration strategies, applying SMOTE followed by ENN, and combining SVM, CART, and KNN with the dynamic selection technique KNORA-U, achieved the highest AUC (0.863). For tasks that prioritize minority-class prediction, the recommended strategy is to apply IPF followed by a union of resampling approaches, combined with SVM, KNN, and Random Forest (or XGBoost) under KNORA-U; this achieved the best F1-measure (0.739). The final integration strategy can be selected according to the application scenario and predictive objectives.
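The SMOTE oversampling step named in the abstract can be illustrated with a minimal sketch: each synthetic minority sample is an interpolation between a minority point and one of its k nearest minority-class neighbours. The function name `smote_sketch` and the toy data are illustrative only; the study itself would rely on a tested implementation (e.g. imbalanced-learn's `SMOTE`).

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: synthesize n_new minority samples by
    interpolating between each chosen minority point and one of
    its k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # exclude self-distance
    nn = np.argsort(d, axis=1)[:, :k]          # k nearest neighbours per point
    synth = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                    # a random minority sample
        b = nn[a, rng.integers(k)]             # one of its k neighbours
        gap = rng.random()                     # interpolation factor in [0, 1]
        synth[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synth

# Oversample a toy 2-D minority class from 4 to 10 points.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_sketch(X_min, n_new=6, k=2, rng=0)
print(X_new.shape)  # (6, 2)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the minority class's local region rather than duplicating existing records.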
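The KNORA-U (union) dynamic selection rule used in the best-performing strategies can likewise be sketched in a few lines: for each test point, every classifier in the pool earns one vote per nearest validation (DSEL) neighbour it labels correctly, and the prediction is the resulting weighted majority vote. All names here (`knora_u_predict`, `preds_dsel`, `nn_idx`) are our own; this is an illustrative sketch, not the implementation the study used.

```python
import numpy as np

def knora_u_predict(preds_dsel, y_dsel, preds_test, nn_idx):
    """Minimal KNORA-U (union) sketch.

    preds_dsel : (n_clf, n_dsel) each classifier's labels on the
                 dynamic-selection (DSEL) validation set
    y_dsel     : (n_dsel,) true DSEL labels
    preds_test : (n_clf, n_test) each classifier's labels on test points
    nn_idx     : (n_test, k) indices of each test point's k nearest
                 DSEL neighbours (precomputed)
    """
    correct = preds_dsel == y_dsel                  # (n_clf, n_dsel) competence map
    out = np.empty(preds_test.shape[1], dtype=y_dsel.dtype)
    for t in range(preds_test.shape[1]):
        votes = correct[:, nn_idx[t]].sum(axis=1)   # votes earned per classifier
        labels = preds_test[:, t]
        classes = np.unique(labels)
        # weighted majority vote over the pool's test-point predictions
        tally = [votes[labels == c].sum() for c in classes]
        out[t] = classes[int(np.argmax(tally))]
    return out

# Toy pool of 3 classifiers, 4 DSEL points, 1 test point.
preds_dsel = np.array([[0, 0, 1, 1],
                       [0, 1, 1, 0],
                       [1, 1, 0, 0]])
y_dsel = np.array([0, 0, 1, 1])
preds_test = np.array([[1], [0], [0]])   # the pool disagrees on the test point
nn_idx = np.array([[0, 1]])              # its 2 nearest DSEL neighbours
pred = knora_u_predict(preds_dsel, y_dsel, preds_test, nn_idx)
print(pred)  # [1]
```

Here the first classifier labels both neighbours correctly (2 votes) while the others earn 1 and 0, so its prediction wins: competence is judged locally, per test point, which is what lets the ensemble adapt to minority-class regions.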
    Appears in Collections: [Graduate Institute of Information Management] Theses & Dissertations

    Files in This Item:

    File        Description    Size    Format    Visits
    index.html                 0Kb     HTML      33        View/Open


    All items in NCUIR are protected by copyright, with all rights reserved.

