National Central University Institutional Repository (NCU IR): theses and dissertations, past exam papers, journal articles, and research projects. Item 987654321/99406


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/99406


    Title: A Study on Ensemble Learning Techniques for Data Preprocessing and Classifier Construction in Imbalanced Data
    Authors: KAO, YI-YUN (高奕筠)
    Contributors: Department of Information Management
    Keywords: imbalanced data; re-sampling; instance selection; ensemble learning; dynamic selection
    Date: 2026-02-11
    Issue Date: 2026-03-06 18:55:11 (UTC+8)
    Publisher: National Central University
    Abstract: Imbalanced data is common in real-world applications such as equipment failure prediction and medical diagnosis. Because traditional machine learning models tend to favor the majority class, improving a classifier's ability to recognize the minority class has become a key challenge. Existing solutions fall into three main categories: data-level, algorithm-level, and hybrid approaches. At the data level, however, the literature has not explored in depth how ensemble learning can be incorporated into data preprocessing. In addition, few studies have applied ensemble learning to the selection of multiple classifiers.
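    To make the majority-class bias concrete, the following minimal sketch compares accuracy with the minority-class F1 score for a baseline that always predicts the majority class. The 95/5 label split is a hypothetical toy example and is not taken from the thesis or from the KEEL datasets.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_minority(y_true, y_pred, positive=1):
    """F1 score computed for the minority (positive) class only."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no minority sample was recognized
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
y_majority = [0] * len(y_true)  # baseline that always predicts the majority class

imbalance_ratio = y_true.count(0) / y_true.count(1)
print(imbalance_ratio)                  # 19.0
print(accuracy(y_true, y_majority))     # 0.95
print(f1_minority(y_true, y_majority))  # 0.0
```

    The baseline looks strong on accuracy while recognizing no minority sample at all, which is why the abstract reports AUC and F1-Measure rather than accuracy.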
    To address these gaps, this study investigates the impact of ensemble learning on classification performance in both data preprocessing and classifier construction. A total of 42 imbalanced datasets from the KEEL repository were used, and two sets of experiments were designed: (1) twelve distinct data preprocessing workflows were built from three resampling algorithms (SMOTE, Cluster Centroids, and SMOTEENN) and four instance selection algorithms (ENN, DROP3, IPF, and CVCF), and these workflows, covering both single and ensemble-based preprocessing approaches, were compared to determine the most effective preprocessing strategy for imbalanced data; (2) six dynamic selection algorithms (OLA, MLA, MCB, DES-KNN, KNORA-U, and DES-P) were integrated for multiple classifier construction to evaluate the synergistic effects of combining data-level and classifier-level ensembles.
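    One kind of workflow described in experiment (1), resampling followed by instance selection, can be sketched as below. This is a simplified pure-Python illustration of textbook SMOTE and ENN and not the thesis implementation: the helper names, the toy points, and the parameter values are all assumptions, and the ENN here edits every class rather than only the majority class.

```python
import math
import random
from collections import Counter


def nearest(i, X, candidates, k):
    """Indices of the k candidates closest to X[i], excluding i itself."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def smote(X, y, minority, n_new, k=3, seed=0):
    """Append n_new synthetic minority points, each interpolated between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.choice(minority_idx)
        j = rng.choice(nearest(i, X, minority_idx, k))
        gap = rng.random()  # position along the segment from X[i] to X[j]
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop every sample whose label disagrees
    with the majority vote of its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, range(len(X)), k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 8 clean majority points, 1 noisy majority point
# sitting inside the minority region, and 3 minority points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (2, 0), (0, 2), (2, 2),
     (5.5, 5.5),
     (5, 5), (5, 6), (6, 5)]
y = [0] * 9 + [1] * 3

X_res, y_res = smote(X, y, minority=1, n_new=6)  # oversample the minority class
X_clean, y_clean = enn(X_res, y_res)             # then filter out noisy samples
print(Counter(y_res), Counter(y_clean))
```

    In this toy run SMOTE balances the two classes, and ENN then removes the majority-labelled point that lies inside the minority region.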
    Experimental results show that a multi-intersection resampling approach can enhance the diversity and quality of the training data, thereby improving classification performance. Among all classifiers, Random Forest demonstrated the best overall performance. Regarding integration strategies, applying SMOTE followed by ENN and combining SVM, CART, and KNN with the dynamic selection technique KNORA-U achieved the highest AUC (0.863). For tasks that prioritize minority class prediction, the recommended strategy is to apply IPF followed by a union-of-resampling approach, combined with SVM, KNN, and Random Forest (or XGBoost) and KNORA-U; this strategy achieved the best F1-Measure (0.739). The final integration strategy can be selected according to the specific application scenario and predictive objectives.
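    The KNORA-U technique named in both recommended strategies can be illustrated with a small sketch. It follows the usual description of KNORA-Union, in which each classifier receives one vote for every neighbour of the query in a validation set that it labels correctly. The three threshold rules standing in for SVM, CART, and KNN, the validation points, and the function names are hypothetical.

```python
import math
from collections import Counter


def knora_u_predict(query, classifiers, X_val, y_val, k=3):
    """KNORA-Union sketch: weight each classifier by how many of the query's
    k nearest validation samples it classifies correctly."""
    # Region of competence: the k validation samples closest to the query.
    region = sorted(range(len(X_val)),
                    key=lambda i: math.dist(query, X_val[i]))[:k]
    votes = Counter()
    for clf in classifiers:
        weight = sum(clf(X_val[i]) == y_val[i] for i in region)
        if weight:
            votes[clf(query)] += weight
    if not votes:  # no classifier is locally competent: plain majority vote
        votes = Counter(clf(query) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical base classifiers, written as simple rules for illustration.
def clf_a(x):
    return 1 if x[0] > 0.5 else 0


def clf_b(x):
    return 1 if x[1] > 0.5 else 0


def clf_c(x):
    return 0  # always predicts the majority class


pool = [clf_a, clf_b, clf_c]

# Hypothetical validation set in which the true label is 1 when x[0] > 0.5.
X_val = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.1, 0.9), (0.2, 0.8), (0.1, 0.1)]
y_val = [1, 1, 1, 0, 0, 0]

query = (0.85, 0.15)
static_vote = Counter(clf(query) for clf in pool).most_common(1)[0][0]
print(static_vote)                                 # 0
print(knora_u_predict(query, pool, X_val, y_val))  # 1
```

    For this query a static majority vote follows the two rules that are wrong near the query, whereas the dynamic weighting gives all three votes to the one rule that is correct on the neighbouring validation samples.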
    Appears in Collections:[Graduate Institute of Information Management] Electronic Thesis & Dissertation

    Files in This Item:

    File        Description  Size  Format
    index.html               0Kb   HTML


    All items in NCUIR are protected by copyright, with all rights reserved.
