Master's/Doctoral Thesis 110423043: Full Metadata Record

DC Field Value Language
DC.contributor 資訊管理學系 (Department of Information Management) zh_TW
DC.creator 王珮庭 zh_TW
DC.creator Pei-Ting Wang en_US
dc.date.accessioned 2023-07-18T07:39:07Z
dc.date.available 2023-07-18T07:39:07Z
dc.date.issued 2023
dc.identifier.uri http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=110423043
dc.contributor.department 資訊管理學系 (Department of Information Management) zh_TW
DC.description 國立中央大學 zh_TW
DC.description National Central University en_US
dc.description.abstract 在資料探勘領域中,資料的收集往往伴隨著各種資料品質的問題,包括:數據含有重複值 (duplicate values)、遺漏值 (missing values)、離群值 (outlier)、資料格式不一 (data inconsistency)等問題,這些問題也間接影響提取有用資訊的困難度。此外,由於現實世界所發生的機率不同,類別不平衡問題(Class Imbalance)也成為資料探勘中一個很重要的課題,此問題會導致在模型預測和分類中,對少數類別的預測性能下降,並對資料分析的準確性和可靠性上產生負面影響。 因此,本論文主要探討類別不平衡問題。根據過往文獻,本研究以資料層級方法,彈性搭配不同分類演算法方式,來對類別不平衡資料集進行重採樣,探討在不同重採樣下,調整類別大小類別比例是否影響分類性能。另外,由於現有文獻中並未提出將不同重採樣所訓練的單一分類器進行集成建立成多重分類器,以及將不同重採樣樣本進行合併,搭配單一分類器或集成式分類器。因此,本研究以集成式方法(Ensemble Method)為基礎,提出同質性(Homogeneous)和異質性(Heterogeneous)方法,探討在不同處理流程下,哪種組合方式可以更好的處理類別不平衡問題。 本研究透過實驗結果,證明在資料前處理方法中以資料層級方法對類別不平衡資料集進行重採樣能有效改善分類表現,且重採樣的大小類別平衡比例對分類器表現有顯著的影響。而在全面比較同質性與異質性方法中,多重分類器和樣本合併方法的單一分類器與集成式分類器,在統計結果中並無差異性。但異質性方法相對於同質性方法,更能夠在不同分類演算法上發掘出最佳的搭配方式,提升分類準確率(AUC)。這些實驗結果為後續研究者提供可進一步拓展與改進集成式分類器的方向,並為解決類別不平衡問題提供更多的選擇和優化策略。 zh_TW
dc.description.abstract In data mining, data collection often comes with data quality issues, including duplicate values, missing values, outliers, and data inconsistency, which make it harder to extract useful information. Furthermore, because real-world events occur with different probabilities, class imbalance has become an important issue in data mining. It degrades predictive performance on the minority class and undermines the accuracy and reliability of data analysis. This thesis therefore focuses on the class imbalance problem. Following previous literature, the study uses data-level approaches, flexibly paired with different classification algorithms, to resample class-imbalanced datasets, and examines whether adjusting the ratio between the minority and majority classes under different resampling techniques affects classification performance. Moreover, the existing literature has not proposed combining single classifiers trained on differently resampled data into a multiple classifier, or merging differently resampled samples and pairing them with a single classifier or an ensemble classifier. This research therefore proposes homogeneous and heterogeneous approaches based on ensemble methods and explores which combination better handles class imbalance under different processing flows. The experimental results show that resampling class-imbalanced datasets with data-level methods during preprocessing effectively improves classification performance, and that the balance ratio between the resampled minority and majority classes significantly influences classifier performance. In the overall comparison of homogeneous and heterogeneous approaches, the multiple classifier and the sample-merging single and ensemble classifiers show no statistically significant difference. However, heterogeneous approaches are better than homogeneous ones at finding the best pairing across classification algorithms, improving classification performance as measured by AUC. These results suggest directions for extending and improving ensemble classifiers and offer more options and optimization strategies for the class imbalance problem. en_US
DC.subject 資料探勘 zh_TW
DC.subject 類別不平衡 zh_TW
DC.subject 集成式學習 zh_TW
DC.subject data mining en_US
DC.subject class imbalance en_US
DC.subject ensemble learning en_US
DC.title 同質性與異質性集成式重採樣方法於類別不平衡問題之研究 zh_TW
dc.language.iso zh-TW zh-TW
DC.title Homogeneous and Heterogeneous Ensemble Resampling Approaches for the Class Imbalance Problem en_US
DC.type 博碩士論文 zh_TW
DC.type thesis en_US
DC.publisher National Central University en_US
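
The abstract refers to data-level resampling at a chosen minority-to-majority ratio, to merging differently resampled samples, and to AUC as the evaluation measure. The record does not give the algorithms the thesis actually used, so the following is only a minimal sketch under stated assumptions: plain random oversampling and random undersampling stand in for whatever resampling techniques the author chose, and the function names, the toy data, and the ratio values are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, minority, ratio=1.0, seed=0):
    """Duplicate random minority rows until minority/majority is about `ratio`."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_major = len(y) - len(min_idx)
    target = int(round(ratio * n_major))  # desired minority count
    extra = [rng.choice(min_idx) for _ in range(max(0, target - len(min_idx)))]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def random_undersample(X, y, minority, ratio=1.0, seed=0):
    """Drop random majority rows until minority/majority is about `ratio`."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    target = min(len(maj_idx), int(round(len(min_idx) / ratio)))  # majority rows kept
    keep = sorted(min_idx + rng.sample(maj_idx, target))
    return [X[i] for i in keep], [y[i] for i in keep]


def auc(scores, labels, positive):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 17 majority rows (label 0) and 3 minority rows (label 1).
X = [[float(i)] for i in range(20)]
y = [1 if i >= 17 else 0 for i in range(20)]

X_over, y_over = random_oversample(X, y, minority=1, ratio=1.0)
X_under, y_under = random_undersample(X, y, minority=1, ratio=0.5)

# One reading of "sample merging": concatenate the differently resampled sets
# before training a single classifier or an ensemble on the result.
X_merged, y_merged = X_over + X_under, y_over + y_under

print(Counter(y_over), Counter(y_under), Counter(y_merged))
print(auc([row[0] for row in X], y, positive=1))
```

In this sketch each resampled set could instead train its own classifier, with the predictions combined afterwards, which is one way to read the "multiple classifier" mentioned in the abstract. Which of these corresponds to the thesis's homogeneous and heterogeneous approaches is not stated in this record.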

For questions about this thesis, contact the National Central University Library, Promotion Services Division (推廣服務組), TEL: (03)422-7151 ext. 57407, or by e-mail.