Master's and Doctoral Theses 107423028 — Full Metadata Record

DC Field  Language
DC.contributor  Department of Information Management  zh_TW
DC.creator  Yi-Ting Tsang  zh_TW
DC.creator  Yi-Ting Tsang  en_US
dc.date.accessioned  2020-07-17T07:39:07Z
dc.date.available  2020-07-17T07:39:07Z
dc.date.issued  2020
dc.identifier.uri  http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107423028
dc.contributor.department  Department of Information Management  zh_TW
DC.description  National Central University  zh_TW
DC.description  National Central University  en_US
dc.description.abstract  Class imbalance is a common problem in real-world datasets. In the literature, it has been addressed from four main directions: data-level methods, algorithm-level methods, cost-sensitive methods, and ensemble learning. This study works at the algorithm level, building predictive models with one-class classification methods, which can learn from single-class data. Fifty-five class-imbalanced datasets from the KEEL repository are used, and three one-class methods are examined: One-Class SVM (OCSVM), Isolation Forest (IF), and Local Outlier Factor (LOF). Prior studies indicate that data pre-processing improves data quality and thus model performance, and few studies have applied feature selection to binary-class datasets before building one-class models. Therefore, this study pairs one-class methods with one feature selection method from each of the wrapper, filter, and embedded categories: the Genetic Algorithm (GA), Principal Component Analysis (PCA), and the C4.5 decision tree, respectively. The aims are to determine which feature selection method paired with which one-class method improves classification, whether one-class model performance is affected by the class imbalance ratio, and whether combining several different base classifiers into a final ensemble model, following the ensemble learning concept, can further improve performance. The experimental results show that, overall, C4.5 feature selection improves the one-class models. When the datasets are split into high and low imbalance ratios, C4.5 feature selection helps OCSVM and IF at low ratios but still falls short of modeling directly with C4.5; at high ratios, GA feature selection helps OCSVM and LOF, C4.5 helps IF, and all three one-class methods, regardless of the feature selection method, outperform C4.5 used directly, so one-class methods are better suited than C4.5 to highly imbalanced datasets. With ensemble learning, a heterogeneous ensemble of the top eight classifiers from the earlier experiments reaches an AUC of up to 83.24%.  zh_TW
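The embedded (C4.5) feature-selection step described in the abstract can be sketched as follows. This is a minimal illustration assuming scikit-learn, whose `DecisionTreeClassifier` implements CART rather than C4.5; the synthetic dataset, threshold, and parameter values are placeholders, not the thesis's actual setup.

```python
# Illustrative sketch (not the thesis code): embedded feature selection
# with a decision tree, as a stand-in for C4.5. Selection runs on the
# binary-labelled training data before any one-class modeling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic imbalanced dataset: 20 features, only a few informative.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=4, weights=[0.9, 0.1],
                           random_state=0)

# Fit the tree on the full binary-labelled data, then keep features
# whose importance is at least the mean importance.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
selector = SelectFromModel(tree, prefit=True, threshold="mean")
X_sel = selector.transform(X)

print("kept feature indices:", np.flatnonzero(selector.get_support()))
print("shape before/after:", X.shape, X_sel.shape)
```

The reduced matrix `X_sel` would then feed the one-class classifiers in place of the original feature set.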
dc.description.abstract  In real-world datasets, the class imbalance problem is very common. In the literature, it has been addressed in four different ways, including data-level methods, algorithm-level methods, cost-sensitive methods, and ensemble learning. This thesis focuses on the algorithm level, where one-class classification algorithms, which can learn from single-class data, are used to build classifiers. In addition, 55 class-imbalanced datasets from the KEEL dataset repository are used in the experiments, and three one-class classification algorithms, One-Class SVM (OCSVM), Isolation Forest (IF), and Local Outlier Factor (LOF), are compared. Past research has shown that data pre-processing, such as feature selection, can improve the quality of data and thus the performance of classifiers. Moreover, few studies focus on performing feature selection over binary classification datasets and then combining it with one-class classification methods. Therefore, three different types of feature selection methods are employed: wrapper, filter, and embedded methods, based on the Genetic Algorithm (GA), Principal Component Analysis (PCA), and the C4.5 decision tree (C4.5), respectively. The first research objective is to find out which one-class classification algorithm combined with which feature selection algorithm performs best; the relationship between the class imbalance ratio and the performance of one-class classifiers is also examined. The second research objective is to apply ensemble learning to combine several different one-class classifiers and examine whether one-class classifier ensembles can further improve on single one-class classifiers. The experimental results show that C4.5 feature selection improves the overall performance of the one-class classifiers. However, when the datasets are divided into high and low imbalance-ratio groups, C4.5 feature selection improves OCSVM and IF on the low-ratio datasets, although they still do not outperform using C4.5 directly. For the datasets with high imbalance ratios, GA helps to improve the performance of OCSVM and LOF, whereas C4.5 feature selection helps to improve the performance of IF; moreover, no matter which feature selection method is used, the three one-class classifiers outperform using C4.5 directly. After applying ensemble learning, the heterogeneous classifier ensemble combining the top eight base one-class classifiers outperforms the other classifier ensembles and the single one-class classifiers, reaching an AUC of 83.24%.  en_US
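The core experimental pipeline named in the abstract — training OCSVM, IF, and LOF on majority-class samples only, scoring with AUC, and averaging the scores as a simple heterogeneous ensemble — can be sketched as follows. This is a minimal illustration assuming scikit-learn; the synthetic dataset, hyperparameters, and the z-score averaging scheme are assumptions, not the thesis's actual configuration.

```python
# Illustrative sketch (not the thesis code): one-class classification on
# an imbalanced binary dataset, evaluated by AUC, plus a simple
# heterogeneous score-averaging ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

# Synthetic imbalanced binary dataset (roughly 9:1 ratio).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

# One-class learning: fit on the majority (negative) class only.
X_tr_major = X_tr[y_tr == 0]

models = {
    "OCSVM": OneClassSVM(gamma="scale", nu=0.1),
    "IF": IsolationForest(random_state=42),
    "LOF": LocalOutlierFactor(n_neighbors=20, novelty=True),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr_major)
    # decision_function: higher means more "normal"; negate so that
    # higher means more likely the minority (positive) class.
    s = -model.decision_function(X_te)
    scores[name] = s
    print(f"{name} AUC: {roc_auc_score(y_te, s):.3f}")

# Heterogeneous ensemble: average z-score-normalized scores so that
# the three classifiers' score scales are comparable.
ens = np.mean([(s - s.mean()) / s.std() for s in scores.values()], axis=0)
print(f"Ensemble AUC: {roc_auc_score(y_te, ens):.3f}")
```

Note that `LocalOutlierFactor` only exposes `decision_function` for unseen data when constructed with `novelty=True`, which matches the one-class setting here.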
DC.subject  Class Imbalance  zh_TW
DC.subject  One-Class Classification  zh_TW
DC.subject  Feature Selection  zh_TW
DC.subject  Ensemble Learning  zh_TW
DC.subject  Data Mining  zh_TW
DC.subject  Class Imbalance  en_US
DC.subject  One-Class Classification  en_US
DC.subject  Feature Selection  en_US
DC.subject  Ensemble Learning  en_US
DC.subject  Data Mining  en_US
DC.title  A Study of One-Class Classification on Class-Imbalanced Datasets: Combining Feature Selection and Ensemble Learning  zh_TW
dc.language.iso  zh-TW  zh-TW
DC.title  One Class Classification on Imbalanced Datasets Using Feature Selection and Ensemble Learning  en_US
DC.type  Master's/Doctoral Thesis  zh_TW
DC.type  thesis  en_US
DC.publisher  National Central University  en_US
