dc.description.abstract | Effective classification of class-imbalanced datasets has long been an important issue in data mining. The class imbalance problem arises when the number of samples in one class greatly outnumbers that of the other classes in a dataset. Because of the skewed class distribution, learning models tend to misclassify minority-class samples as the majority class. Since the class imbalance problem occurs in many real-world applications, such as fault diagnosis, medical diagnosis, and fraud detection, many researchers have devoted effort over the past decades to methods for handling class-imbalanced datasets. In the literature, the class imbalance problem can be addressed in three different ways: algorithm-level methods, data-level methods, and cost-sensitive methods. In particular, data-level methods, such as under- and over-sampling techniques, are widely considered. In recent years, deep learning techniques have demonstrated superior performance over many machine learning techniques. However, very few studies have examined their applicability to class-imbalanced datasets. Therefore, the objective of this research is to apply SMOTE as the over-sampling method to re-balance class-imbalanced datasets and then construct deep learning models for performance comparison. In the experiments, 44 class-imbalanced datasets collected from the KEEL dataset repository and 8 datasets from NASA are used. In addition, deep neural networks, including the deep multilayer perceptron (D-MLP) and the deep belief network (DBN), are compared with several representative baseline learning models. The experimental results show that SMOTE combined with deep learning classifiers performs better than traditional machine learning classifiers. In particular, the DBN classifier outperforms the others on datasets with high imbalance ratios, whereas the D-MLP classifier achieves the best overall performance. | en_US
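
The following is a minimal, illustrative sketch of the SMOTE-plus-deep-MLP pipeline described in the abstract, not the authors' actual code. It uses scikit-learn and imbalanced-learn; the synthetic dataset, layer sizes, and other hyperparameters are assumptions made for illustration only, and the DBN model is not shown.

```python
# Illustrative sketch (assumptions, not the authors' implementation):
# re-balance an imbalanced training set with SMOTE, then train a deep MLP.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical imbalanced dataset standing in for a KEEL/NASA dataset.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

# Over-sample only the training split so the test set keeps the original skew.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Deep multilayer perceptron; the layer sizes are an assumption for illustration.
clf = MLPClassifier(hidden_layer_sizes=(64, 64, 32), max_iter=500,
                    random_state=42)
clf.fit(X_res, y_res)

print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```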