Deep Learning for the Class Imbalance Problem (深度學習技術於類別不平衡問題之應用)


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/86572

Title:	Deep Learning for the Class Imbalance Problem (深度學習技術於類別不平衡問題之應用)
Author:	Huang, Wen-Zhen (黃玟榛)
Contributor:	Department of Information Management
Keywords:	Class Imbalance; Data Mining; Machine Learning; Deep Learning; Over-sampling
Date:	2021-07-15
Upload time:	2021-12-07 12:59:09 (UTC+8)
Publisher:	National Central University
Abstract:	Effective classification of datasets that suffer from the class imbalance problem has long been an important issue in data mining. The class imbalance problem arises when the samples of one class in a dataset far outnumber those of another class. Because of this skewed class distribution, a learned model tends to misclassify minority-class samples as majority-class samples, so the minority class is often ignored. Because the problem occurs in many real-world applications, such as fault diagnosis, medical diagnosis, and fraud detection, many researchers over the past decade have worked on methods for handling it. In the literature, these methods fall into three groups: algorithm-level methods, data-level methods, and cost-sensitive methods. Most prior data-level work combines data pre-processing, such as under- and over-sampling, with classifiers built using conventional machine learning techniques. Deep learning techniques have recently become widespread and have outperformed many machine learning techniques, yet very few studies have examined classifiers built with deep learning on class-imbalanced datasets.
Therefore, this thesis applies SMOTE (Synthetic Minority Over-sampling Technique) as the data pre-processing step to re-balance class-imbalanced datasets and then builds deep learning classifiers, in order to examine whether they outperform classifiers built with traditional machine learning techniques. The experiments use 44 binary class-imbalanced datasets from the KEEL dataset repository and 8 NASA datasets. After pre-processing, two deep learning models, a deep multilayer perceptron (D-MLP) and a deep belief network (DBN), are trained and tested, and their AUC results are computed and compared with representative baseline methods from the literature. The results show that, overall, the data-level approach combined with the D-MLP and DBN classifiers performs better than classifiers built with traditional machine learning techniques. When the datasets are split into high and low imbalance-ratio groups, DBN performs better than the other classifiers on datasets with high imbalance ratios; when the imbalance ratio is not considered, D-MLP has the better overall performance.
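This record does not include the thesis code, so the following is only a minimal sketch of the three ideas the abstract relies on: the imbalance ratio, SMOTE-style over-sampling, and AUC. It assumes binary labels (1 for the minority class, 0 for the majority class) and at least two minority samples. The function names `imbalance_ratio`, `smote`, and `auc`, the neighbour count `k`, and the toy data are all hypothetical choices for illustration, not the author's implementation or settings.

```python
import math
import random


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (binary labels)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return max(pos, neg) / min(pos, neg)


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration: 8 majority points, 3 minority points.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
labels = [0] * len(majority) + [1] * len(minority)

# Over-sample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_labels = labels + [1] * len(new_points)

print(imbalance_ratio(labels))           # ratio before over-sampling
print(imbalance_ratio(balanced_labels))  # ratio after over-sampling
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

In a comparison like the one the abstract describes, over-sampling would normally be applied to the training split only, with AUC computed on untouched test data. A maintained SMOTE implementation would be a more practical choice than this sketch.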
Appears in collections:	[Graduate Institute of Information Management] Theses and Dissertations
