Missing value imputation for class imbalance data: a dynamic warping approach

DC field	Value	Language
DC.contributor	資訊管理學系	zh_TW
DC.creator	戴郁庭	zh_TW
DC.creator	Yu-Ting Dai	en_US
dc.date.accessioned	2019-07-01T07:39:07Z
dc.date.available	2019-07-01T07:39:07Z
dc.date.issued	2019
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=106423020
dc.contributor.department	資訊管理學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	在充滿資料的世界中，越來越多企業希望可以運用這些資料來提高企業競爭力，然而真實世界中類別不平衡（Class Imbalance）以及資料遺漏(Missing Value)的問題一直是非常重要的問題，如醫學診療、破產預測等不同領域都經常發生類別不平衡問題，在類別不平衡中問題中，資料集出現大類資料（Majority Class）的樣本數大於小類資料（Minority Class）的樣本數，資料也因此呈現偏態分布，為了有較高的分類正確率，使用一般的分類器所建立出來的預測模型也會因受到偏態分布的影響而誤判為大類資料，此外若這些珍貴的小類資料出現遺漏時，可用的資料點就更加稀少。本論文基於動態時間校正(Dynamic Time Warping)的概念作為核心，使用與過去不同的補值方式進行補值，利用動態時間校正的特點來解決小類樣本出現資料遺漏的問題，而此方法也不受限於需要完整資料列做為補值參考，因此在實驗中會將小類資料模擬10%、30%、50%、70%、90%的資料遺漏。本論文實驗了17個KEEL，搭配二種分類器（SVM、Decision Tree）建立分類模型，比較不同補值方式的AUC（Area Under Curve）結果。最後，KEEL資料集的實驗結果顯示，使用動態時間校正和K-NN補值法比較後，在50%~90%的資料遺漏率下，動態時間校正的補值依然有著良好的表現。	zh_TW
dc.description.abstract	In a data-rich world, more and more companies want to use data to improve their competitiveness. However, class imbalance and missing values remain important problems in real-world data. Class-imbalanced datasets occur often in fields such as medical diagnosis and bankruptcy prediction. In such datasets, the majority class has more samples than the minority class, so the data distribution is skewed. Because a general-purpose classifier aims for high overall classification accuracy, a prediction model trained on skewed data tends to misclassify minority-class samples as the majority class. Moreover, if the precious minority-class data contain missing values, the usable data become even scarcer. This thesis uses dynamic time warping (DTW) as the core of a missing value imputation method that differs from previous approaches, exploiting the properties of DTW to handle missing data in the minority class. The method does not require complete data samples as references for imputation, so the experiments simulate missing rates of 10%, 30%, 50%, 70%, and 90% in the minority-class data. The experiments use 17 KEEL datasets and two classifiers (SVM and Decision Tree) to build classification models, and compare the AUC (Area Under Curve) of different imputation methods. The results show that DTW-based imputation performs well at missing rates of 50% to 90% and outperforms the KNN imputation method.	en_US
DC.subject	類別不平衡	zh_TW
DC.subject	遺漏值	zh_TW
DC.subject	補值方法	zh_TW
DC.subject	動態時間校正	zh_TW
DC.subject	class imbalance	en_US
DC.subject	data mining	en_US
DC.subject	missing value	en_US
DC.subject	imputation	en_US
DC.subject	dynamic time warping	en_US
DC.title	以動態時間校正進行類別不平衡資料之遺漏值處理	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Missing value imputation for class imbalance data: a dynamic warping approach	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 106423020: full metadata record