Thesis 106423020: Full Metadata Record

DC Field | Value | Language
DC.contributor | Department of Information Management | zh_TW
DC.creator | 戴郁庭 | zh_TW
DC.creator | Yu-Ting Dai | en_US
dc.date.accessioned | 2019-07-01T07:39:07Z |
dc.date.available | 2019-07-01T07:39:07Z |
dc.date.issued | 2019 |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=106423020 |
dc.contributor.department | Department of Information Management | zh_TW
DC.description | National Central University | zh_TW
DC.description | National Central University | en_US
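dc.description.abstract | In a world full of data, more and more enterprises hope to use these data to improve their competitiveness. However, class imbalance and missing values have long been important problems in the real world. Class imbalance occurs frequently in fields such as medical diagnosis and bankruptcy prediction. In a class-imbalanced dataset, the majority class contains more samples than the minority class, so the data follow a skewed distribution. Because a general-purpose classifier aims for high classification accuracy, the prediction model it builds is affected by this skew and tends to misclassify samples as the majority class. Moreover, if the precious minority-class data also contain missing values, the usable data points become even scarcer. This thesis takes dynamic time warping as its core concept and uses an imputation approach different from previous ones, exploiting the characteristics of dynamic time warping to handle missing values in minority-class samples. Because the method does not require complete data rows as imputation references, the experiments simulate missing rates of 10%, 30%, 50%, 70%, and 90% in the minority-class data. The experiments use 17 KEEL datasets with two classifiers (SVM and Decision Tree) to build classification models and compare the AUC (Area Under the Curve) of different imputation methods. The results on the KEEL datasets show that, compared with K-NN imputation, dynamic time warping imputation still performs well at missing rates of 50% to 90%. | zh_TW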
dc.description.abstract | In a world full of information, more and more companies want to use data to improve their competitiveness. However, class imbalance and missing values have long been important problems in real-world data. Class-imbalanced datasets often occur in fields such as medical diagnosis and bankruptcy prediction. In a class-imbalanced dataset, the majority class contains more samples than the minority class, so the class distribution is skewed. Because a general-purpose classifier aims for high overall accuracy, the prediction model it builds is biased by this skew and tends to misclassify samples as the majority class. If the already scarce minority-class data also contain missing values, the usable data become even rarer. In this thesis, dynamic time warping is used as the core of the missing value imputation task, and its characteristics are exploited to impute missing values in the minority class, which contains only a small number of samples. Because this method does not require complete data samples as imputation references, the experiments simulate missing rates of 10%, 30%, 50%, 70%, and 90% in the minority-class data. Seventeen KEEL datasets are used, two classification models (SVM and Decision Tree) are constructed, and the AUC (Area Under the Curve) is compared across imputation methods. The experimental results show that dynamic time warping imputation performs well at missing rates of 50% to 90%, where it outperforms the KNN imputation method. | en_US
DC.subject | class imbalance | zh_TW
DC.subject | missing values | zh_TW
DC.subject | imputation methods | zh_TW
DC.subject | dynamic time warping | zh_TW
DC.subject | class imbalance | en_US
DC.subject | data mining | en_US
DC.subject | missing value | en_US
DC.subject | imputation | en_US
DC.subject | dynamic time warping | en_US
DC.title | Missing Value Handling for Class-Imbalanced Data Using Dynamic Time Warping | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Missing value imputation for class imbalance data: a dynamic warping approach | en_US
DC.type | Master's/doctoral thesis | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US
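The abstract does not specify how dynamic time warping (DTW) is turned into an imputation procedure, so the following is only a hypothetical sketch, not the thesis's method. It assumes that (1) each minority-class row is treated as a sequence of attribute values, (2) an incomplete row is compared with the other rows by the DTW distance between their observed values only, which DTW permits because the two sequences may differ in length, and (3) each missing attribute is copied from the nearest row that has that attribute observed. The names `dtw_distance` and `dtw_impute` and the absolute-difference cost are illustrative assumptions.

```python
import math
from typing import List, Optional

# Hypothetical representation: None marks a missing attribute value.
Row = List[Optional[float]]


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Classic O(n*m) dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def observed(row: Row) -> List[float]:
    """Drop missing entries, keeping the remaining values in attribute order."""
    return [v for v in row if v is not None]


def dtw_impute(rows: List[Row]) -> List[Row]:
    """Hypothetical DTW nearest-donor imputation; returns copies, inputs untouched."""
    result = [list(r) for r in rows]
    for idx, row in enumerate(rows):
        missing = [k for k, v in enumerate(row) if v is None]
        if not missing:
            continue
        query = observed(row)
        # Rank every other row that has at least one observed value by DTW distance.
        candidates = sorted(
            (dtw_distance(query, observed(other)), j)
            for j, other in enumerate(rows)
            if j != idx and observed(other)
        )
        for k in missing:
            # Donors may themselves be incomplete; skip any that also lack attribute k.
            for _, j in candidates:
                if rows[j][k] is not None:
                    result[idx][k] = rows[j][k]
                    break
    return result


if __name__ == "__main__":
    minority = [
        [1.0, 2.0, 3.0, 4.0],
        [1.0, None, 3.0, 4.0],
        [10.0, 20.0, 30.0, 40.0],
    ]
    print(dtw_impute(minority))
```

In this sketch a donor only needs the attribute being filled, so incomplete rows can still serve as references; this is one possible reading of the abstract's statement that the method does not require complete rows, not a description of the author's implementation.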
