One-class classification on imbalanced datasets with missing value imputation and instance selection

DC field	Value	Language
DC.contributor	資訊管理學系	zh_TW
DC.creator	曾俊凱	zh_TW
DC.creator	Chun-Kai Tseng	en_US
dc.date.accessioned	2020-07-21T07:39:07Z
dc.date.available	2020-07-21T07:39:07Z
dc.date.issued	2020
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107423050
dc.contributor.department	資訊管理學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	不平衡資料集在實務資料分析中是非常重要的一環，如信用卡盜刷、醫療診斷分類和網路攻擊分類等不同領域內重要問題。面對不平衡資料集我們可以採取不同的資料處理或使用不同分類方法達到更好的分類效果。單一類別分類方法在不同的領域中可以稱作為離群值檢測或奇異點偵測，本論文嘗試使用單一類別分類方法於不平衡資料集中二分類問題如單分類支援向量機器（One-Class SVM）、孤立森林（Isolation Forest）和局部異常因子（Local Outlier Factor）。進一步探討若資料發生缺失的情況，透過模擬遺漏值10%~50%且將使用如分類與回歸樹方法（Classification And Regression Trees）將資料填補至接近原始資料，增加分類模型的分類正確率。同時也對不平衡資料中存在影響分類方法的雜值採取樣本選取方法如Instance Based algorithm（IB3）、Decremental Reduction Optimization Procedure（DROP3）、Genetic Algorithm（GA）希望減少資料集中雜質與減少訓練模型的時間成本且找出足夠影響力的資料本論文baseline使用完整的不平衡資料與單一類別分類方法與各項實驗分析比較。探討遺漏值填補與單一類別分類方法以及哪個樣本選取方法會使單一類別分類方法正確率提升，最後探討模擬遺漏值和樣本選取方法與填補的先後順序，流程改善能夠增加分類器正確率。經過上述實驗流程以及結果，可以發現不平衡資料經過遺漏值填補之後分類正確率接近；透過樣本選取方法可以增加分類正確率同時發現樣本篩檢率會直接影響分類正確率；最後透過遺漏值與樣本選取方法的搭配，可以發現將完整資料與不完整資料拆開處理的流程可以改善分類正確率，而選擇平穩正確率的情況下使用完整資料進行模擬遺漏與填補以及搭配樣本選取方法則會有較佳的表現。	zh_TW
dc.description.abstract	Imbalanced data sets are an important problem in practical data analysis, arising in domains such as credit card fraud detection, medical diagnosis, and network attack classification. When facing imbalanced data, one can apply different data preprocessing or different classification methods to obtain better results. One-class classification, known in other fields as outlier detection or novelty detection, is applied in this thesis to binary classification problems on imbalanced data sets, using One-Class SVM, Isolation Forest, and Local Outlier Factor. The thesis further considers the case of missing data: missing values are simulated at rates of 10% to 50%, and methods such as Classification And Regression Trees (CART) are used to impute values close to the original data, with the aim of increasing classification accuracy. In addition, instance selection methods such as IB3, DROP3, and a Genetic Algorithm (GA) are applied to reduce noise in the imbalanced data, reduce model training time, and retain sufficiently informative instances. The baseline applies the one-class classification methods to the complete imbalanced data and is compared against each experiment. The experiments examine how missing value imputation interacts with one-class classification, which instance selection methods improve accuracy, and how the order of missing value simulation, instance selection, and imputation affects classifier accuracy. The results show that, after missing values are imputed, classification accuracy is close to the baseline accuracy; that instance selection can increase classification accuracy, and that the reduction rate directly affects accuracy; and that, when missing value handling and instance selection are combined, processing the complete and incomplete data separately can improve classification accuracy. When stable accuracy is the priority, simulating missing values on the complete data and then applying imputation together with instance selection performs better.	en_US
DC.subject	不平衡資料集	zh_TW
DC.subject	單一類別分類方法	zh_TW
DC.subject	遺漏值填補	zh_TW
DC.subject	樣本選取方法	zh_TW
DC.subject	Imbalanced data sets	en_US
DC.subject	One-Class Classification	en_US
DC.subject	Missing value imputation	en_US
DC.subject	Instance selection	en_US
DC.title	單一類別分類方法於不平衡資料集－搭配遺漏值填補和樣本選取方法	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	One-class classification on imbalanced datasets with missing value imputation and instance selection	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 107423050: full metadata record