Master's/Doctoral Thesis 107423050: Complete Metadata Record

DC Field  Value  Language
dc.contributor  Department of Information Management  zh_TW
dc.creator  曾俊凱  zh_TW
dc.creator  Chun-Kai Tseng  en_US
dc.date.accessioned  2020-07-21T07:39:07Z
dc.date.available  2020-07-21T07:39:07Z
dc.date.issued  2020
dc.identifier.uri  http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107423050
dc.contributor.department  Department of Information Management  zh_TW
dc.description  National Central University  zh_TW
dc.description  National Central University  en_US
dc.description.abstract  Imbalanced datasets are a very important part of practical data analysis, underlying key problems in many domains such as credit card fraud detection, medical diagnosis classification, and network attack classification. For imbalanced datasets, different data processing or classification methods can be adopted to achieve better classification results. One-class classification is also known in other fields as outlier detection or novelty detection. This thesis applies one-class classification methods, such as the One-Class SVM, Isolation Forest, and Local Outlier Factor, to binary classification problems on imbalanced datasets. It further investigates the case of missing data: missing values are simulated at rates of 10% to 50% and then imputed with methods such as Classification and Regression Trees (CART) to bring the data close to the original and raise the accuracy of the classification model. In addition, to handle noisy instances in the imbalanced data that degrade classification, instance selection methods such as the Instance-Based algorithm (IB3), Decremental Reduction Optimization Procedure (DROP3), and Genetic Algorithm (GA) are applied, aiming to reduce noise in the dataset, lower the time cost of model training, and retain sufficiently influential instances. The baseline of this thesis uses the complete imbalanced data with one-class classification methods, against which each experiment is compared: which combination of missing value imputation and one-class classification, and which instance selection method, improves classification accuracy; and finally, the ordering of missing value simulation, instance selection, and imputation, where an improved workflow can increase classifier accuracy. The experiments show that, after missing value imputation, the classification accuracy on imbalanced data approaches that on the complete data; instance selection can increase classification accuracy, and the reduction rate directly affects accuracy; finally, when combining missing value imputation with instance selection, a workflow that processes complete and incomplete data separately improves classification accuracy, while when stable accuracy is preferred, simulating and imputing missing values on the complete data together with instance selection performs better.  zh_TW
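The missing-value step the abstract describes (randomly deleting 10%–50% of cells, then imputing them) can be sketched in plain Python. This is an illustrative sketch only: column-mean imputation stands in for the thesis's CART-based imputation, and all names here are hypothetical.

```python
import random

def inject_missing(rows, rate, seed=0):
    """Blank out roughly `rate` of the cells (None marks a missing
    value), mimicking the 10%-50% missing-value simulation."""
    rng = random.Random(seed)
    out = [list(r) for r in rows]
    for row in out:
        for j in range(len(row)):
            if rng.random() < rate:
                row[j] = None
    return out

def impute(rows):
    """Fill each missing cell with its column mean -- a simple
    stand-in for the CART imputation used in the thesis."""
    cols = range(len(rows[0]))
    means = []
    for j in cols:
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in cols]
            for r in rows]

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
damaged = inject_missing(data, rate=0.3)
restored = impute(damaged)
```

After imputation the table contains no missing cells, so the downstream one-class classifier can be trained on it directly.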
dc.description.abstract  Imbalanced datasets are a very important part of practical data analysis, arising in problems such as credit card fraud, medical diagnosis classification, and network attack detection. Faced with imbalanced datasets, we can adopt different data processing or classification methods to achieve better classification results. This thesis applies one-class classification methods, such as the One-Class SVM, Isolation Forest, and Local Outlier Factor, to binary classification problems on imbalanced datasets. To further explore the case of missing data, missing values are simulated at rates of 10% to 50% and imputed using methods such as CART to increase classification accuracy. At the same time, instance selection methods such as IB3, DROP3, and GA are adopted for the imbalanced data, with the goals of reducing noise in the dataset, reducing the time cost of model training, and retaining sufficiently influential instances. The thesis examines which combinations of missing value imputation and one-class classification, and which instance selection methods, improve accuracy, as well as the order in which missing value simulation, instance selection, and imputation are applied. The experimental results show that after missing values are imputed, classification accuracy approaches that on the complete data; instance selection can increase classification accuracy, and the reduction rate directly affects it; finally, when missing value imputation is combined with instance selection, separating the incomplete data from the complete data in the workflow improves classification accuracy, while when stable accuracy is preferred, simulating and imputing missing values on the complete data together with instance selection performs well.  en_US
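The one-class setup the abstract describes (fit a model on the majority class only and flag deviations as the minority class) can be illustrated with a minimal distance-to-centroid detector. This is a deliberately simplified stand-in for the One-Class SVM, Isolation Forest, and Local Outlier Factor used in the thesis; all names are hypothetical.

```python
import math
import random

def train_one_class(samples):
    """Fit a trivial one-class model: the centroid of the majority
    class plus a distance threshold covering ~95% of its points."""
    dim = len(samples[0])
    n = len(samples)
    centroid = [sum(s[i] for s in samples) / n for i in range(dim)]
    dists = sorted(math.dist(s, centroid) for s in samples)
    threshold = dists[int(0.95 * (n - 1))]
    return centroid, threshold

def predict(model, x):
    """Label a point 'normal' (majority class) or 'outlier' (minority)."""
    centroid, threshold = model
    return "normal" if math.dist(x, centroid) <= threshold else "outlier"

random.seed(0)
# Train only on the majority class: points scattered around the origin.
majority = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
model = train_one_class(majority)
```

At prediction time, points far from the training distribution are treated as the rare class, e.g. `predict(model, (8.0, 8.0))` returns `"outlier"` while a point near the origin is labeled `"normal"`.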
dc.subject  Imbalanced datasets  zh_TW
dc.subject  One-class classification  zh_TW
dc.subject  Missing value imputation  zh_TW
dc.subject  Instance selection  zh_TW
dc.subject  Imbalanced datasets  en_US
dc.subject  One-Class Classification  en_US
dc.subject  Missing value imputation  en_US
dc.subject  Instance selection  en_US
dc.title  One-class classification on imbalanced datasets with missing value imputation and instance selection  zh_TW
dc.language.iso  zh-TW  zh-TW
dc.title  One-class classification on imbalanced datasets with missing value imputation and instance selection  en_US
dc.type  Master's/doctoral thesis  zh_TW
dc.type  thesis  en_US
dc.publisher  National Central University  en_US
