
    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/84043

    Title: 單一類別分類方法於不平衡資料集-搭配遺漏值填補和樣本選取方法;One-class classification on imbalanced datasets with missing value imputation and instance selection
    Authors: 曾俊凱;Tseng, Chun-Kai
    Contributors: 資訊管理學系; Department of Information Management
    Keywords: Imbalanced data sets;One-class classification;Missing value imputation;Instance selection
    Date: 2020-07-21
    Issue Date: 2020-09-02 17:58:00 (UTC+8)
    Publisher: 國立中央大學; National Central University
    Abstract: Imbalanced data sets are a central concern in practical data analysis, arising in problems such as credit card fraud detection, medical diagnosis, and network attack classification. They can be addressed either through data preprocessing or by choosing classifiers suited to the imbalance. One-class classification, known in other fields as outlier or novelty detection, is one such approach: this thesis applies three one-class methods, the One-Class SVM, Isolation Forest, and Local Outlier Factor, to binary classification problems on imbalanced data sets. It further examines the case of incomplete data. Missing values are simulated at rates of 10% to 50% and imputed with Classification and Regression Trees (CART) so that the reconstructed data approximate the original, raising classification accuracy. Instance selection methods, namely the Instance-Based algorithm (IB3), the Decremental Reduction Optimization Procedure (DROP3), and a Genetic Algorithm (GA), are also applied to the imbalanced data to remove noisy instances, reduce model training time, and retain the most influential samples.
    The experiments investigate which combinations of missing value imputation, one-class classification, and instance selection improve accuracy, and in which order imputation and selection should be applied. The results show that after imputation, classification accuracy approaches that obtained on the complete data; that instance selection further raises accuracy, with the reduction rate directly affecting the classification rate; and that when imputation and instance selection are combined, separating the incomplete data from the complete data before processing improves accuracy. For stable accuracy, simulating and imputing missing values on the complete data and then applying instance selection performs well.
    Appears in Collections: [Graduate Institute of Information Management] Master's and doctoral theses

    All items in NCUIR are protected by copyright, with all rights reserved.
