    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/65598

    Title: A Study of Data Pre-process: the Integration of Imputation and Instance Selection (資料前處理:整合補值法與樣本選取之研究)
    Authors: Chang, Fu-yu (張復喻)
    Contributors: Department of Information Management
    Keywords: missing value; data mining; data pre-process; imputation; instance selection
    Date: 2014-07-01
    Issue Date: 2014-10-15 17:05:59 (UTC+8)
    Publisher: National Central University
    Abstract: In practice, collected data usually contain missing values and noise, both of which are likely to degrade data mining performance and reduce the accuracy of the results. A data pre-processing step is therefore necessary before data mining. Its aim is to handle missing values and filter out noisy data. "Imputation" and "instance selection" are two common solutions for this purpose.
    The aim of imputation is to estimate missing values by reasoning from the observed (i.e., complete) data. Although various missing value imputation algorithms have been proposed in the literature, the estimates produced by most of them depend heavily on the complete (training) data. Therefore, if some of the complete data are noisy, the noise directly affects the quality of both the imputation and the data mining results. In this thesis, four integration processes are proposed. In Process 1, imputation is performed first to produce a complete training set, and instance selection is then applied to filter noisy data out of that set. In Process 2, the order is reversed: instance selection is executed first to remove noisy (complete) data from the training set, and imputation is then performed based on the reduced training set. To obtain more representative data, Processes 3 and 4 apply instance selection again to the outputs of Processes 1 and 2, respectively.
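The four processes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not name the imputation or instance selection algorithms used, so column-mean imputation and an edited-nearest-neighbour filter stand in for them here, and all function names (`mean_impute`, `enn_select`, `process_1` to `process_4`) are hypothetical.

```python
from statistics import mean

def is_complete(x):
    """True if a feature vector has no missing (None) entries."""
    return all(v is not None for v in x)

def mean_impute(data, reference):
    """Fill None entries in `data` with column means taken from `reference`.
    Assumption: a simple stand-in for whatever imputer the thesis uses."""
    if not data:
        return []
    n_cols = len(data[0][0])
    col_means = []
    for j in range(n_cols):
        observed = [x[j] for x, _ in reference if x[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [([col_means[j] if v is None else v for j, v in enumerate(x)], y)
            for x, y in data]

def enn_select(data, k=3):
    """Edited-nearest-neighbour filter (assumed instance selection method):
    keep a sample only if most of its k nearest neighbours share its label."""
    kept = []
    for i, (x, y) in enumerate(data):
        others = [(sum((a - b) ** 2 for a, b in zip(x, x2)), y2)
                  for j, (x2, y2) in enumerate(data) if j != i]
        others.sort(key=lambda t: t[0])
        votes = [label for _, label in others[:k]]
        if not votes or 2 * votes.count(y) > len(votes):
            kept.append((x, y))
    return kept

def process_1(data, k=3):
    """Imputation first, then instance selection."""
    return enn_select(mean_impute(data, data), k)

def process_2(data, k=3):
    """Instance selection on the complete samples, then imputation
    of the incomplete samples from the reduced complete set."""
    complete = [s for s in data if is_complete(s[0])]
    incomplete = [s for s in data if not is_complete(s[0])]
    reduced = enn_select(complete, k)
    return reduced + mean_impute(incomplete, reduced)

def process_3(data, k=3):
    """Process 1 followed by a second instance selection pass."""
    return enn_select(process_1(data, k), k)

def process_4(data, k=3):
    """Process 2 followed by a second instance selection pass."""
    return enn_select(process_2(data, k), k)

# Toy data: two clusters, one mislabelled sample, one missing value.
data = [
    ([0.0, 0.0], 0), ([0.2, 0.1], 0), ([0.1, 0.3], 0), ([0.3, 0.2], 0),
    ([10.0, 10.0], 1), ([10.2, 9.9], 1), ([9.8, 10.1], 1), ([10.1, 10.3], 1),
    ([0.15, 0.15], 1),   # noisy: labelled 1 but sits in the class-0 cluster
    ([None, 0.2], 0),    # incomplete sample
]
for name, proc in [("Process 1", process_1), ("Process 2", process_2),
                   ("Process 3", process_3), ("Process 4", process_4)]:
    print(name, len(proc(data)), "samples kept")
```

The key difference between the two orderings is visible in `process_2`: the column means used to fill the incomplete sample are computed only from samples that survived the noise filter, whereas `process_1` computes them from all observed values, noisy ones included.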
    The experiments are based on 31 different data sets containing categorical, numerical, and mixed types of data, with missing rates from 10% to 50% in 10% intervals for each data set. A decision tree model is then constructed to extract rules that recommend which integration process to use under which conditions (number of samples, number of attributes, number of classes, type of data set, and missing rate).
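As a rough illustration of the missing-rate setup, the following sketch masks a fixed fraction of cells in a complete data set at each of the five rates. The abstract does not say how missing values were generated, so two things here are assumptions: that the rate is counted over feature cells rather than over samples, and that cells are removed uniformly at random. The helper name `inject_missing` is hypothetical.

```python
import random

def inject_missing(rows, rate, seed=0):
    """Return a copy of `rows` with round(rate * n_cells) cells set to None.
    Assumption: cells are chosen uniformly at random over the whole table."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    n_missing = round(rate * len(cells))
    masked = [list(r) for r in rows]   # copy so the original stays intact
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked

# A small complete table: 20 samples, 3 numerical attributes.
rows = [[float(i), float(i * 2), float(i % 3)] for i in range(20)]

for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    masked = inject_missing(rows, rate)
    n_none = sum(v is None for r in masked for v in r)
    print(f"missing rate {rate:.0%}: {n_none} of {len(rows) * 3} cells masked")
```

Each masked copy could then be passed through the four integration processes, and the best-performing process recorded alongside the data set's characteristics as one training example for the decision tree.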
    Appears in Collections: [Graduate Institute of Information Management] Master's and Doctoral Theses

    All items in NCUIR are protected by copyright, with all rights reserved.