    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/68687


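    Title: A Comparative Study of Supervised Learning Algorithms for Missing Value Imputation (監督式學習演算法於填補遺漏值之比較與研究)
    Authors: Cheng, Ping-hao (鄭秉豪)
    Contributors: Department of Information Management
    Keywords: Data Mining; Missing Values; Data Imputation; Supervised Learning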
    Date: 2015-07-02
    Issue Date: 2015-09-23 14:07:48 (UTC+8)
    Publisher: National Central University
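    Abstract: As information technology continues to advance, the benefits people gain from collecting and applying information are among the most visible and everyday parts of that progress. Recording and storing data is no longer limited to preserving and passing on experience. By building information systems and refining and optimizing methods, people can classify, manage, apply, and draw inferences from data more efficiently, and it is against this background that data mining techniques have matured and evolved. Data mining applies a variety of statistical analyses and modeling methods to large amounts of data, aiming to extract features and associations of latent value for practical use. During this extraction, however, certain characteristics of the data itself can affect the results to some extent; missing data is one example.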
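    In data mining, missing values are one cause of incomplete data. Values may go missing for human reasons, such as data-entry errors, deliberate concealment, or differences in respondents' backgrounds, or for machine-related reasons, such as storage failures, hardware faults, or damage that causes data from a particular time period to be lost. As a result, missing values often degrade data mining performance.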
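    Many strategies have been proposed for handling missing values, and using supervised learning algorithms to predict and impute them is among the most prominent. However, there is no consolidated assessment or set of recommendations on how well the various algorithms perform at imputation. To address this gap, this study uses several well-known supervised learning algorithms to predict and impute missing data, evaluates the imputed results with several accuracy measures, and analyzes how each imputation method performs under different conditions. The findings are consolidated into recommendations so that future researchers (and anyone who needs to impute data) can handle missing values with the most effective and efficient method.

    With the progress of information technology, people benefit from efficient data collection and its related applications. In addition, since the number and size of online databases are growing rapidly, retrieving useful information from these large databases effectively and efficiently is becoming increasingly important. This has become a central research issue in data mining.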
    Data mining is the process of applying a variety of statistical analysis and machine learning techniques to large amounts of data in order to extract hidden features and the associations among them that are valuable for various applications. It helps people learn novel knowledge from past experience so that they can make decisions or forecast trends. However, some problems must be considered during the retrieval process, such as “Missing Values”.
    Missing values can be briefly defined as (attribute) values that are missing from a chosen dataset. For example, when registering on a website, users have to fill in a sequence of fields, such as “Name”, “Birthday”, etc. However, for reasons such as data-entry errors or information concealment, some values may be lost in this process, leaving the data incomplete or erroneous. Missing values can also reduce the efficiency and accuracy of data mining results. For this reason, various methods are used to impute missing values, and supervised learning algorithms are one common approach to the missing value imputation problem.
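    As a minimal illustration of supervised-learning imputation, the sketch below uses k-NN, one of the five algorithms compared in this thesis: the complete rows act as training data, and each missing numeric value is predicted from the incomplete row's nearest complete neighbours. It assumes purely numeric attributes, unscaled Euclidean distance computed only over the attributes the incomplete row actually has, and a plain average of the neighbours' values. The function name `knn_impute` and all of these choices are hypothetical and are not taken from the thesis.

```python
import math
from typing import List, Optional, Sequence

# Assumption: every attribute is numeric; None marks a missing cell.
Row = List[Optional[float]]


def _distance(a: Row, b: Row, cols: Sequence[int]) -> float:
    """Euclidean distance restricted to the given column indices."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in cols))


def knn_impute(rows: List[Row], k: int = 3) -> List[Row]:
    """Hypothetical k-NN imputer: fill each missing cell with the mean of
    that attribute over the k nearest complete rows."""
    # Complete rows serve as the "training set" for the supervised learner.
    complete = [r for r in rows if all(v is not None for v in r)]
    if not complete:
        raise ValueError("need at least one complete row to impute from")

    result: List[Row] = []
    for row in rows:
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            result.append(list(row))  # nothing to impute
            continue
        # Measure distance only on the attributes this row has observed.
        observed = [j for j, v in enumerate(row) if v is not None]
        neighbours = sorted(complete, key=lambda c: _distance(row, c, observed))[:k]
        filled = list(row)
        for j in missing:
            # Prediction = average of the neighbours' values for attribute j.
            filled[j] = sum(n[j] for n in neighbours) / len(neighbours)
        result.append(filled)
    return result
```

    For example, with the rows `[1.0, 2.0]`, `[1.1, 2.1]`, `[5.0, 9.0]` and `[1.05, None]`, calling `knn_impute(rows, k=2)` selects the first two rows as neighbours and fills the gap with the mean of 2.0 and 2.1, roughly 2.05. Handling categorical attributes would need a different distance and a majority vote instead of a mean.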
    In this thesis, I conduct experiments to compare the efficiency and accuracy of five well-known supervised learning algorithms, namely Bayes, SVM, MLP, CART, and k-NN, over categorical, numerical, and mixed-type datasets. This reveals which imputation method performs better for which data type and at which missing rates. The experimental results show that CART is the best choice for missing value imputation: it not only requires relatively low imputation time but also enables the classifier to achieve higher classification accuracy.
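    The comparison described above (remove values at several missing rates, impute them, then measure imputation time and accuracy) might be sketched as follows. This is an illustration under stated assumptions, not the thesis's procedure: values are removed completely at random, accuracy is measured as RMSE against the known true values rather than as downstream classification accuracy, and the hypothetical `mean_impute` baseline stands in for the five algorithms. Any imputer with the same signature could be passed to the hypothetical `evaluate` function.

```python
import math
import random
import time
from typing import Callable, List, Optional, Tuple

Row = List[Optional[float]]  # None marks a missing cell
Imputer = Callable[[List[Row]], List[Row]]


def inject_missing(rows: List[Row], rate: float, seed: int = 0) -> List[Row]:
    """Blank out each cell independently with probability `rate`
    (assumption: values are missing completely at random)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def mean_impute(rows: List[Row]) -> List[Row]:
    """Placeholder baseline, not one of the five compared algorithms:
    replace each missing cell with its column mean."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def evaluate(imputer: Imputer, complete: List[Row],
             rates: List[float], seed: int = 0) -> List[Tuple[float, float, float]]:
    """For each missing rate, return (rate, RMSE over imputed cells, seconds)."""
    results = []
    for rate in rates:
        damaged = inject_missing(complete, rate, seed)
        start = time.perf_counter()
        imputed = imputer(damaged)
        elapsed = time.perf_counter() - start
        # Compare only the cells that were removed against their true values.
        errors = [
            (imputed[i][j] - complete[i][j]) ** 2
            for i, row in enumerate(damaged)
            for j, v in enumerate(row)
            if v is None
        ]
        rmse = math.sqrt(sum(errors) / len(errors)) if errors else 0.0
        results.append((rate, rmse, elapsed))
    return results


if __name__ == "__main__":
    data = [[float(i), 2.0 * i] for i in range(20)]
    for rate, rmse, secs in evaluate(mean_impute, data, [0.1, 0.3, 0.5]):
        print(f"rate={rate:.0%} rmse={rmse:.3f} time={secs:.4f}s")
```

    Keeping the imputer as a pluggable callable means the same damaged copies and the same error measure are applied to every method, which is what makes a side-by-side comparison across missing rates meaningful.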
    Appears in Collections: [Graduate Institute of Information Management] Theses and Dissertations

    All items in NCUIR are protected by copyright, with all rights reserved.