RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86662


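    Title: A Missing Value Imputation Technique Considering Feature Selection and Random Forest (考量特徵選取與隨機森林之遺漏值填補技術)
    Author: Lin, Yen-Cheng (林彥呈)
    Contributor: Department of Information Management
    Keywords: missing value imputation; random forest; feature selection
    Date: 2021-08-17
    Upload time: 2021-12-07 13:05:34 (UTC+8)
    Publisher: National Central University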
    Abstract: Missing value imputation (MVI) is an important step in data analysis, because most machine learning methods, such as neural networks and support vector machines, cannot be applied to incomplete data sets, and skipping this step can lead to serious classification errors. In the medical field, missing values are a common problem: not every possible test can be performed on every patient, and accidental factors such as human error and equipment failure add further gaps. Missing values not only make tasks such as analysis and prediction harder, but also affect the timely diagnosis and treatment that patients should receive.
    In missing value imputation research, missForest is a very popular imputation method. Although it has been shown to outperform other known imputation methods, few studies have tried to optimize it or examine it further. This study therefore combines missForest with recursive feature elimination (RFE), a feature selection method currently popular in imputation research, and proposes a new imputation method, RFE_missForest. Using 10 medical data sets obtained from Kaggle and UCI, we simulated missing rates of 10% to 50% and compared the imputation quality for continuous and categorical variables against missForest and three other traditional imputation methods.
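The missing-rate simulation described above can be illustrated with a minimal sketch. It assumes values are removed completely at random (MCAR), which the abstract does not state; the function name `mask_at_random`, the seed, and the toy data are hypothetical and are not taken from the thesis.

```python
import random


def mask_at_random(rows, missing_rate, seed=0):
    """Return a copy of `rows` with a `missing_rate` share of cells set to None.

    Assumption: cells are chosen uniformly at random (MCAR). The thesis may
    have used a different missingness mechanism.
    """
    if not 0.0 <= missing_rate < 1.0:
        raise ValueError("missing_rate must be in [0, 1)")
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(rows)) for c in range(len(rows[0]))]
    n_missing = round(missing_rate * len(cells))
    masked = [list(row) for row in rows]  # copy, so the complete data survives
    for r, c in rng.sample(cells, n_missing):
        masked[r][c] = None
    return masked


# Hypothetical complete data set: 10 rows, 4 numeric columns.
complete = [[float(r * 4 + c) for c in range(4)] for r in range(10)]

# Missing rates from 10% to 50%, as in the abstract.
for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    masked = mask_at_random(complete, rate)
    n_missing = sum(v is None for row in masked for v in row)
    print(f"rate={rate:.0%}: {n_missing} of 40 cells missing")
```

Keeping the complete data set untouched matters here, because the imputed values are later scored against the true values that were masked out.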
    Experimental results show that the proposed RFE_missForest achieves the best performance on 3 continuous data sets and 3 mixed data sets in terms of both NRMSE and PFC, outperforming the four other imputation methods. A t-test confirms that the differences are statistically significant.
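For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them. It assumes the definitions usually given alongside missForest: NRMSE as the root of the mean squared error divided by the variance of the true values, and PFC as the proportion of falsely classified categorical entries, both taken over the imputed cells only. The abstract does not give the formulas, so the exact normalization used in the thesis may differ; the function names and sample values are hypothetical.

```python
import math
from statistics import pvariance


def nrmse(true_vals, imputed_vals):
    """Normalized RMSE for continuous cells; lower is better.

    Assumption: normalization by the (population) variance of the true values.
    """
    if not true_vals or len(true_vals) != len(imputed_vals):
        raise ValueError("inputs must be non-empty and of equal length")
    var = pvariance(true_vals)
    if var == 0:
        raise ValueError("true values have zero variance")
    mse = sum((t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / len(true_vals)
    return math.sqrt(mse / var)


def pfc(true_labels, imputed_labels):
    """Proportion of falsely classified categorical cells; lower is better."""
    if not true_labels or len(true_labels) != len(imputed_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(t != i for t, i in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)


# Hypothetical true values of the masked cells and two candidate imputations.
true_cont = [1.0, 2.0, 3.0, 4.0]
print(nrmse(true_cont, [1.1, 1.9, 3.2, 3.8]))  # close imputation, small error
print(nrmse(true_cont, [2.5, 2.5, 2.5, 2.5]))  # mean imputation as a baseline
print(pfc(["a", "b", "a", "b"], ["a", "b", "b", "b"]))  # one of four labels wrong
```

Under this normalization, filling every cell with the mean of the true values scores exactly 1, so values well below 1 indicate that an imputation method is doing better than that naive baseline.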
    Appears in collections: [Graduate Institute of Information Management] Theses and Dissertations

    Files in this item:

    File | Description | Size | Format | Views
    index.html | | 0Kb | HTML | 157


    All items in NCUIR are protected by the original copyright.

