

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95589


    Title: An Imputation Method Based on Ensemble Techniques for Binary Classification Datasets (基於集成方法的二元分類資料集補值研究)
    Authors: Peng, Po-Hao (彭柏豪)
    Contributors: Department of Information Management
    Keywords: Machine learning; Deep learning; Missing value imputation; Ensemble learning
    Date: 2024-07-29
    Upload time: 2024-10-09 17:04:47 (UTC+8)
    Publisher: National Central University
    Abstract: From past research, imputation methods can generally be categorized into three types: statistical, machine learning, and deep learning. Each type of method has contexts in which it is most suitable, so this study applies ensemble techniques to the imputation task: it combines multiple imputation methods and assigns each an appropriate weight according to its suitability for different scenarios, thereby generating superior imputed values.
    For the experimental design, this study selected six binary classification datasets from the UCI repository. Based on the literature review, representative methods from each category were chosen: the statistical methods Mean/Mode and MICE, the machine learning methods MissForest and KNN, and the deep learning methods PC-GAIN, HI-VAE, and PMIVAE. In addition, the PC-GAIN method was adapted to form the RC-GAIN method. In total, eight imputation methods were evaluated, and experiments were conducted with three classifiers: SVM, LightGBM, and MLP.
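    As a rough illustration of this setup, the sketch below wires the statistical and machine-learning baselines named above (Mean/Mode, MICE, KNN) to a LightGBM classifier using scikit-learn and lightgbm. The deep-learning imputers (PC-GAIN, RC-GAIN, HI-VAE, PMIVAE) and the thesis's actual missingness-injection and evaluation protocol are not reproduced; the helper evaluate_imputer, the hold-out split, and all hyperparameters are assumptions for illustration only.

    ```python
    # Minimal sketch (not from the thesis): run baseline imputers, then score a
    # LightGBM classifier on the completed data. X_missing (features with NaNs)
    # and y (binary labels) are assumed to be prepared elsewhere.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import SimpleImputer, IterativeImputer, KNNImputer
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from lightgbm import LGBMClassifier

    def evaluate_imputer(imputer, X_missing, y):
        """Impute X_missing, then score LightGBM accuracy on a hold-out split."""
        X_imp = imputer.fit_transform(X_missing)
        X_tr, X_te, y_tr, y_te = train_test_split(
            X_imp, y, test_size=0.2, random_state=0
        )
        clf = LGBMClassifier(random_state=0)
        clf.fit(X_tr, y_tr)
        return accuracy_score(y_te, clf.predict(X_te))

    # Statistical and machine-learning baselines mentioned in the abstract;
    # MissForest would need a random-forest-based imputer, not shown here.
    baselines = {
        # use strategy="most_frequent" (mode) for categorical columns
        "Mean/Mode": SimpleImputer(strategy="mean"),
        "MICE": IterativeImputer(max_iter=10, random_state=0),
        "KNN": KNNImputer(n_neighbors=5),
    }
    # Usage:
    # scores = {name: evaluate_imputer(imp, X_missing, y)
    #           for name, imp in baselines.items()}
    ```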
    Based on these experiments, four better-performing imputation methods (MICE, MissForest, RC-GAIN, and HI-VAE) and the best classifier (LightGBM) were selected to construct the ensemble imputation method. Two performance metrics, RMSE and the accuracy produced by LightGBM, were used to compute two sets of weights, yielding two ensemble methods: Ensemble_rmse and Ensemble_acc. Experimental results showed that both ensemble methods outperformed the four selected imputation methods under different missingness mechanisms and missing rates. Of the two, Ensemble_acc outperformed Ensemble_rmse and is the better imputation method.
    The study also analyzed the suitability of the ensemble methods with respect to dataset characteristics. In the analysis of dataset size, Ensemble_acc performed better on both small and large datasets. In the analysis of feature types, Ensemble_rmse performed better on purely numerical datasets, while Ensemble_acc performed better on mixed datasets. Finally, in the application-domain analysis, Ensemble_rmse performed better on medical datasets, while Ensemble_acc performed better on credit datasets.
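    The weighted-combination idea behind Ensemble_rmse and Ensemble_acc can be sketched roughly as below. This is a hypothetical illustration, not the thesis's exact formula: the function ensemble_impute and its score-normalisation weighting are assumptions, and it treats all features as numeric (categorical columns would need a scheme such as weighted voting).

    ```python
    # Hypothetical sketch of the weighted ensemble: each candidate imputer yields
    # a completed copy of the data, and missing cells are filled with a weighted
    # average of the candidates. Weights come from a per-method quality score
    # (e.g. LightGBM accuracy for Ensemble_acc, or an inverse-RMSE score for
    # Ensemble_rmse); the normalisation below is illustrative only.
    import numpy as np

    def ensemble_impute(X_missing, imputed_list, scores):
        """Combine several imputed matrices into one.

        X_missing    : array with np.nan at missing positions
        imputed_list : list of completed matrices, one per candidate method
        scores       : per-method quality scores (higher is better)
        """
        scores = np.asarray(scores, dtype=float)
        weights = scores / scores.sum()            # normalise scores into weights
        stacked = np.stack(imputed_list)           # (n_methods, n_rows, n_cols)
        blended = np.tensordot(weights, stacked, axes=1)  # weighted average per cell
        # Keep observed values untouched; fill only the originally missing cells.
        return np.where(np.isnan(X_missing), blended, X_missing)

    # Usage with the four selected methods (MICE, MissForest, RC-GAIN, HI-VAE):
    # X_hat = ensemble_impute(X_missing, [X_mice, X_mf, X_rcgain, X_hivae],
    #                         scores=[acc_mice, acc_mf, acc_rcgain, acc_hivae])
    ```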
    Appears in Collections: [Graduate Institute of Information Management] Theses & Dissertations

    Files in This Item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     8


    All items in NCUIR are protected by copyright, with all rights reserved.

