    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89800


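    Title: Missing Values in Class Imbalanced Datasets (遺漏值於類別不平衡問題之研究)
    Author: Hwang, Min-Tzu (黃閔慈)
    Contributor: Department of Information Management (資訊管理學系)
    Keywords: Missing value imputation; Class imbalance; Over-sampling; Class-specific missingness
    Date: 2022-07-15
    Upload time: 2022-10-04 12:00:19 (UTC+8)
    Publisher: National Central University (國立中央大學)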
    Abstract: Missing values are inevitable in real-world datasets. Without appropriate treatment, most conventional classifiers cannot run on such data directly and may produce erroneous data mining results. Class imbalance is another critical issue in machine learning and data mining: conventional classifiers assume balanced classes and therefore tend to ignore minority-class samples, which degrades their ability to predict the minority class correctly.
    Because missing values and class imbalance are both widespread, a growing number of studies in recent years have explored how to handle class-imbalanced datasets that contain missing values. However, few studies have examined the situation where one class, especially the minority class, has more missing values than the other. Moreover, although both missing value imputation and data-level approaches may change the distribution of the original data, few studies have examined how the order in which the two are applied affects imputation performance and subsequent classification accuracy.
    To this end, we simulate missing values in the minority class of the training data and compare three processing procedures, each combined with six missing value imputation approaches and SMOTE. Besides swapping the order of imputation and SMOTE, we propose a variant of the impute-first procedure that uses only the originally complete minority-class training data as the basis for creating synthetic minority samples.
    The experimental results show that, at most missing rates, the processing procedures differ significantly in RMSE. Classification performance varies with the missing rate. At low missing rates (10% to 50%), the impute-first procedure achieves significantly better classification performance. At high missing rates (70% to 90%), the impute-first procedure that uses only the complete minority-class subset as the basis for synthetic samples achieves significantly better classification accuracy with the random forest classifier. For different needs, we also recommend the better-performing combinations of processing procedure and imputation approach in the low and high missing-rate ranges, for future researchers' reference.
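The two impute-first procedures described in the abstract can be illustrated with a small, standard-library-only sketch. It is not the thesis's implementation: mean imputation stands in for the six (unnamed) imputation approaches, `smote_like` is a simplified SMOTE-style interpolation, `mask_rows` blanks one cell in a fixed fraction of minority rows, and RMSE is computed over the masked cells only. All function names and all of these choices are assumptions made for illustration. The SMOTE-first procedure is left out because the abstract does not say how SMOTE treats incomplete rows.

```python
import math
import random

def mask_rows(rows, rate, rng):
    """Blank one random cell (None) in int(rate * n) randomly chosen rows."""
    hit = set(rng.sample(range(len(rows)), int(rate * len(rows))))
    out = [list(r) for r in rows]
    for i in hit:
        out[i][rng.randrange(len(out[i]))] = None
    return out

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def rmse(truth, filled, masked):
    """Root mean squared error over the cells that were masked."""
    errs = [(truth[i][j] - filled[i][j]) ** 2
            for i, r in enumerate(masked) for j, v in enumerate(r) if v is None]
    return math.sqrt(sum(errs) / len(errs)) if errs else 0.0

def smote_like(basis, n_new, k, rng):
    """Interpolate between a basis row and one of its k nearest basis rows."""
    if len(basis) < 2:
        raise ValueError("need at least two basis rows")
    synth = []
    for _ in range(n_new):
        base = rng.choice(basis)
        others = sorted((r for r in basis if r is not base),
                        key=lambda r: math.dist(base, r))[:k]
        mate = rng.choice(others)
        gap = rng.random()
        synth.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synth

# Toy minority class: 10 rows, 2 numeric features (made-up data).
rng = random.Random(0)
minority = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(10)]
masked = mask_rows(minority, 0.5, rng)
filled = mean_impute(masked)

# Impute first: every minority row, imputed or not, may seed a synthetic sample.
synth_all = smote_like(filled, 20, 3, rng)

# Impute first, complete-only basis: only rows that had no missing cell seed
# synthetic samples; imputed rows stay in the training set but are not used
# as interpolation endpoints.
complete = [f for f, m in zip(filled, masked) if None not in m]
synth_complete = smote_like(complete, 20, 3, rng)

print("imputation RMSE:", rmse(minority, filled, masked))
print("basis sizes:", len(filled), len(complete))
```

The only difference between the two variants is the basis passed to `smote_like`. Restricting the basis to complete rows keeps imputation error out of the synthetic samples, at the cost of a smaller basis as the missing rate grows.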
    Appears in collections: [Graduate Institute of Information Management (資訊管理研究所)] Theses and dissertations


