Missing Value Imputation: Past, Present, and Future; Past, Present, and Future for Missing Value Imputation


Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/78739

Title:	Missing Value Imputation: Past, Present, and Future; Past, Present, and Future for Missing Value Imputation
Authors:	蔡志豐
Contributors:	Department of Information Management, National Central University
Keywords:	missing value imputation; data pre-processing; data mining; supervised learning algorithms
Date:	2018-12-19
Issue Date:	2018-12-20 13:46:28 (UTC+8)
Publisher:	Ministry of Science and Technology (Taiwan)
Abstract:	Incomplete datasets are usually caused by missing values, that is, some attribute values of the data samples are missing. Missing values can arise from human factors, such as manual data entry errors, incorrect measurements, concealment, or differences in respondents' backgrounds, or from the equipment itself, such as storage failures, hardware faults, or damage that causes the data for a particular period to be lost. Such incomplete datasets can degrade the performance of data mining. The problem can be addressed either by case deletion or by missing value imputation. In this three-year project, the first year aims to review related work on missing value imputation published from 2000 to 2015 (more than one hundred papers) in order to identify the limitations of the existing literature. The first year also examines when case deletion is the better choice, that is, for which types of missing data (categorical, numerical, and mixed) and at which missing rates. The second year focuses on comparing statistical and supervised learning techniques for missing value imputation; in particular, six different algorithms will be compared. Finally, the third year aims to propose a hybrid learning-based imputation method to improve the quality of missing value imputation.
Relation:	Science & Technology Policy Research and Information Center, National Applied Research Laboratories
Appears in Collections:	[Department of Information Management] Research Project
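The abstract contrasts case deletion with imputation, and statistical with learning-based imputation, but does not name the six algorithms it compares. The following is a minimal sketch, not the project's method, assuming purely numerical attributes with `None` marking a missing value. `case_deletion` drops every incomplete sample, `mean_impute` is a simple statistical baseline, and `knn_impute` is a k-nearest-neighbour imputer used here as one illustrative learning-based approach. All function names and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean

# Hypothetical representation: each sample is a list of floats,
# with None standing for a missing attribute value.


def case_deletion(rows):
    """Drop every sample that has at least one missing attribute value."""
    return [r for r in rows if None not in r]


def mean_impute(rows):
    """Statistical baseline: replace each missing value with the mean of
    the observed values in the same column."""
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Fill each missing value with the average of that attribute over the
    k nearest complete samples, measuring Euclidean distance only on the
    attributes the incomplete sample actually has."""
    complete = case_deletion(rows)
    if not complete:
        raise ValueError("kNN imputation needs at least one complete sample")
    result = []
    for r in rows:
        if None not in r:
            result.append(list(r))
            continue
        observed = [j for j, v in enumerate(r) if v is not None]

        def dist(c):
            return sqrt(sum((r[j] - c[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=dist)[:k]
        result.append([
            mean(c[j] for c in neighbours) if v is None else v
            for j, v in enumerate(r)
        ])
    return result


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0],
        [2.0, None, 4.0],
        [3.0, 6.0, None],
        [4.0, 8.0, 9.0],
        [1.5, 3.0, 3.5],
    ]
    print(len(case_deletion(data)))  # 3 of the 5 samples survive
    print(mean_impute(data)[1])      # [2.0, 4.75, 4.0]
    print(knn_impute(data, k=2)[1])  # [2.0, 2.5, 4.0]
```

On this toy data, case deletion discards two of five samples, which illustrates why the missing rate matters when deciding whether deletion is acceptable. The mean and kNN imputers fill the same gap with different values (4.75 versus 2.5), because the kNN variant uses only the samples most similar to the incomplete one.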

