Missing values are a common cause of incomplete datasets: some attribute values of the data samples are absent. They may arise from human factors, such as manual data entry errors, concealment of information, or background differences, or from the equipment itself, such as incorrect measurements, storage failures, hardware faults, or damage that causes data loss over a particular period. Incomplete datasets of this kind often degrade data mining performance. Two approaches are available for handling missing values: case deletion and missing value imputation. In this three-year project, the first year aims to review and survey the related literature on missing value imputation published from 2000 to 2015 (more than one hundred papers) in order to identify the limitations of current imputation methods. The first year also examines when case deletion is most appropriate, studying different types of data (categorical, numerical, and mixed) and different missing rates. The second year focuses on comparing the performance of statistical and supervised learning techniques for missing value imputation; in particular, six different algorithms will be compared. Finally, the third year aims to propose a hybrid learning-based imputation method to improve the quality of missing value imputation.