Master's/Doctoral Thesis 102423002: Full Metadata Record

DC field Value Language
DC.contributor 資訊管理學系 (Department of Information Management) zh_TW
DC.creator 李昀潔 zh_TW
DC.creator Yun-Jie Li en_US
dc.date.accessioned 2015-6-22T07:39:07Z
dc.date.available 2015-6-22T07:39:07Z
dc.date.issued 2015
dc.identifier.uri http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=102423002
dc.contributor.department 資訊管理學系 (Department of Information Management) zh_TW
DC.description 國立中央大學 zh_TW
DC.description National Central University en_US
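dc.description.abstract The missing value problem is common in data mining. Whether caused by data entry errors, data format errors, or similar issues, missing values prevent the existing data from being used effectively to build a suitable classification model. Imputation methods arose to address this problem: they analyze the existing data and fill in suitable values, which then provide appropriate data for modeling. However, the existing data may not allow imputation methods to fill in values effectively, because it can suffer from several problems, such as noisy data (the noisy problem), data redundancy, or many unrepresentative instances. To use the existing data effectively for imputation, instance selection methods filter out the representative data to resolve these problems. In other words, instance selection methods apply a series of selection criteria to produce a reduced dataset composed of representative data, and imputation methods can then use this reduced dataset, so that the problems in the original data do not degrade imputation performance. This thesis examines the effect of instance selection methods on imputation methods. Experiments are conducted on 33 datasets from the UCI open dataset repository, grouped into three types (categorical, mixed, and numerical). Three instance selection methods, IB3 (Instance-based learning), DROP3 (Decremental Reduction Optimization Procedure), and GA (Genetic Algorithm), and three imputation methods, KNNI (K-Nearest Neighbor Imputation method), SVM (Support Vector Machine), and MLP (Multilayer Perceptron), are selected to examine which combination (each of the three instance selection methods paired with each of the three imputation methods) is best or most suitable under which conditions, and whether the combined methods are more effective than imputation alone. Based on the results of this study, we suggest that for numerical datasets, instance selection followed by imputation is more suitable than imputation alone. Regarding instance selection methods, DROP3 is recommended for numerical and mixed datasets, but it is not recommended for categorical datasets with a large number of classes. As for the GA and IB3 instance selection methods, we suggest that GA is more suitable than IB3, because our experiments show that GA's instance selection performance is more stable than IB3's. zh_TW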
dc.description.abstract In data mining, the collected datasets are usually incomplete, containing some missing attribute values. It is difficult to develop an effective learning model from incomplete datasets. In the literature, missing value imputation is a common approach to the problem of incomplete datasets. Its aim is to estimate the missing values from the (observed) complete data samples. However, some of the complete data may contain noisy information, which can be regarded as outliers. If these noisy data are used for missing value imputation, the quality of the imputation results will be affected. To solve this problem, we propose performing instance selection over the complete data before the imputation step. The aim of instance selection is to filter out unrepresentative data from a given dataset. Therefore, this research focuses on examining the effect of performing instance selection on missing value imputation. The experimental setup is based on 33 UCI datasets, which are composed of categorical, numerical, and mixed types of data. Three instance selection methods, IB3 (Instance-based learning), DROP3 (Decremental Reduction Optimization Procedure), and GA (Genetic Algorithm), are used for comparison. Similarly, three imputation methods, KNNI (K-Nearest Neighbor Imputation method), SVM (Support Vector Machine), and MLP (Multilayer Perceptron), are employed individually. The comparative results allow us to understand which combination of instance selection and imputation methods performs best, and whether combining instance selection with missing value imputation is a better choice than performing missing value imputation alone on incomplete datasets. According to the results of this research, we suggest that combinations of instance selection methods and imputation methods may be more suitable than imputation methods alone for numerical datasets. In particular, the DROP3 instance selection method is more suitable for numerical and mixed datasets, but not for categorical datasets, especially when the number of features is large. Of the other two instance selection methods, GA provides more stable reduction performance than IB3. en_US
DC.subject 資料探勘 (data mining) zh_TW
DC.subject 資料選取法 (instance selection methods) zh_TW
DC.subject 補值法 (imputation methods) zh_TW
DC.subject 機器學習 (machine learning) zh_TW
DC.subject 分類問題 (classification problems) zh_TW
DC.subject Machine Learning en_US
DC.subject Instance Selection Methods en_US
DC.subject Imputation Methods en_US
DC.subject Classification en_US
DC.subject Data Mining en_US
DC.title The Effect of Instance Selection on Missing Value Imputation en_US
dc.language.iso en_US en_US
DC.type 博碩士論文 (master's/doctoral thesis) zh_TW
DC.type thesis en_US
DC.publisher National Central University en_US
