dc.description.abstract | With the evolution of information technology, it has become easy to collect large and varied amounts of data. Consequently, data mining has been widely adopted in many industries. However, collected data almost inevitably contain missing values. If these missing values are not handled appropriately, data mining results are affected and the accuracy of learning models may degrade. In the related literature, missing value imputation based on statistical analysis and machine learning techniques has proven applicable to incomplete data problems. However, very few studies examine the imputation performance of deep learning techniques. In addition, data discretization may further reduce the influence of outliers and increase the stability of models. Therefore, this thesis compares the performance of various imputation models, including deep neural networks based on the Deep MultiLayer Perceptron (DMLP) and the Deep Belief Network (DBN). It also examines how the order in which data imputation and discretization are combined affects performance. In particular, the Minimum Description Length Principle (MDLP) and ChiMerge (ChiM) are used as the discretizers.
The experimental results show that deep neural networks outperform the other imputation methods, especially for numeric and mixed datasets. For numeric datasets, the accuracies of DMLP and DBN are higher than the baseline by 14.70% and 15.88%, respectively; for mixed datasets, they are higher by 8.71% and 7.96%, respectively. Furthermore, combinations of deep neural networks with MDLP discretization outperform the other combinations regardless of the order in which the two steps are applied. In particular, the classification accuracies of MDLP + DMLP and MDLP + DBN are slightly higher than that of Imputation (DMLP) alone, by 0.74% and 0.52%, respectively, and higher than that of Baseline (ChiM) by 2.94% and 2.72%, respectively. Therefore, the experiments show that performance depends on the chosen discretizer and deep learning algorithm. | en_US |