dc.description.abstract | As technology advances, data that used to be ignored or was difficult to gather can take on new meaning for both individuals and enterprises. Data can serve as a tool for market analysis or form part of personal privacy, and in some cases it has become more valuable than the product itself. Data mining, which analyzes data in many different ways to find correlations among variables and put them to use, has therefore attracted considerable attention recently. It sounds simple, but in practice it faces many difficulties. One of them is incomplete data, that is, data that contains missing values.
Missing values directly introduce errors into analysis outcomes. They may be caused by human error or by malfunctioning machines, for example a data-saving process that fails or broken hardware. As a result, the outcomes of data mining and analysis are often distorted by missing values.
Furthermore, data often contain continuous variables, such as age. If continuous variables are left as they are, the conditions used in analysis can become too narrow. Consequently, discretization is an important data preprocessing stage. Discretization converts continuous variables into categorical ones by choosing cut points, which differ from method to method, and it reduces the influence of abnormal data or outliers. This matters because only high-quality data can produce high-quality outcomes.
There are many methods for handling missing values and for performing discretization. This study compares two orderings, discretizing first and imputing missing values first, and evaluates them by accuracy to determine which works better. | en_US |