dc.description.abstract | The advancement of network makes the amount of data produced increasing rapidly. How to use the data effectively becomes very important, which makes the data mining technology more sophisticated. However, the problem of missing values has always been difficult to avoid in the collected data. Therefore, scholars have used statistical and machine learning methods to do a lot of researches for the imputation of missing values, and hope to reduce the impact of missing values on predictions, but there are few studies focusing on another type of solution by directly handling the datasets with missing values.
Therefore, this thesis proposes a novel approach based on the concept of sliding window and bounding box in deep learning, namely “Deep Learning Oriented Decision Tree”. In this approach, the dataset is divided into several subsets according to different window sizes, and each subset is used to build a decision tree, resulting in decision tree ensembles, and the final prediction result is based on the voting method. There are two experimental studies in this thesis. Study 1 is based on a comparison between Deep Learning Oriented Decision Tree and a single decision tree, and Study 2 for a comparison between Deep Learning Oriented Decision Tree and other missing value imputation methods. Moreover, the testing data with missing values are also considered in the two studies. According to the results of the experiment, the proposed approach performs the best in terms of classification accuracy over higher dimensional datasets. It is believed that such a contribution can help future researchers to deal with missing value problems more appropriately and efficiently, and to produce better performing prediction models.
| en_US |