dc.description.abstract | Imbalanced data sets are a very important part of practical data analysis, such as credit card fraud, medical diagnosis classification and network attack. Faced with imbalanced data sets, we can adopt different data processing or use different classification methods to achieve better classification results. This paper attempts to use the one-class classification methods to classify two classification problems in imbalanced data sets, such as the one-class SVM, Isolated Forest and Local Outlier Factor. To further explore the case of missing data, by simulating missing values of 10% to 50% and using methods such as CART to impute the data, increase the classification accuracy. At the same time, Instance selection methods such as IB3, DROP3, and GA are also adopted for the imbalanced data. Hope to reduce impurities in the data set and reduce the time to train the model cost and find sufficient information
Discuss the missing value filling and one-class classification methods and which instance selection methods will improve the accuracy. Simulate missing value and instance selection methods and the order of filling. After the above experimental process and results, it can be found that when missing value is filled classification accuracy is close to classification accuracy; through the instance selection methods, the classification accuracy can be increased and the reduction rate is found to directly affect the classification correct rate; finally, the missing value and combination of selection methods, it can be found the process of separating the incomplete data from the complete data can improve the classification accuracy. However, when the stable accuracy is selected, using the complete data to simulate the missing values and filling and uses the instance selection methods will have good performance. | en_US |