dc.description.abstract | Imbalanced data is a common problem in many application domains of machine learning. When some classes in the training set have many samples and others have relatively few, traditional classifiers tend to misclassify minority-class samples, and correcting this has become a challenge in machine learning. This paper proposes a new algorithm-level model that combines a Bayesian classifier with a support vector machine (SVM), namely SVM-rebalancing. During learning, the rebalance parameter (a classification weight parameter) coordinates the classification weight of each class. The problem is formulated and solved as a rebalancing programming problem, so that minority samples can be identified effectively. The study then examines whether misclassification arises not only from class imbalance but also from other factors. Purely predictive pattern recognition models lack visual interpretability: black-box methods such as neural networks and support vector machines do not provide an interpretable model, which makes it impossible to explore the sources of misclassification. Therefore, this study further proposes a pre-processing step based on multidimensional scaling of kernel functions to construct a visual, low-dimensional representation space for the data. In practice, the visual recognition model indicates that overlapping, multimodal, and skewed distributions in the data are additional causes of poor classifier performance. Finally, this research suggests that such a visual identification model can reveal problems in the data structure, which can then guide subsequent improvements to classifier performance. | en_US |