dc.description.abstract | Since the advent of AlphaGo, machine learning has caught the public eye and highlighted the essential need for data collection. In practice, however, collected data are often imbalanced owing to the many difficulties and constraints of collection. Feature selection and class imbalance (sampling) both inherently affect a classifier in vector space, which in turn impairs learning and leads to difficulty and inaccuracy in classification. This research uses data from public websites and designs two processes to investigate class-imbalance sampling and feature selection: one places sampling at the beginning, before feature selection, and the other places it at the end, after feature selection. Five sampling methods are examined, three oversampling and two undersampling, each applied in both positions. In addition, two different classification models are used, and normalized and non-normalized data are compared in both processes. The two models are support vector machines (SVM) and decision trees, which are commonly used as classifiers for class-imbalanced data. The results indicate that class imbalance should be handled first, with feature selection applied afterward. For oversampling, SMOTE is recommended when the amount of data is small; for undersampling, random sampling is recommended when the amount of data is large. PCA is recommended when the features have fewer than 20 dimensions, and GA when they have more than 20. Moreover, the preferred classifier is SVM. Whether to normalize the data can be decided by the classifier: decision trees do not require normalization, whereas support vector machines should use it. | en_US |