dc.description.abstract | With the advancement of science and technology, people’s diets and lifestyles have changed, and so have the diseases they suffer from. In Taiwan, 18,536 people died of cancer in 1990; by 2020 the number had risen to 50,161, a 2.7-fold increase overall. Over the same period, deaths from breast cancer rose from 619 to 2,655, a 4.29-fold increase that far exceeds the growth in overall cancer deaths. This situation can nevertheless be improved: the survival rate of breast cancer treated early (stage 0 and stage 1) exceeds 95%, which shows the importance of early detection and early treatment. If accurate analyses of breast cancer data can be provided to medical staff for reference, they can diagnose the disease and give appropriate treatment, improving the survival rate of breast cancer patients.
This study proposes a breast cancer data analysis and prediction method that combines multiple data preprocessing steps with classification algorithms. The data are preprocessed with normalization, discretization, and the Synthetic Minority Over-sampling Technique (SMOTE), and prediction models are then built with support vector machine, K-nearest neighbor, decision tree, and random forest algorithms under five-fold cross-validation. These models are compared with models built using the corresponding single preprocessing step, to observe how combining multiple preprocessing steps affects the prediction models.
In this study, a large X-ray image dataset from KDD and a small fine needle aspiration (FNA) image dataset from UCI were used for the experiments. With several preprocessing methods applied together before model construction, the experiments found that in every prediction model, normalization combined with SMOTE improved the AUC more than any single preprocessing method, and that the support vector machine showed the largest AUC improvement. The results indicate that when a support vector machine is used for prediction on the X-ray image dataset, which has severe class imbalance, normalized SMOTE preprocessing yields a model with better predictive value; on the FNA image dataset, which is only slightly imbalanced, normalized SMOTE also improves the model, but the effect is small. | en_US |
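The combination reported as most effective, normalization followed by SMOTE, can be illustrated with a minimal sketch. It is not the study's implementation: it assumes min-max scaling (the abstract does not name the normalization method), binary labels, and plain Python lists, and the function names `min_max_normalize`, `smote`, and `normalized_smote` are hypothetical. Normalizing first matters because SMOTE picks neighbours by Euclidean distance, so features with large numeric ranges would otherwise dominate the neighbour search.

```python
import math
import random


def min_max_normalize(rows):
    """Scale every feature column to [0, 1] (assumption: min-max scaling).
    Constant columns are mapped to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by Euclidean distance to `base`.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def normalized_smote(X, y, minority_label=1, k=5, seed=0):
    """Normalize first, then oversample the minority class until it
    matches the size of the majority class (binary labels assumed)."""
    Xn = min_max_normalize(X)
    minority = [x for x, label in zip(Xn, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    if n_new <= 0:
        return Xn, list(y)
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return Xn + new_rows, list(y) + [minority_label] * n_new


if __name__ == "__main__":
    # Toy data: 4 majority (label 0) vs 2 minority (label 1) samples.
    X = [[1.0, 200.0], [2.0, 180.0], [3.0, 220.0], [4.0, 210.0],
         [10.0, 900.0], [11.0, 950.0]]
    y = [0, 0, 0, 0, 1, 1]
    X_bal, y_bal = normalized_smote(X, y, k=1)
    print(y_bal.count(0), y_bal.count(1))  # → 4 4
```

In a five-fold cross-validation setup such as the one described, the normalization parameters and the synthetic samples are usually derived from the training folds only, so that no information from the test fold leaks into training; the abstract does not state how this was handled.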