dc.description.abstract | Lung cancer is the leading cause of cancer death worldwide. In 2020, about 2.1 million people were diagnosed with lung cancer and about 1.8 million died of it, accounting for 11.4% of new cancer cases and 18% of cancer deaths globally. Its mortality rate has ranked first in the world for several years. With improvements in diagnostic tools and treatment methods, the survival rate of lung cancer has increased significantly. Correspondingly, the number of lung cancer survivors who develop a second primary cancer has also increased significantly over the past decade, and assessing the risk of second primary cancer has become an important topic. This study used cancer registry data from Chang Gung Memorial Hospital from 2004 to 2018 to build prediction models for second primary cancer in lung cancer survivors with six machine learning algorithms: logistic regression, random forest, support vector machine, extreme gradient boosting (XGBoost), single-layer feed-forward neural network, and a stacking model. The models trained with XGBoost reached an average AUROC of 0.755 with a standard deviation of 0.037. The study also used unsupervised cluster analysis to examine the heterogeneity of lung cancer patients, and combined supervised and unsupervised learning by adding the clustering results as new features for training the supervised models. The results showed that the predictive models combined with cluster analysis did not differ significantly in performance from the other models.
Using SHAP to analyze feature importance, we found that, among the important features, surgical resection of the primary site, unknown malignant pleural effusion status, and combined stage I were positively correlated with the risk of second primary cancer in lung cancer survivors, whereas unknown visceral pleura or elastic layer invasion status and stage IV were negatively correlated with that risk. Finally, a clinical decision support system for predicting second primary cancer in lung cancer survivors was built on the final prediction model and deployed with the Shiny package in R for reference by physicians. | en_US |