dc.description.abstract | Data privacy has become a critical issue in recent years, and many regulations have been established to restrict data transmission and storage. As a result, gathering data from different institutions for model training has become challenging. To address this, privacy-preserving methods such as federated learning and transfer learning have been proposed. In this research, we explore the performance of these privacy-preserving methods for predicting second primary cancer in lung cancer survivors, using data from 8 hospitals. We compared the performance of localized learning, centralized learning, federated learning, and transfer learning. The results showed that federated learning outperformed localized learning and achieved results similar to those of centralized learning. In addition, we proposed two methods to mitigate the negative impact of data heterogeneity in federated learning. The first method excluded institutions with divergent data distributions, while the second incorporated personalized models with customized learning rates and a personalized layer. Both methods achieved better results than the federated learning baseline in most institutions. However, for institutions that were excluded or exhibited severe divergence, transfer learning can serve as an alternative owing to its strong performance. In summary, our study suggests that privacy-preserving machine learning methods perform efficiently under strict data regulations, and that effective training strategies can be applied to address data heterogeneity. | en_US |