Privacy-Preserving Machine Learning for Predicting Second Primary Cancer in the Context of Data Heterogeneity

DC field	Value	Language
DC.contributor	資訊管理學系	zh_TW
DC.creator	洪睿甫	zh_TW
DC.creator	Jui-Fu Hong	en_US
dc.date.accessioned	2023-07-26T07:39:07Z
dc.date.available	2023-07-26T07:39:07Z
dc.date.issued	2023
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=110423026
dc.contributor.department	資訊管理學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	保護資料隱私在近幾年來一直是個關鍵的議題。隨著保護資料隱私的議題的興起，許多法規被制定並用來限制資料的傳輸和保存。因此，收集不同機構的資料並進行模型訓練成為一項具有挑戰性的任務。為了解決這個問題，許多研究提出了聯邦學習和遷移學習等隱私保護方法。在我們的研究中，我們使用來自 8 家醫院的癌症登記數據來探討這些隱私保護方法在肺癌倖存者第二原發性癌症預測中的表現。我們比較了本地化學習、集中式學習、聯邦學習和遷移學習的預測效能。結果顯示，聯邦學習在多數機構表現優於本地化學習，並取得了與集中式學習相似的結果。此外，我們提出了一些方法處理聯邦學習中數據異質性問題所帶來的負面影響。第一種方法排除了數據分佈差異過大的機構，而第二種方法則結合了個性化學習率和模型層數的個性化模型方法。和大多數機構的聯邦學習的基準結果相比，這兩種方法都改善預測效能。然而，對於那些被排除在外或表現出嚴重資料偏移的機構，可以看到這些機構使用遷移學習訓練模型後有較好的預測效能，所以我們可使用遷移學習作為替代方案。綜上所述，我們的研究結果顯示隱私保護的機器學習方法在嚴格的數據法規下能達到與集中資料訓練模型相似的成效，且能夠使用有效的訓練策略來解決機構間資料的異質性問題。	zh_TW
dc.description.abstract	Data privacy has been a critical issue in recent years. As privacy concerns have grown, many regulations have been established to restrict data transmission and storage. Gathering data from different institutions for model training has therefore become challenging. To address this, privacy-preserving methods such as federated learning and transfer learning have been proposed. In this research, we explore the performance of these privacy-preserving methods on second primary cancer prediction in lung cancer survivors, using cancer registry data from 8 hospitals. We compared the predictive performance of localized learning, centralized learning, federated learning, and transfer learning. The results show that federated learning outperformed localized learning at most institutions and achieved results similar to centralized learning. In addition, we propose methods to mitigate the negative impact of data heterogeneity in federated learning. The first method excludes institutions with divergent data distributions, while the second uses personalized models with customized learning rates and a personalized layer. Both methods improve on the federated learning baseline at most institutions. However, for institutions that were excluded or exhibited severe divergence, transfer learning can serve as an alternative because of its strong performance. In summary, our study suggests that privacy-preserving machine learning methods can achieve results comparable to centralized training under strict data regulations, and that effective training strategies can address data heterogeneity across institutions.	en_US
DC.subject	隱私保護	zh_TW
DC.subject	機器學習	zh_TW
DC.subject	聯邦學習	zh_TW
DC.subject	遷移學習	zh_TW
DC.subject	二次原發性癌症	zh_TW
DC.subject	肺癌	zh_TW
DC.subject	privacy preserving	en_US
DC.subject	machine learning	en_US
DC.subject	federated learning	en_US
DC.subject	transfer learning	en_US
DC.subject	second primary cancer	en_US
DC.subject	lung cancer	en_US
DC.title	Privacy-Preserving Machine Learning for Predicting Second Primary Cancer in the Context of Data Heterogeneity	en_US
dc.language.iso	en_US	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 110423026: full metadata record