Privacy-Preserving Machine Learning for Predicting Second Primary Cancer in the Context of Data Heterogeneity

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/92642

Title:	Privacy-Preserving Machine Learning for Predicting Second Primary Cancer in the Context of Data Heterogeneity
Author:	洪睿甫 (Hong, Jui-Fu)
Contributor:	Department of Information Management
Keywords:	privacy preserving; machine learning; federated learning; transfer learning; second primary cancer; lung cancer
Date:	2023-07-26
Uploaded:	2023-10-04 16:07:24 (UTC+8)
Publisher:	National Central University
Abstract:	Data privacy has been a critical issue in recent years. As concern over data privacy has grown, many regulations have been enacted to restrict how data are transmitted and stored. As a result, collecting data from different institutions for model training has become challenging. To address this, privacy-preserving methods such as federated learning and transfer learning have been proposed. In this study, we explore how these privacy-preserving methods perform at predicting second primary cancer (SPC) in lung cancer survivors, using cancer registry data from 8 hospitals. We compared the predictive performance of localized learning, centralized learning, federated learning, and transfer learning. The results showed that federated learning outperformed localized learning at most institutions and achieved results similar to centralized learning. In addition, we proposed two methods to mitigate the negative impact of data heterogeneity in federated learning. The first excludes institutions whose data distributions diverge too strongly from the others; the second builds personalized models that combine customized learning rates with a personalized layer. Compared with the federated learning baseline, both methods improved predictive performance at most institutions. However, the institutions that were excluded or that showed severe data shift achieved better predictive performance with transfer learning, which can therefore serve as an alternative for them.
In summary, our findings suggest that privacy-preserving machine learning methods can achieve performance similar to centralized training under strict data regulations, and that effective training strategies can address data heterogeneity across institutions.
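The two heterogeneity strategies in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the thesis's implementation: FedAvg-style aggregation (sample-size-weighted averaging of shared weights), a logistic-regression stand-in model, a per-institution bias term playing the role of the "personalized layer", per-institution learning rates, and a hypothetical exclusion rule that drops institutions whose outcome prevalence is far from the median. The function names (`local_update`, `prevalence_divergence`, `fedavg_personalized`) and the synthetic data are likewise hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_update(w, b, X, y, lr, epochs):
    """Full-batch gradient descent on one institution's logistic log loss.

    w is the shared (federated) weight vector; b is the institution's own
    personalized bias (hypothetical stand-in for a personalized layer),
    which is never sent to the server.
    """
    w, b = w.copy(), float(b)
    n = len(y)
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y  # gradient of log loss w.r.t. the logits
        w -= lr * (X.T @ err) / n
        b -= lr * err.mean()
    return w, b


def prevalence_divergence(clients):
    """Hypothetical heterogeneity score: |positive rate - median positive rate|."""
    rates = np.array([y.mean() for _, y in clients])
    return np.abs(rates - np.median(rates))


def fedavg_personalized(clients, lrs, rounds=50, epochs=5, max_divergence=None):
    d = clients[0][0].shape[1]
    w_global = np.zeros(d)
    biases = np.zeros(len(clients))  # one personalized parameter per institution
    included = np.arange(len(clients))
    if max_divergence is not None:  # strategy 1 (assumed rule): exclude divergent sites
        included = np.flatnonzero(prevalence_divergence(clients) <= max_divergence)
    if included.size == 0:
        raise ValueError("no institutions left after exclusion")
    sizes = np.array([len(clients[k][1]) for k in included], dtype=float)
    for _ in range(rounds):
        updates = []
        for k in included:
            X, y = clients[k]
            # strategy 2 (assumed form): per-site learning rate + local bias
            w_k, biases[k] = local_update(w_global, biases[k], X, y, lrs[k], epochs)
            updates.append(w_k)
        # server step: sample-size-weighted average of the shared weights only
        w_global = np.average(np.stack(updates), axis=0, weights=sizes)
    return w_global, biases, included


# Usage on synthetic data: the fourth institution has a much higher prevalence.
rng = np.random.default_rng(0)


def synthetic_client(n, intercept):
    X = rng.normal(size=(n, 3))
    p = sigmoid(X @ np.array([1.0, -0.5, 0.25]) + intercept)
    return X, (rng.random(n) < p).astype(float)


clients = [synthetic_client(200, c) for c in (-1.0, -0.8, -1.2, 2.5)]
w, b, used = fedavg_personalized(clients, lrs=[0.5, 0.5, 0.3, 0.5], max_divergence=0.3)
print("institutions used:", used)
```

Keeping the bias local while averaging only the shared weights is one simple way to let each site fit its own baseline risk; an excluded site keeps its untrained bias here and, per the abstract, would instead be served by a transfer-learning model.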
Appears in Collections:	[Graduate Institute of Information Management] Theses and Dissertations

All items in NCUIR are protected by their original copyright.