Privacy-Preserving Machine Learning for Predicting Second Primary Cancer in the Context of Data Heterogeneity

Name

洪睿甫 (Jui-Fu Hong)

Department

Department of Information Management

Thesis title

Privacy-Preserving Machine Learning for Predicting Second Primary Cancer in the Context of Data Heterogeneity

Files

Browse the thesis in the system (open after 2026-7-23)

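Abstract (translated from Chinese)

Protecting data privacy has been a key issue in recent years. As concern over data privacy has grown, many regulations have been enacted to restrict the transmission and retention of data. Collecting data from different institutions for model training has therefore become a challenging task. To address this problem, many studies have proposed privacy-preserving methods such as federated learning and transfer learning. In our study, we use cancer registry data from 8 hospitals to examine how these privacy-preserving methods perform in predicting second primary cancer among lung cancer survivors. We compared the predictive performance of localized learning, centralized learning, federated learning, and transfer learning. The results show that federated learning outperformed localized learning at most institutions and achieved results similar to centralized learning. In addition, we propose methods to handle the negative effects of data heterogeneity in federated learning. The first method excludes institutions whose data distributions differ too greatly, while the second is a personalized-model approach that combines personalized learning rates with personalized model layers. Compared with the federated learning baseline, both methods improved predictive performance at most institutions. However, institutions that were excluded or that exhibited severe data shift achieved better predictive performance with models trained by transfer learning, so transfer learning can serve as an alternative for them. In summary, our results show that privacy-preserving machine learning methods can, under strict data regulations, achieve performance similar to that of models trained on centralized data, and that effective training strategies can address data heterogeneity across institutions.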

Abstract (English)

Data privacy has been a critical issue in recent years. As privacy concerns have grown, many regulations have been established to restrict data transmission and retention. Gathering data from different institutions for model training has therefore become challenging. To address this, privacy-preserving methods such as federated learning and transfer learning have been proposed. In this research, we explore the performance of these privacy-preserving methods on second primary cancer prediction in lung cancer survivors, using cancer registry data from 8 hospitals. We compared the performance of localized learning, centralized learning, federated learning, and transfer learning. The results demonstrated that federated learning outperformed localized learning in most institutions and achieved results similar to centralized learning. In addition, we proposed methods to mitigate the negative impact of data heterogeneity in federated learning. The first method excluded institutions with divergent data distributions, while the second used personalized models with customized learning rates and a personalized layer. Both methods improved on the federated learning baseline in most institutions. However, for institutions that were excluded or exhibited severe divergence, transfer learning can serve as an alternative, since it performed better for them. To sum up, our study suggests that privacy-preserving machine learning methods can achieve performance similar to centralized training under strict data regulations, and that effective training strategies can address data heterogeneity across institutions.
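To make the federated learning setup with a personalized layer more concrete, the following is a minimal sketch and not the thesis's implementation. It assumes sample-size-weighted averaging of shared parameters (the usual federated averaging rule) and a hypothetical layer named "head" that each institution keeps local. The layer names, the two toy hospitals, and their sample counts are all illustrative assumptions.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # layer name -> flat list of parameters


def federated_average(
    client_weights: List[Weights],
    client_sizes: List[int],
    personal_layers: frozenset = frozenset(),
) -> Weights:
    """Sample-size-weighted average of the shared layers only."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Weights = {}
    for layer in client_weights[0]:
        if layer in personal_layers:
            continue  # personalized layers never leave the institution
        n_params = len(client_weights[0][layer])
        averaged[layer] = [
            sum(w[layer][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(n_params)
        ]
    return averaged


def apply_global(local: Weights, global_weights: Weights) -> Weights:
    """Overwrite shared layers with the global average; keep local-only layers."""
    updated = {name: list(values) for name, values in local.items()}
    for name, values in global_weights.items():
        updated[name] = list(values)
    return updated


# Toy example (assumed values): two hospitals, one shared layer, one personal head.
hospital_a = {"shared": [1.0, 2.0], "head": [0.5]}
hospital_b = {"shared": [3.0, 6.0], "head": [-0.5]}
global_w = federated_average(
    [hospital_a, hospital_b],
    client_sizes=[100, 300],
    personal_layers=frozenset({"head"}),
)
hospital_a = apply_global(hospital_a, global_w)
print(global_w)    # {'shared': [2.5, 5.0]}
print(hospital_a)  # {'shared': [2.5, 5.0], 'head': [0.5]}
```

In this sketch only the shared layer is exchanged, so each hospital's personal head stays fitted to its own data distribution. Excluding a divergent institution corresponds to leaving its entry out of the two input lists.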

Keywords

★ privacy preserving
★ machine learning
★ federated learning
★ transfer learning
★ second primary cancer
★ lung cancer

Table of Contents

Abstract
Abstract (Chinese)
Table of Content
List of Figure
List of Table
1. Introduction
1.1 Research Background
1.1.1 Privacy Preserving
1.1.2 Data Heterogeneity Issues
1.1.3 Federated Learning and Transfer Learning
1.1.4 Cancer Registry
1.1.5 Lung Cancer and Second Primary Cancer
1.2 Research Motivation
1.3 Research Objectives
2. Related Works
2.1 Machine Learning in Second Primary Cancer Prediction
2.2 Federated Learning
2.3 Transfer Learning
3. Methods
3.1 Data Sources and Clinical Settings
3.2 Dataset and Preprocess
3.3 Model Training Strategy
3.3.1 Federated Learning Methods
3.3.2 Transfer Learning Methods
3.3.3 Traditional Learning Methods
3.3.4 Compare Different Machine Learning Methods
3.4 Model Architecture
3.5 Data Heterogeneous Analysis
3.6 Model Personalization
3.7 Statistical and Hypothesis Testing
4. Results
4.1 Data Cleaning
4.2 Data Characteristics
4.3 Comparison of Encoding Methods
4.4 Federated Learning Analysis
4.5 Data Heterogeneity Analysis
4.6 Model Personalization
5. Discussion and Limitation
6. Conclusion
Reference
Appendix

Advisors

許智誠, 曾意儒

Approval date

2023-7-26