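The analysis of the association between genetic variations and side effects of breast cancer treatment based on machine learning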

Name

蘇煜程 (Yu-Cheng Su)

Department

Department of Computer Science and Information Engineering

Thesis Title

The analysis of the association between genetic variations and side effects of breast cancer treatment based on machine learning
(Original title: 基於機器學習的基因變異與乳癌治療副作用關聯性分析)

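This electronic thesis is authorized for immediate open access.
Electronic full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan). Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid infringing the law.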

Abstract (Chinese)

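Machine learning is a powerful and efficient tool that has been widely applied across many professional fields; in the era of big data it has markedly improved both the accuracy and the efficiency of many tasks. It holds particularly great potential in biomedicine, where large numbers of samples must be processed. This study uses a series of machine-learning-based feature selection methods to investigate the relationship between variants at individual genetic loci and four side effects of breast cancer drugs: osteoporosis, peripheral neuropathy, abnormal white blood cell count, and abnormal endometrial thickness. Unlike mainstream studies that explore the relationship between specific diseases and genetic loci, our study focuses on revealing the associations between drug side effects and genetic variant loci, with the aim of ensuring patient safety and choosing more appropriate treatments.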

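Through a multi-stage analysis, we identified genetic variant loci highly correlated with each of the four side effects. For the first stage of feature selection, we compared several statistical methods (the chi-square test, Fisher's exact test, Spearman's rank correlation coefficient, and Kendall's rank correlation coefficient), using p < 0.05 as the threshold. We then evaluated different machine learning classifiers (random forest, XGBoost, and a deep neural network) as second-stage feature selectors. Finally, we compared the classification accuracy and learning curves of each feature selector in depth, and examined feature importance and the influence of individual genetic loci on predicting whether each side effect occurs.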
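The first-stage statistical filter can be sketched in Python with SciPy. This is a minimal illustration under stated assumptions, not the thesis's own code: genotypes are assumed to be coded 0/1/2 (number of alternate alleles), each side effect is a binary label, Fisher's exact test is applied to an assumed carrier vs. non-carrier 2×2 table (a dominant model), and the synthetic data and helper names (`locus_pvalues`, `first_stage_filter`) are hypothetical. A locus is kept by a given test when its p-value falls below 0.05.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05  # p-value threshold used in the first stage


def locus_pvalues(genotype, label):
    """Return p-values of four univariate tests for one locus (genotype coded 0/1/2)."""
    if np.unique(genotype).size < 2:
        # Monomorphic locus: nothing varies, so treat it as non-significant.
        return {"chi2": 1.0, "fisher": 1.0, "spearman": 1.0, "kendall": 1.0}

    # Genotype x outcome contingency table; crosstab keeps only observed classes,
    # so no row or column is entirely zero.
    table = pd.crosstab(genotype, label).to_numpy()
    chi2_p = stats.chi2_contingency(table)[1]

    # Assumed dominant model for Fisher's exact test: carrier (>= 1 alt allele) vs. not.
    carrier = genotype > 0
    fisher_table = np.array([
        [np.sum(carrier & (label == 1)), np.sum(carrier & (label == 0))],
        [np.sum(~carrier & (label == 1)), np.sum(~carrier & (label == 0))],
    ])
    fisher_p = stats.fisher_exact(fisher_table)[1]

    # Rank correlations between genotype dosage and the binary outcome.
    spearman_p = stats.spearmanr(genotype, label)[1]
    kendall_p = stats.kendalltau(genotype, label)[1]
    return {"chi2": chi2_p, "fisher": fisher_p,
            "spearman": spearman_p, "kendall": kendall_p}


def first_stage_filter(X, y, alpha=ALPHA):
    """For each test, list the column indices of loci whose p-value is below alpha."""
    selected = {"chi2": [], "fisher": [], "spearman": [], "kendall": []}
    for j in range(X.shape[1]):
        for test, p in locus_pvalues(X[:, j], y).items():
            if p < alpha:
                selected[test].append(j)
    return selected


# Hypothetical synthetic data: 200 patients x 50 loci.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)        # 1 = side effect observed
X = rng.integers(0, 3, size=(200, 50))  # random genotypes 0/1/2
X[:, 0] = 2 * y                         # locus 0 tracks the outcome...
X[:10, 0] = 1                           # ...with a little noise
X[:, 1] = 0                             # locus 1 is monomorphic

selected = first_stage_filter(X, y)
for test, loci in selected.items():
    print(test, loci[:5])
```

Each test yields its own candidate list, mirroring the comparison of the four methods; how the lists were combined is not specified here. With more than 150,000 loci, some random loci will also pass p < 0.05 by chance, which is why a false-positive correction step is normally applied before thresholding.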

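For each side effect, our study narrowed more than 150,000 independent genetic loci down to about 100 key loci, significantly improving the accuracy of drug side-effect prediction. In addition, for more detailed validation, we performed typing analysis of HLA genotypes, which are closely related to drug metabolism and immunity, and compared the results with the important loci selected by the machine learning models, providing a more comprehensive understanding of the mechanisms behind drug side effects.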
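The second stage, ranking the surviving loci with a classifier and keeping only the top-ranked ones, might look like the following sketch, which uses scikit-learn's random forest and its impurity-based `feature_importances_`. The toy data, the helper name `rank_loci_by_importance`, and `top_k=10` are hypothetical; the study keeps roughly 100 loci per side effect and also uses XGBoost and a deep neural network, whose configurations are not described in this record.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def rank_loci_by_importance(X, y, top_k, seed=0):
    """Fit a random forest; return indices of the top_k loci by impurity-based importance."""
    forest = RandomForestClassifier(n_estimators=300, random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]  # most important first
    return order[:top_k]


# Hypothetical toy data: 300 patients x 200 loci, genotypes coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(300, 200))
# Hypothetical ground truth: the side effect occurs when loci 3 and 7 together
# carry at least three alternate alleles; every other locus is noise.
y = ((X[:, 3] + X[:, 7]) >= 3).astype(int)

top = rank_loci_by_importance(X, y, top_k=10)

# Compare 5-fold cross-validated accuracy using all loci vs. only the selected ones.
acc_all = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
acc_top = cross_val_score(RandomForestClassifier(random_state=0), X[:, top], y, cv=5).mean()
print(f"top loci: {sorted(top.tolist())}")
print(f"accuracy, all loci: {acc_all:.3f}; top {len(top)} loci: {acc_top:.3f}")
```

Because the loci are ranked on the full toy dataset before cross-validation, the second accuracy is optimistic; a careful evaluation would repeat the ranking inside each training fold. XGBoost's scikit-learn wrapper (`xgboost.XGBClassifier`) also exposes `feature_importances_`, so it could replace the random forest in the ranking step.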

Abstract (English)

Machine learning is a powerful and efficient tool that has been widely applied across many professional fields, significantly improving the accuracy and efficiency of tasks in the era of big data. It shows substantial potential in biomedicine, where large numbers of samples must be processed. This study employs a series of machine-learning-based feature selection methods to investigate the association between variants at individual genetic loci and four side effects of breast cancer treatment: osteoporosis, peripheral neuropathy, abnormal endometrial thickness, and abnormal white blood cell count. Unlike mainstream research, which explores the relationship between specific diseases and genetic loci, our research focuses on uncovering associations between treatment side effects and genetic loci, with the aim of improving patient safety and supporting the selection of appropriate treatments.

Through a multi-stage analysis, we identified genetic variant loci highly correlated with each of the four side effects. For the first stage of feature selection, we compared several statistical methods (the chi-square test, Fisher's exact test, Spearman's rank correlation, and Kendall's tau), using p < 0.05 as the threshold. We then evaluated different machine learning classifiers (random forest, XGBoost, and a deep neural network) as second-stage feature selectors. Finally, we compared the accuracy and learning curves of each feature selector in depth, analyzing feature importance and the impact of individual genetic loci on predicting whether each side effect occurs.

For each side effect, our research narrowed more than 150,000 independent genetic loci down to approximately 100 key loci, significantly improving the accuracy of predicting medication side effects. For further validation, we performed HLA typing, since HLA genes are closely related to drug metabolism and immunity, and compared the results with the key loci selected by the machine learning models.

Keywords (Chinese)

★ Machine learning
★ Deep learning
★ Breast cancer treatment side effects
★ Genetic variations

Keywords (English)

★ Machine learning
★ Deep learning
★ Cancer treatment side effects
★ Gene variations

Table of Contents

Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter I. Introduction
Chapter II. Preliminary
2-1. Whole Exome Genome
2-2. Feature Selection Methods
2-2-1. Chi-Square Test
2-2-2. Fisher Exact Test
2-2-3. Spearman’s Correlation
2-2-4. Kendall’s Tau Coefficient
2-3. Machine Learning Models
2-3-1. Random Forest
2-3-2. XGBoost
2-3-3. Deep Learning
2-4. Genotype Analysis Toolkit -- HLA typing
Chapter III. Dataset Collection and Description
3-1. Dataset Description
3-1-1. Genetic Data Description
3-1-2. Clinical Data Description
3-2. Data preprocessing
Chapter IV. Experiments and Results
4-1. Statistic Methods
4-1-1. P value Calculation
4-1-2. False Positive Correction
4-2. Machine Learning Methods
4-2-1. Top Feature Performance Comparison
4-2-2. Neural Network Verification
4-2-3. Reverse Top n Features Verification
4-3. Advanced Feature Selection Methods
4-3-1. Change Point Detection
4-3-2. Relative Risk and Odds Ratio
4-4. Separated Zygosity Analysis
4-5. HLA typing
Chapter V. Conclusion
Chapter VI. Limitations and Future Work
Chapter VII. References


Advisors

許藝瓊 (Yi-Chiung Hsu), 王家慶 (Jia-Ching Wang)

Approval Date

2024-07-29
