Investigation of Renal Cell Carcinoma Subtypes Using Explainable Machine Learning Methods and Reverse Phase Protein Array-Based Multi-Omics Data


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/95841

Title:	Investigation of Renal Cell Carcinoma Subtypes Using Explainable Machine Learning Methods and Reverse Phase Protein Array-Based Multi-Omics Data
Author:	余嘉俊 (Chun, Yu Ka)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Cancer Subtype Classification; Multi-Omics; Interpretable Machine Learning
Date:	2024-08-21
Uploaded:	2024-10-09 17:19:38 (UTC+8)
Publisher:	National Central University
Abstract:	Kidney cancer is a major global public health problem, with over 400,000 new cases and approximately 180,000 deaths each year. Renal cell carcinoma (RCC) accounts for over 90% of these cases and mainly comprises three subtypes: chromophobe RCC (KICH), clear cell RCC (KIRC), and papillary RCC (KIRP). Because the subtypes differ in prognosis and treatment, accurately distinguishing them and understanding how they differ is important for the development of precision medicine. Reverse Phase Protein Array (RPPA) is a high-throughput proteomics technology that quantifies many proteins from minimal sample amounts, offering high sensitivity, rapid processing, and the ability to detect post-translational modifications. In-depth analysis of RPPA data can therefore advance cancer research and the quantification of related biomarkers.
However, few studies have used RPPA-based multi-omics data to investigate RCC subtypes, even though multi-omics analysis can clarify the mechanisms that distinguish subtypes and advance treatment development. This study therefore investigated RCC subtype classification using RPPA and multi-omics data. We employed five classification models, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and eXtreme Gradient Boosting (XGB), and evaluated the contribution of RPPA to RCC subtype classification on single-omics, RPPA-based dual-omics, and RPPA-based multi-omics datasets. The results indicate that RPPA significantly improves classification accuracy. Notably, the performance of the XGB model improved further, and substantially, when it used mutation and RPPA data together, underscoring the important contribution of RPPA to RCC subtype classification. Furthermore, we introduce two new evaluation methods: the Adjusted Weighted Accuracy Score (AW-ACC SCORE), which compares how critical each omics type is for a specific task, and the Adjusted Weighted Mean Absolute Shapley Importance (AWMSHAP), which assesses feature importance. These methods identified key proteins, such as INPP4B, PIK3CA, NDRG1, and CASP7, that may be associated with subtype classification. These proteins are associated with different RCC subtypes and may influence tumor biological behavior and clinical outcomes. Our findings indicate the potential of combining RPPA, multi-omics data, and machine learning for RCC subtype classification, and the identified proteins highlight the importance of explainability in machine learning models for building clinical trust and facilitating the translation of research findings into clinical practice.
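The abstract names AWMSHAP but does not define its adjustment or weighting, so the sketch below covers only the plain, unweighted mean absolute Shapley importance that the name refers to. For each feature j it averages |φ_j| over samples, where φ_j = Σ over coalitions S ⊆ N∖{j} of |S|!(n−|S|−1)!/n! · [v(S ∪ {j}) − v(S)], and v(S) is the model output when features outside S are replaced by baseline values. This is a minimal illustration under stated assumptions: the model is a plain Python function returning one score, a single baseline vector (for example, the feature means) stands in for "absent" features, and Shapley values are computed by exact enumeration, which is only practical for a handful of features. Realistic feature counts call for specialized algorithms (for example, TreeSHAP for tree ensembles such as XGB). The names `exact_shapley`, `mean_abs_shapley` and `toy_score` are hypothetical and are not the thesis's code.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical model interface: any function mapping one feature vector to a score
# (e.g., a predicted probability for one RCC subtype).
Model = Callable[[Sequence[float]], float]


def exact_shapley(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, using `baseline` values for 'absent' features.

    Enumerates every coalition of the other features, so the cost grows as 2^n;
    only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: FrozenSet[int]) -> float:
        # Features in the coalition keep their observed value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature j when added to coalition s.
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi


def mean_abs_shapley(f: Model, X: Sequence[Sequence[float]], baseline: Sequence[float]) -> List[float]:
    """Unweighted global importance: mean of |phi_j| over all samples in X."""
    sums = [0.0] * len(baseline)
    for x in X:
        for j, p in enumerate(exact_shapley(f, x, baseline)):
            sums[j] += abs(p)
    return [s / len(X) for s in sums]


if __name__ == "__main__":
    # Toy model with an interaction term; purely illustrative, not a model from the thesis.
    def toy_score(z: Sequence[float]) -> float:
        return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]

    X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0]]
    baseline = [sum(col) / len(X) for col in zip(*X)]  # per-feature means
    print(mean_abs_shapley(toy_score, X, baseline))
```

Two properties make quick sanity checks: for each sample the φ_j sum to f(x) − f(baseline), and for a linear model each φ_j reduces to w_j(x_j − baseline_j). Taking absolute values before averaging keeps positive and negative contributions from cancelling, which is why mean |φ| is a common global importance ranking.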
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations


All items in NCUIR are protected by original copyright.
