Integrating Clustering and Classification Machine Learning Methods to Build a Second Primary Cancer Prediction Model in Lung Cancer Survivors (整合聚類與分類機器學習方法建立原發性肺癌二次癌症預測模型)

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89802

Title:	Integrating Clustering and Classification Machine Learning Methods to Build a Second Primary Cancer Prediction Model in Lung Cancer Survivors (整合聚類與分類機器學習方法建立原發性肺癌二次癌症預測模型)
Author:	Cai, Chang-He (蔡昌赫)
Contributor:	Department of Information Management
Keywords:	Lung cancer; Second primary cancer; Cluster analysis; Machine learning; Predictive model
Date:	2022-07-16
Upload time:	2022-10-04 12:00:22 (UTC+8)
Publisher:	National Central University
Abstract:	Lung cancer is the leading cause of cancer death worldwide. In 2020, about 2.1 million people were diagnosed with lung cancer and about 1.8 million died of it, accounting for 11.4% of all new cancer cases and 18% of all cancer deaths, respectively; its mortality has ranked first among cancers for many years. As diagnostic tools and treatments have improved, lung cancer patients are surviving significantly longer, and the number of patients who develop a second primary cancer after lung cancer has correspondingly risen over the past decade. Assessing the risk of second primary cancer in lung cancer patients has therefore become an important issue. This study uses six machine learning algorithms: logistic regression, random forest, support vector machine, extreme gradient boosting (XGBoost), a single-layer feed-forward neural network, and a stacking model. The prediction model for second primary cancer in lung cancer patients was built from the registry data of patients with the ten major cancers who visited Chang Gung Memorial Hospital between 2004 and 2018. The model trained with XGBoost was selected as the final model, with a mean AUROC of 0.755 and a standard deviation of 0.037.
The study also applies unsupervised cluster analysis to examine heterogeneity among lung cancer patients, adding the resulting cluster assignments as new features alongside the original features before training the supervised models. The results show that models incorporating the cluster assignments did not differ significantly in performance from the other models. A feature importance analysis with SHAP found that, among the important features, surgical resection of the primary site, unknown malignant pleural effusion status, and combined stage I were positively associated with the risk of second primary cancer, whereas unknown status of visceral pleura or elastic layer involvement and combined stage IV were negatively associated with it. Finally, the final prediction model was deployed with the shiny package in R as a clinical decision support system for second primary cancer prediction in lung cancer, for physicians' reference.
Appears in collections:	[Graduate Institute of Information Management] Master's and Doctoral Theses
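
The abstract describes adding unsupervised cluster assignments as new features before supervised training. The following is a minimal sketch of that idea only, not the thesis's implementation: the abstract does not name the clustering algorithm or the encoding, so plain k-means and a one-hot encoding are assumptions here, and the `patients` data, the function names, and the choice of `k=2` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (an assumed stand-in for the thesis's clustering step).

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def add_cluster_features(points, labels, k):
    """Append a one-hot encoding of the cluster label to each feature vector."""
    return [
        list(p) + [1.0 if lab == c else 0.0 for c in range(k)]
        for p, lab in zip(points, labels)
    ]


# Hypothetical toy feature vectors; not data from the thesis.
patients = [
    [0.1, 0.2], [0.0, 0.1], [0.2, 0.0],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
_, cluster_labels = kmeans(patients, k=2)
augmented = add_cluster_features(patients, cluster_labels, k=2)
```

The `augmented` rows keep the original features and gain `k` extra columns, so any supervised classifier can be trained on them unchanged. Comparing that classifier against one trained on the original features alone mirrors the comparison the abstract reports.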

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	48

All items in NCUIR are protected by the original copyright.
