NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/89802


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89802


    Title: Integrating Clustering and Classification Machine Learning Methods to Build a Second Primary Cancer Prediction Model in Lung Cancer Survivors (整合聚類與分類機器學習方法建立原發性肺癌二次癌症預測模型)
    Authors: Cai, Chang-He (蔡昌赫)
    Contributors: Department of Information Management
    Keywords: Lung cancer; Second primary cancer; Cluster analysis; Machine learning; Predictive model
    Date: 2022-07-16
    Issue Date: 2022-10-04 12:00:22 (UTC+8)
    Publisher: National Central University
    Abstract: Lung cancer is the leading cause of cancer death worldwide. In 2020, about 2.1 million people were diagnosed with lung cancer and about 1.8 million died of it, accounting for 11.4% of cancer diagnoses and 18% of cancer deaths globally, and its mortality has ranked first in the world for many years. With improvements in diagnostic tools and treatments, lung cancer patients now survive significantly longer, and the number of patients who develop a second primary cancer after lung cancer has correspondingly risen markedly over the past decade. Assessing the risk of second primary cancer in lung cancer patients has therefore become an important issue. This study used six machine learning algorithms (logistic regression, random forest, support vector machine, extreme gradient boosting (XGBoost), a single-layer feed-forward neural network, and a stacking model) to build prediction models for second primary cancer in lung cancer survivors from the cancer registry data of patients with the ten major cancers who were seen at Chang Gung Memorial Hospital between 2004 and 2018. The XGBoost model was selected as the final model, with a mean AUROC of 0.755 and a standard deviation of 0.037.
    The study also used unsupervised cluster analysis to examine heterogeneity among lung cancer patients. The resulting cluster assignments were added as new features alongside the original features, and supervised machine learning models were then trained on the combined feature set. The models that incorporated the cluster assignments showed no significant difference in performance compared with the other models. Feature importance analysis with SHAP showed that, among the important features, surgical resection of the primary site, unknown malignant pleural effusion status, and combined stage I were positively associated with the risk of second primary cancer, whereas unknown status of visceral pleural or elastic layer invasion and combined stage IV were negatively associated with that risk. Finally, the final prediction model was deployed with the shiny package in R as a clinical decision support system for predicting second primary cancer in lung cancer patients, for physicians' reference.
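    The two quantitative ideas in the abstract, adding cluster assignments as extra features and reporting AUROC as a mean and standard deviation across evaluation folds, can be sketched as follows. This is an illustrative sketch only and not the thesis code (the thesis worked in R). The abstract does not name the clustering algorithm or the feature encoding, so k-means and one-hot encoding are assumptions here, and the toy data, the per-fold scores, and all function names are hypothetical. No classifier is trained; the per-fold scores stand in for model predictions.

```python
import random
import statistics


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means (Lloyd's algorithm); returns one cluster label per point.

    Assumption: the abstract does not say which clustering method was used.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def add_cluster_features(points, labels, k):
    """Append a one-hot encoding of the cluster label to each feature vector."""
    return [
        list(p) + [1.0 if lab == c else 0.0 for c in range(k)]
        for p, lab in zip(points, labels)
    ]


def auroc(y_true, scores):
    """AUROC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two well-separated groups, two features each.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
cluster_labels = kmeans(X, k=2)
X_aug = add_cluster_features(X, cluster_labels, k=2)  # original + cluster features

# Hypothetical (outcome, predicted risk) pairs for three evaluation folds.
fold_results = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 0, 0, 1], [0.7, 0.3, 0.8, 0.6]),
    ([0, 1, 1, 0], [0.1, 0.8, 0.4, 0.5]),
]
fold_aucs = [auroc(y, s) for y, s in fold_results]
mean_auc = statistics.mean(fold_aucs)
sd_auc = statistics.stdev(fold_aucs)  # sample standard deviation across folds
print(f"AUROC mean={mean_auc:.3f}, sd={sd_auc:.3f}")
```

    In a real pipeline the augmented matrix `X_aug` would be passed to a supervised learner, and the clustering would need to be fitted on the training portion of each fold only, so that held-out patients do not influence their own cluster features.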
    Appears in Collections:[Graduate Institute of Information Management] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
