使用文字探勘與深度學習技術建置中風後肺炎之預測模型;Develop The Predictive Model For Post Stroke Pneumonia By Using Text Mining And Deep Learning

Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/89908

Title:	使用文字探勘與深度學習技術建置中風後肺炎之預測模型;Develop The Predictive Model For Post Stroke Pneumonia By Using Text Mining And Deep Learning
Author:	徐筱茜;Hsu, Hsiao-Chein
Contributor:	Department of Information Management
Keywords:	stroke; pneumonia; text mining; machine learning; deep learning; electronic medical records (EMR)
Date:	2022-08-26
Upload time:	2022-10-04 12:04:26 (UTC+8)
Publisher:	National Central University
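Abstract (translated from the Chinese):	Stroke is one of the world's major health problems and the second leading cause of death worldwide, and the disabling sequelae of stroke are one of the main causes of adult disability in Taiwan. Stroke-associated pneumonia (SAP) is an important clinical problem in the prognosis of patients with acute ischemic stroke (AIS). Most stroke patients have some degree of functional impairment; dysphagia, for example, increases the risk of aspiration pneumonia sevenfold. One third of AIS patients develop pneumonia, which makes it the most common respiratory complication. SAP is closely associated with increased long-term mortality, longer hospital stays, higher medical costs, and poorer functional outcomes.
The main purpose of this study is to incorporate deep learning into text mining in order to extract, from unstructured electronic medical records (EMR), new keywords that may affect SAP and use them as variables; to build an early prediction model of post-stroke pneumonia from these variables with machine learning; and to compare the predictive performance of models built from unstructured data with that of models built from structured data. The resulting model predicts whether an AIS patient is at risk of developing pneumonia during hospitalization, helping physicians make accurate judgments.
The study data are the Chinese-language EMR nursing records from the day of admission and the stroke registry database of Chia-Yi Christian Hospital, covering 941 ischemic stroke patients from May 2007 to September 2020. Six techniques were used for feature engineering on the unstructured data: TFIDF, Doc2Vec, Bidirectional Encoder Representations from Transformers (BERT), BioBERT, Bio_Clinical Bert, and MetaMap combined with the UMLS Metathesaurus and term frequency. The resulting text features and the structured features were modeled with eight machine learning methods (support vector machine, naive Bayes classifier, k-nearest neighbors, logistic regression, decision tree, random forest, extreme gradient boosting, and light gradient boosting), and the results were compared.
The findings are:
1. Combining unstructured text features with structured features improves on models built from structured features alone or from unstructured text features alone, raising AUC by 1% and 9%, respectively.
2. Using deep learning for feature engineering on the unstructured text yields word embeddings that improve AUC by 1% over traditional feature engineering methods.
3. Using BioBERT (pre-trained on a biomedical corpus) and Bio_Clinical Bert (pre-trained on clinical records) as the feature engineering technique for unstructured EMR in this task gives better predictive performance in subsequent modeling than word embeddings from BERT pre-trained on a general corpus, raising AUC by 9% and 8%, respectively.
4. Extreme gradient boosting (XGB) is the most suitable machine learning classifier for the post-stroke pneumonia prediction model.
These findings help improve the predictive performance of post-stroke pneumonia models and give clinicians more accurate decision support.

Abstract (author's English version):	Stroke is one of the world's major health problems and the second leading cause of death worldwide. The disabling sequelae of stroke are one of the main causes of adult disability in Taiwan. Stroke-associated pneumonia (SAP) is an important clinical problem in the prognosis of patients with acute ischemic stroke (AIS). Most stroke patients suffer from varying degrees of mobility impairment. Dysphagia, for example, increases the risk of aspiration pneumonia sevenfold. Pneumonia occurs in one-third of patients with AIS and is the most common respiratory complication. SAP is closely associated with increased long-term mortality, prolonged hospital stays, increased healthcare costs, and poorer functional outcomes.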
The main purpose of this study is to apply deep learning and text mining techniques to extract new risk factors that may affect SAP from unstructured electronic medical records (EMR), and then to use them to construct a model for predicting the risk of pneumonia as a complication in AIS patients during hospitalization, assisting doctors in making accurate diagnoses. This study used Chinese EMR and the stroke registry database from 2007 to 2017 from Chia-Yi Christian Hospital. In total, 941 eligible patients with AIS were used to build and evaluate the models. Six techniques were used for feature engineering on the unstructured data: TFIDF, Doc2Vec, MetaMap, Bidirectional Encoder Representations from Transformers (BERT), BERT for Biomedical Text Mining (BioBERT), and BERT for Clinical Text Mining (Bio_Clinical Bert). Eight machine learning methods (support vector machine, naive Bayes classifier, k-nearest neighbors, logistic regression, decision tree, random forest, extreme gradient boosting, and light gradient boosting machine) were implemented to develop models and compare predictive results. The results show that combining unstructured text features with structured features produces a prediction model that performs better than models built from structured features alone or from unstructured text features alone. In addition, deep learning feature engineering produced better embeddings than traditional feature engineering methods. Using BioBERT and Bio_Clinical Bert, which are pre-trained on a biomedical corpus and on clinical records respectively, as the unstructured EMR feature engineering technique for this task enabled subsequent modeling to achieve better performance than using BERT pre-trained on a general corpus. Of all the classification techniques investigated, extreme gradient boosting is the most suitable machine learning classifier for the prediction model of post-stroke pneumonia.
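
The idea of combining text features with structured features and scoring the result by AUC can be illustrated with a minimal sketch in plain Python. This is not the thesis implementation. The toy nursing-note tokens, the two structured columns (age and a tube-feeding flag), the smoothed IDF variant, and the use of a single TF-IDF weight as a stand-in risk score are all assumptions made for illustration; the study itself trained classifiers such as extreme gradient boosting on the combined features.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows) of TF-IDF weights for already-tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF; one common variant, assumed here rather than taken from the thesis.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        rows.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, rows


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: tokenized admission notes, structured columns, SAP labels.
notes = [
    ["dysphagia", "cough", "fever"],
    ["alert", "stable"],
    ["dysphagia", "drowsy"],
    ["stable", "alert", "walking"],
]
structured = [[78, 1], [55, 0], [81, 1], [60, 0]]  # [age, tube-feeding flag]
labels = [1, 0, 1, 0]  # 1 = developed pneumonia (made-up values)

vocab, text_features = tfidf(notes)

# Combined feature vector: structured columns followed by the TF-IDF columns.
combined = [s + t for s, t in zip(structured, text_features)]

# Stand-in risk score: the TF-IDF weight of one term, in place of a trained model.
idx = vocab.index("dysphagia")
scores = [row[idx] for row in text_features]

print(len(combined[0]), auc(labels, scores))
```

In practice the `combined` rows would be passed to a trained classifier and the AUC computed on held-out patients; the single-term score above only shows how the metric is evaluated.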
Appears in collections:	[Graduate Institute of Information Management] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	127

All items in NCUIR are protected by the original copyright.
