Develop The Predictive Model For Post Stroke Pneumonia By Using Text Mining And Deep Learning

DC field	Value	Language
DC.contributor	資訊管理學系	zh_TW
DC.creator	徐筱茜	zh_TW
DC.creator	Hsiao-Chein Hsu	en_US
dc.date.accessioned	2022-08-26T07:39:07Z
dc.date.available	2022-08-26T07:39:07Z
dc.date.issued	2022
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=109423005
dc.contributor.department	資訊管理學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
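dc.description.abstract	Stroke is one of the major global health problems and the second leading cause of death worldwide, and the disabling sequelae of stroke are one of the main causes of adult disability in Taiwan. Stroke-associated pneumonia (SAP) is an important clinical problem in the prognosis of patients with acute ischemic stroke (AIS). Most stroke patients have some degree of mobility impairment; dysphagia, for example, increases the risk of aspiration pneumonia sevenfold. One-third of AIS patients develop pneumonia, which is the most common respiratory complication. SAP is closely associated with increased long-term mortality, longer hospital stays, higher medical costs, and poorer functional outcomes. The main purpose of this study is to incorporate deep learning into text mining in order to extract, from unstructured electronic medical records (EMR), new keywords that may affect SAP and use them as variables; to apply machine learning to these variables to build an early prediction model for post-stroke pneumonia; and to compare the predictive performance of models built from unstructured data with that of models built from structured data. The resulting model predicts whether AIS patients are at risk of developing pneumonia during hospitalization and helps physicians make accurate judgments. The study data are the Chinese-language EMR nursing notes from the day of admission and the stroke registry database for 941 ischemic stroke patients at Chia-Yi Christian Hospital from May 2007 to September 2020. Six techniques were used for feature engineering on the unstructured data: TFIDF, Doc2Vec, Bidirectional Encoder Representations from Transformers (BERT), BioBERT, Bio_Clinical Bert, and MetaMap with the UMLS Metathesaurus and term frequency. The resulting text features and the structured features were modeled with eight machine learning methods (support vector machine, naive Bayes classifier, k-nearest neighbors, logistic regression, decision tree, random forest, extreme gradient boosting, and light gradient boosting), and the results were compared. The results show that: (1) building models on unstructured text features combined with structured features improves predictive performance over using structured features alone or unstructured text features alone, raising AUC by 1% and 9%, respectively; (2) applying deep learning to unstructured text feature engineering yields word embeddings that improve AUC by 1% over traditional feature engineering methods; (3) using BioBERT, pre-trained on a biomedical corpus, and Bio_Clinical Bert, pre-trained on clinical records, as the feature engineering technique for unstructured EMR gives better downstream predictive performance than embeddings from BERT pre-trained on a general corpus, raising AUC by 9% and 8%, respectively; (4) extreme gradient boosting (XGB) is the most suitable machine learning classifier for the post-stroke pneumonia prediction model. These findings help improve the predictive performance of post-stroke pneumonia models and give clinicians more accurate decision support.	zh_TW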
dc.description.abstract	Stroke is one of the world's major health problems and the second leading cause of death worldwide. The disabling sequelae of stroke are one of the main causes of adult disability in Taiwan. Stroke-associated pneumonia (SAP) is an important clinical problem in the prognosis of patients with acute ischemic stroke (AIS). Most stroke patients suffer from varying degrees of mobility impairment. Dysphagia, for example, increases the risk of aspiration pneumonia sevenfold. Pneumonia is the most common respiratory complication and occurs in one-third of patients with AIS. SAP is closely associated with increased long-term mortality, prolonged hospital stays, increased healthcare costs, and poorer functional outcomes. The main purpose of this study is to combine deep learning and text mining techniques to extract new risk factors that may affect SAP from unstructured electronic medical records (EMR), and then to use them to construct a model that predicts the risk of pneumonia in AIS patients during hospitalization, helping physicians make accurate judgments. This study used Chinese EMR and the stroke registry database of Chia-Yi Christian Hospital from 2007 to 2017. In total, 941 eligible patients with AIS were used to build and evaluate the models. Six techniques were used for feature engineering on the unstructured data: TFIDF, Doc2Vec, MetaMap, Bidirectional Encoder Representations from Transformers (BERT), BERT for Biomedical Text Mining (BioBERT), and BERT for Clinical Text Mining (Bio_Clinical Bert). Eight machine learning methods (support vector machine, naive Bayes classifier, k-nearest neighbors, logistic regression, decision tree, random forest, extreme gradient boosting, and light gradient boosting machine) were used to develop models and compare predictive results.
The results show that models built on unstructured text features combined with structured features outperform models built on structured features alone or on unstructured text features alone. In addition, deep learning feature engineering produced better embeddings than traditional feature engineering methods. Using BioBERT and Bio_Clinical Bert, which are pre-trained on a biomedical corpus and on clinical records respectively, as the feature engineering technique for unstructured EMR gave better downstream model performance than using BERT, which is pre-trained on a general corpus. Among all the classification techniques investigated, extreme gradient boosting is the most suitable machine learning classifier for predicting post-stroke pneumonia.	en_US
DC.subject	腦中風	zh_TW
DC.subject	肺炎	zh_TW
DC.subject	文字探勘	zh_TW
DC.subject	機器學習	zh_TW
DC.subject	深度學習	zh_TW
DC.subject	電子醫療紀錄	zh_TW
DC.subject	stroke	en_US
DC.subject	pneumonia	en_US
DC.subject	text mining	en_US
DC.subject	machine learning	en_US
DC.subject	deep learning	en_US
DC.subject	EMR	en_US
DC.title	使用文字探勘與深度學習技術建置中風後肺炎之預測模型	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Develop The Predictive Model For Post Stroke Pneumonia By Using Text Mining And Deep Learning	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US
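
Illustrative note: the abstract names TFIDF as one of the text feature engineering techniques and AUC as the evaluation metric. The following is a minimal sketch of both ideas in plain Python, not the thesis's actual pipeline. The function names `tfidf` and `auc`, the English toy tokens, and the labels are all hypothetical; the record does not describe how the Chinese nursing notes were tokenized or which TF-IDF variant was used, so a plain term-frequency times log inverse-document-frequency weighting is assumed here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical tokenized notes and labels (1 = pneumonia), for illustration only.
notes = [
    ["dysphagia", "cough", "fever"],
    ["alert", "stable"],
    ["dysphagia", "choking"],
    ["stable", "ambulatory"],
]
labels = [1, 0, 1, 0]

vectors = tfidf(notes)
# Toy score: the TF-IDF weight of a single term. A real model would be a
# trained classifier over the full feature vector.
scores = [v.get("dysphagia", 0.0) for v in vectors]
print(auc(labels, scores))
```

On this toy data the single-term score separates the two classes completely, so the printed AUC is 1.0. That says nothing about real performance; it only shows how text features and an AUC comparison fit together.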

Full metadata record for thesis 109423005