Thesis/Dissertation 109423005: Full Metadata Record

DC field | Value | Language
DC.contributor | 資訊管理學系 | zh_TW
DC.creator | 徐筱茜 | zh_TW
DC.creator | Hsiao-Chein Hsu | en_US
dc.date.accessioned | 2022-8-26T07:39:07Z
dc.date.available | 2022-8-26T07:39:07Z
dc.date.issued | 2022
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=109423005
dc.contributor.department | 資訊管理學系 | zh_TW
DC.description | 國立中央大學 | zh_TW
DC.description | National Central University | en_US
dc.description.abstract | 腦中風是全球重大的健康問題之一,為全球人類死亡的第二大主因,且中風造成失能的後遺症,是我國成人殘障的主因之一。中風相關性肺炎(Stroke-associated pneumonia, SAP)是急性缺血性中風(Acute ischemic stroke, AIS)患者預後的一個重要臨床問題。大多數的中風患者會有不同程度的活動障礙,例如吞嚥困難使吸入性肺炎風險增加了七倍;三分之一的 AIS 患者患有肺炎,是最常見的呼吸系統併發症。SAP 與長期死亡率增加、住院時間延長、醫療費用上升和預後功能下降密切相關。 本研究主要目的為將深度學習技術加入文字探勘,萃取出非結構化電子病歷中可能影響 SAP 之新關鍵字作為變數,並運用機器學習技術以這些變數建構中風後肺炎的早期預測模型,比較非結構化資料與結構化資料所建立之預測模型的預測效能差異,透過此研究模型預測 AIS 患者於住院期間是否有併發肺炎風險,協助醫師做精準判斷。 本研究使用嘉義基督教醫院由 2007 年 5 月至 2020 年 9 月共 941 名缺血性中風患者入院當日之中文 EMR 護理紀錄與中風登錄資料庫作為研究資料,其中之非結構化資料運用六項技術進行特徵工程,包含 TFIDF、Doc2Vec、Bidirectional Encoder Representations from Transformers (BERT)、BioBERT、Bio_Clinical Bert,以及 MetaMap 搭配 UMLS Metathesaurus 與 Term Frequency。產生之文本特徵與結構特徵使用八項機器學習方法,包含支持向量機、單純貝氏分類器、K-近鄰演算法、邏輯斯迴歸、決策樹、隨機森林、極限梯度提升、輕量梯度提升,進行建模預測及結果比較。 結果發現:1. 合併非結構化文本特徵與結構化特徵來建構模型,預測效果優於單純使用結構特徵或非結構文本特徵建構之模型,AUC 分別提升 1% 和 9%。2. 於非結構化文本特徵工程中加入深度學習技術,其詞嵌入效果於 AUC 上相比使用傳統特徵工程方法有 1% 之提升。3. 使用基於生物醫學語料庫預訓練之 BioBERT 模型和基於醫學臨床紀錄預訓練之 Bio_Clinical Bert 模型作為此任務之非結構化 EMR 特徵工程技術,將比基於一般語料庫預訓練之 BERT 詞嵌入,在後續建模上有更好之預測效果,AUC 分別提升 9% 和 8%。4. 極限梯度提升(XGB)為中風後肺炎預測模型最適合之機器學習分類器。以上有助於提升中風後肺炎模型之預測性能,給予臨床醫師更準確之決策支援。 | zh_TW
dc.description.abstract | Stroke is one of the world's major health problems and the second leading cause of death worldwide. Disability following stroke is one of the main causes of adult disability in Taiwan. Stroke-associated pneumonia (SAP) is an important clinical problem in the prognosis of patients with acute ischemic stroke (AIS). Most stroke patients suffer from varying degrees of mobility impairment; dysphagia, for example, increases the risk of aspiration pneumonia sevenfold. Pneumonia is the most common respiratory complication, occurring in one-third of patients with AIS. SAP is closely associated with increased long-term mortality, prolonged hospital stays, increased healthcare costs, and poorer functional outcomes. The main purpose of this study is to combine deep learning with text mining to extract new risk factors that may affect SAP from unstructured electronic medical records (EMR), and then to use them to construct a model that predicts the risk of pneumonia in AIS patients during hospitalization, helping physicians make accurate judgments. This study used Chinese EMR and the stroke registry database from 2007-2017 from Chia-Yi Christian Hospital. In total, 941 eligible patients with AIS were used to build and evaluate the models. Six techniques were used for feature engineering on the unstructured data: TFIDF, Doc2Vec, MetaMap, Bidirectional Encoder Representations from Transformers (BERT), BERT for Biomedical Text Mining (BioBERT), and BERT for Clinical Text Mining (Bio_Clinical Bert). Eight machine learning methods (support vector machine, naive Bayes classifier, K-nearest neighbors, logistic regression, decision tree, random forest, Extreme Gradient Boosting, and Light Gradient Boosting Machine) were used to develop models and compare predictive results.
The results show that models built by combining unstructured text features with structured features outperformed models built from structured features or unstructured text features alone. In addition, deep learning feature engineering produced better embeddings than traditional feature engineering methods. Using BioBERT and Bio_Clinical Bert, which are pre-trained on a biomedical corpus and on clinical records respectively, as the feature engineering technique for unstructured EMR enabled subsequent models to perform better than those using BERT pre-trained on a general corpus. Among all classifiers investigated, extreme gradient boosting was the most suitable for the post-stroke pneumonia prediction model. | en_US
DC.subject | 腦中風 | zh_TW
DC.subject | 肺炎 | zh_TW
DC.subject | 文字探勘 | zh_TW
DC.subject | 機器學習 | zh_TW
DC.subject | 深度學習 | zh_TW
DC.subject | 電子醫療紀錄 | zh_TW
DC.subject | stroke | en_US
DC.subject | pneumonia | en_US
DC.subject | text mining | en_US
DC.subject | machine learning | en_US
DC.subject | deep learning | en_US
DC.subject | EMR | en_US
DC.title | 使用文字探勘與深度學習技術建置中風後肺炎之預測模型 | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Develop The Predictive Model For Post Stroke Pneumonia By Using Text Mining And Deep Learning | en_US
DC.type | 博碩士論文 | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US
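
The abstract in this record names TFIDF among its feature-engineering techniques and compares models by AUC. The following is a minimal, illustrative sketch of those two ideas in plain Python. It is not the thesis author's code: the function names `tfidf` and `auc`, the toy notes, and the labels are all hypothetical, and production libraries typically add smoothing and normalization to the TF-IDF weights.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document.

    Weight = (term count / document length) * log(N / document frequency).
    This is the unsmoothed textbook form, used here only for illustration.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the thesis.
notes = [
    ["dysphagia", "cough", "fever"],
    ["alert", "stable"],
    ["dysphagia", "stable"],
    ["alert", "cough"],
]
labels = [1, 0, 1, 0]  # 1 = developed pneumonia (made-up labels)

weights = tfidf(notes)
# Use the TF-IDF weight of a single term as a crude risk score.
scores = [w.get("dysphagia", 0.0) for w in weights]
print(auc(labels, scores))
```

In a real pipeline the TF-IDF vectors would feed a classifier, and AUC would be computed on held-out predictions rather than on a single term weight.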
