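Building an Atrial Fibrillation Prediction Model for Stroke Patients Using Electronic Medical Records and Data Mining Techniques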

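Name

詹昕瑜 (Hsin-Yu Chan)

Department

Department of Information Management

Thesis title

Building an Atrial Fibrillation Prediction Model for Stroke Patients Using Electronic Medical Records and Data Mining Techniques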

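Related theses

★ A transaction prediction model for residential property types sold by real estate brokerages: A case study of brokerage Company S
★ Building a machine learning model to predict customer complaint categories with text mining
★ Building a customer repurchase-rate prediction model with machine learning: A case study of an e-commerce website selling handmade soap materials
★ Building a stock price prediction model with machine learning: A case study of the Taiwan stock market
★ Building a financial distress prediction model with machine learning: A case study of Taiwanese listed and OTC companies
★ A data mining prediction model for post-ex-dividend price recovery: A case study of companies listed on the Taiwan stock market
★ Optimizing next-generation firewall rules with data mining techniques
★ Applying data mining techniques to identify relevant new information in electronic medical record text
★ Applying deep learning to post-market drug surveillance: A Twitter text classification task
★ Missing value imputation considering feature selection and random forests
★ Abbreviation disambiguation and one-to-many classification in electronic medical records
★ Improving personalized outfit recommendation with meta-paths and an attention mechanism
★ Building an underwriting risk prediction model with machine learning: A case study of Company A
★ Fan lifetime prediction using big data analytics: A case study of Company X
★ Building a post-stroke pneumonia prediction model with text mining and deep learning
★ Using text mining to analyze how review feature factors affect the helpfulness of experience-goods reviews: A case study of IMDb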

Files

Full text available in the system from 2026-8-1.

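Abstract (Chinese, translated)

Sudden-onset cerebrovascular disease, also known as stroke, is the second leading cause of death and the third leading cause of disability worldwide. Atrial fibrillation (AF) is a potential cause of ischemic stroke and is strongly associated with it. However, AF is hard to detect: paroxysmal episodes are often misjudged as asymptomatic, so patients may go without appropriate treatment. When AF is detected in a patient with ischemic stroke, the strategy for secondary stroke prevention usually changes, because in this setting oral anticoagulants are generally more effective than oral antiplatelet drugs; oral anticoagulants can reduce the risk of stroke recurrence by two-thirds. The main purpose of this study is to use unstructured text data and machine learning algorithms to build an early prediction model of post-stroke AF for patients who have already had an ischemic stroke, and to validate it on data from electronic medical records. The secondary purpose is to compare whether prediction models built from structured data and from unstructured data differ in predictive performance. We hope the models built in this study can assist physicians' medical decision-making and support better use of medical resources.
In the AF prediction experiments, Experiment 1 showed that logistic regression achieved the best metrics on every feature set, and that the combined features paired with a logistic regression classifier performed best overall (AUC = 0.8324). In Experiment 2, models were built on the data of each of two hospitals and cross-validated on the other hospital's data; the results showed that AF prediction models built on unstructured data from a different hospital did not perform as well as expected. Overall, the study shows that adding text features to structured features improves model performance compared with using structured features alone.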
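The AUC reported above is the area under the ROC curve. As a minimal sketch (not the thesis's own evaluation code, which this record does not include), it can be computed as the fraction of positive/negative case pairs in which the positive case receives the higher predicted score, counting ties as half. The labels and scores below are hypothetical toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = AF detected after stroke, 0 = no AF.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based rank computation gives the same value more efficiently on large cohorts.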

Abstract (English)

Cerebrovascular disease, also known as stroke, is the second leading cause of death and the third leading cause of disability worldwide. Atrial fibrillation (AF) is a potential cause of ischemic stroke and is strongly associated with it. However, AF is difficult to detect, so patients may not receive appropriate treatment. When AF is detected in a patient with acute ischemic stroke, the secondary prevention strategy is modified accordingly. The main purpose of this study is to use electronic medical records and machine learning algorithms to build an early prediction model of AF for patients who have had an ischemic stroke. The secondary purpose is to compare the performance of prediction models built on structured data with those built on unstructured data. We hope the proposed model can assist physicians' medical decision-making and help utilize medical resources appropriately.
In the AF prediction experiments, Experiment 1 found that the logistic regression classifier performed best on every feature set, and best of all on structured features combined with text features (AUC = 0.8324). In Experiment 2, we built models on the data of each of two hospitals and cross-validated them on the other hospital's data. The results indicated that AF prediction models built on unstructured data from a different hospital did not perform as well as expected. Overall, this study shows that combining structured and text features improves model performance compared with using structured features alone.
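The "structured plus text features" setup described above can be illustrated with a small sketch. This is an assumption-laden toy, not the thesis's pipeline: it uses whitespace tokenization and a smoothed TF-IDF (only one of the text representations listed in the contents; Doc2Vec, MetaMap, and BERT are not shown), a hand-written batch gradient-descent logistic regression in place of any library classifier, and hypothetical structured variables (`age / 100`, a hypertension flag) and clinical-note snippets.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn a vocabulary and smoothed IDF weights from training notes only."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    vocab = sorted(doc_freq)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def tfidf(doc, vocab, idf):
    """TF-IDF vector for one note; words outside the training vocabulary are ignored."""
    toks = doc.lower().split()
    counts = Counter(toks)
    n = len(toks) or 1
    return [counts[t] / n * idf[t] for t in vocab]


def combine(structured_row, note, vocab, idf):
    """Concatenate structured variables with text features into one vector."""
    return list(structured_row) + tfidf(note, vocab, idf)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted by plain batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class (AF) for one combined vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: structured = [age / 100, hypertension flag]; 1 = AF.
notes = [
    "irregular rhythm palpitation noted",
    "sinus rhythm no palpitation",
    "irregular pulse left atrium enlarged",
    "regular pulse sinus rhythm",
]
structured = [[0.82, 1], [0.55, 0], [0.77, 1], [0.48, 0]]
labels = [1, 0, 1, 0]

vocab, idf = fit_idf(notes)
X = [combine(s, n, vocab, idf) for s, n in zip(structured, notes)]
w, b = train_logreg(X, labels)
print([round(predict_proba(w, b, x), 3) for x in X])
```

To mimic the cross-hospital setting of Experiment 2, `fit_idf` and `train_logreg` would be run on one hospital's notes and `combine`/`predict_proba` applied to the other's. Because the vocabulary comes from the training hospital, words that occur only in the other hospital's notes receive no weight at prediction time.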

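Keywords (Chinese)

★ 心房顫動 (atrial fibrillation)
★ 腦中風 (stroke)
★ 電子病歷 (electronic medical record)
★ 文字探勘 (text mining)
★ 機器學習 (machine learning)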

Keywords (English)

★ Atrial fibrillation
★ Stroke
★ Electronic medical record
★ Text mining
★ Machine learning

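Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
Chapter 2. Literature Review
2.1 Related Work on Electronic Medical Records in Clinical Decision Support Systems
2.2 Applications of AF Prediction
Chapter 3. Research Methods
3.1 Data Sources
3.2 Definition of the Dependent Variable
3.3 Definition of the Independent Variables
3.4 Data Preprocessing
3.5 Feature Engineering
3.5.1 Term Frequency-Inverse Document Frequency (TF-IDF)
3.5.2 Doc2Vec (D2V)
3.5.3 Medical Concept Mapping (MetaMap)
3.5.4 Bidirectional Encoder Representations from Transformers (BERT)
3.6 Classification Techniques
3.6.1 Support Vector Machine (SVM)
3.6.2 Naive Bayes (NB)
3.6.3 Random Forest (RF)
3.6.4 Logistic Regression (LR)
3.6.5 Extreme Gradient Boosting (XGB)
3.7 Evaluation Metrics for Prediction Models
Chapter 4. Experimental Evaluation
4.1 Experimental Design and Analysis Techniques
4.2 Experimental Results
4.2.1 Experiment 1
4.2.2 Experiment 2
4.3 Discussion
Chapter 5. Conclusions and Suggestions
5.1 Conclusions
5.2 Limitations
5.3 Future Research Directions and Suggestions
Chapter 6. References
Appendix 1
Appendix 2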

References

Almeida, F., & Xexéo, G. (2019). Word Embeddings: A Survey. Retrieved from http://arxiv.org/abs/1901.09069
Alonso, A., Krijthe, B. P., Aspelund, T., Stepas, K. A., Pencina, M. J., Moser, C. B., … Benjamin, E. J. (2013). Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: the CHARGE-AF consortium. Journal of the American Heart Association, 2(2). https://doi.org/10.1161/JAHA.112.000102
Alpert, J. S. (2019, April 1). The Electronic Medical Record: Beauty and the Beast. American Journal of Medicine, Vol. 132, pp. 393–394. Elsevier Inc. https://doi.org/10.1016/j.amjmed.2018.12.004
Aronson, A. R. (2001). Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proceedings / AMIA ... Annual Symposium. AMIA Symposium, 17–21. Retrieved from /pmc/articles/PMC2243666/?report=abstract
Aronson, A. R., & Lang, F. M. (2010). An overview of MetaMap: Historical perspective and recent advances. Journal of the American Medical Informatics Association, 17(3), 229–236. https://doi.org/10.1136/jamia.2009.002733
Bergström, L., Irewall, A. L., Söderström, L., Ögren, J., Laurell, K., & Mooe, T. (2017, August 1). One-Year Incidence, Time Trends, and Predictors of Recurrent Ischemic Stroke in Sweden from 1998 to 2010: An Observational Study. Stroke, Vol. 48, pp. 2046–2051. Lippincott Williams and Wilkins. https://doi.org/10.1161/STROKEAHA.117.016815
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 13-17-August-2016, 785–794. Association for Computing Machinery. https://doi.org/10.1145/2939672.2939785
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/bf00994018
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, 1, 4171–4186. Association for Computational Linguistics (ACL). Retrieved from https://github.com/tensorflow/tensor2tensor
Feigin, V. L., Forouzanfar, M. H., Krishnamurthi, R., Mensah, G. A., Connor, M., Bennett, D. A., … Naghavi, M. (2014). Global and regional burden of stroke during 1990-2010: Findings from the Global Burden of Disease Study 2010. The Lancet, 383(9913), 245–255. https://doi.org/10.1016/S0140-6736(13)61953-4
Friedlin, J., Overhage, M., Al-Haddad, M. A., Waters, J. A., Aguilar-Saavedra, J. J. R., Kesterson, J., & Schmidt, M. (2010). Comparing methods for identifying pancreatic cancer patients using electronic data sources. AMIA ... Annual Symposium Proceedings / AMIA Symposium. AMIA Symposium, 2010, 237–241. Retrieved from /pmc/articles/PMC3041435/
Gilmer, T. P., O'Connor, P. J., Sperl-Hillen, J. M., Rush, W. A., Johnson, P. E., Amundson, G. H., … Ekstrom, H. L. (2012). Cost-Effectiveness of an Electronic Medical Record Based Clinical Decision Support System. Health Services Research, 47(6), 2137–2158. https://doi.org/10.1111/j.1475-6773.2012.01427.x
Go, A. S., Reynolds, K., Yang, J., Gupta, N., Lenane, J., Sung, S. H., … Solomon, M. D. (2018). Association of burden of atrial fibrillation with risk of ischemic stroke in adults with paroxysmal atrial fibrillation: The KP-RHYTHM study. JAMA Cardiology, 3(7), 601–608. https://doi.org/10.1001/jamacardio.2018.1176
Goldberg, Y., & Levy, O. (2014). word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. Retrieved from http://arxiv.org/abs/1402.3722
Hankey, G. J., Jamrozik, K., Broadhurst, R. J., Forbes, S., & Anderson, C. S. (2002). Long-term disability after first-ever stroke and related prognostic factors in the Perth Community Stroke Study, 1989-1990. Stroke, 33(4), 1034–1040. https://doi.org/10.1161/01.STR.0000012515.66889.24
Healey, J. S., & Wong, J. A. (2019, November 1). Pre-Screening for Atrial Fibrillation Using the Electronic Health Record. JACC: Clinical Electrophysiology, Vol. 5, pp. 1342–1343. Elsevier Inc. https://doi.org/10.1016/j.jacep.2019.08.019
Hoogendoorn, M., Szolovits, P., Moons, L. M. G., & Numans, M. E. (2016). Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer. Artificial Intelligence in Medicine, 69, 53–61. https://doi.org/10.1016/j.artmed.2016.03.003
Horng, S., Sontag, D. A., Halpern, Y., Jernite, Y., Shapiro, N. I., & Nathanson, L. A. (2017). Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning. PLOS ONE, 12(4), e0174708. https://doi.org/10.1371/journal.pone.0174708
Hsieh, C. Y., Wu, D. P., & Sung, S. F. (2017). Trends in vascular risk factors, stroke performance measures, and outcomes in patients with first-ever ischemic stroke in Taiwan between 2000 and 2012. Journal of the Neurological Sciences, 378, 80–84. https://doi.org/10.1016/j.jns.2017.05.002
Hsieh, F. I., Lien, L. M., Chen, S. T., Bai, C. H., Sun, M. C., Tseng, H. P., … Hsu, C. Y. (2010). Get with the guidelines-stroke performance indicators: Surveillance of Stroke Care in the Taiwan Stroke Registry: Get with the guidelines-stroke in Taiwan. Circulation, 122(11), 1116–1123. https://doi.org/10.1161/CIRCULATIONAHA.110.936526
Hulme, O. L., Khurshid, S., Weng, L. C., Anderson, C. D., Wang, E. Y., Ashburner, J. M., … Lubitz, S. A. (2019). Development and Validation of a Prediction Model for Atrial Fibrillation Using Electronic Health Records. JACC: Clinical Electrophysiology, 5(11), 1331–1341. https://doi.org/10.1016/j.jacep.2019.07.016
Johnson, W., Onuma, O., Owolabi, M., & Sachdev, S. (2016). Stroke: A global response is needed. Bulletin of the World Health Organization, 94(9), 634A-635A. https://doi.org/10.2471/BLT.16.181636
Jones, N. R., Taylor, C. J., Hobbs, F. D. R., Bowman, L., & Casadei, B. (2020). Screening for atrial fibrillation: A call for evidence. European Heart Journal, Vol. 41, pp. 1075–1085. https://doi.org/10.1093/eurheartj/ehz834
Karnik, S., Tan, S. L., Berg, B., Glurich, I., Zhang, J., Vidaillet, H. J., … Chowdhary, R. (2012). Predicting atrial fibrillation and flutter using electronic health records. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, 5562–5565. https://doi.org/10.1109/EMBC.2012.6347254
Khurshid, S., Keaney, J., Ellinor, P. T., & Lubitz, S. A. (2016). A simple and portable algorithm for identifying atrial fibrillation in the electronic medical record. American Journal of Cardiology, 117(2), 221–225. https://doi.org/10.1016/j.amjcard.2015.10.031
Kolek, M. J., Graves, A. J., Xu, M., Bian, A., Teixeira, P. L., Shoemaker, M. B., … Darbar, D. (2016). Evaluation of a prediction model for the development of atrial fibrillation in a repository of electronic medical records. JAMA Cardiology, 1(9), 1007–1013. https://doi.org/10.1001/jamacardio.2016.3366
Kwong, C., Ling, A. Y., Crawford, M. H., Zhao, S. X., & Shah, N. H. (2017). A Clinical Score for Predicting Atrial Fibrillation in Patients with Cryptogenic Stroke or Transient Ischemic Attack. Cardiology, 138(3), 133–140. https://doi.org/10.1159/000476030
Le, Q. V., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. 31st International Conference on Machine Learning, ICML 2014, 4, 2931–2939. Retrieved from http://arxiv.org/abs/1405.4053
Li, L., Chase, H. S., Patel, C. O., Friedman, C., & Weng, C. (2008). Comparing ICD9-encoded diagnoses and NLP-processed discharge summaries for clinical trials pre-screening: a case study. AMIA ... Annual Symposium Proceedings / AMIA Symposium. AMIA Symposium, 2008, 404–408. Retrieved from http://www.dbmi.columbia.edu/~chw7007/ICD.htm
Li, Y. G., Bisson, A., Bodin, A., Herbert, J., Grammatico-Guillon, L., Joung, B., … Fauchier, L. (2019). C2HEST score and prediction of incident atrial fibrillation in poststroke patients: A French nationwide study. Journal of the American Heart Association, 8(13). https://doi.org/10.1161/JAHA.119.012546
Lip, G. Y. H., Hunter, T. D., Quiroz, M. E., Ziegler, P. D., & Turakhia, M. P. (2017). Atrial Fibrillation Diagnosis Timing, Ambulatory ECG Monitoring Utilization, and Risk of Recurrent Stroke. Circulation: Cardiovascular Quality and Outcomes, 10(1). https://doi.org/10.1161/CIRCOUTCOMES.116.002864
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. 1st International Conference on Learning Representations, ICLR 2013 - Workshop Track Proceedings. International Conference on Learning Representations, ICLR. Retrieved from http://ronan.collobert.com/senna/
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems. Retrieved from http://arxiv.org/abs/1310.4546
Mujtaba, G., Shuib, L., Idris, N., Hoo, W. L., Raj, R. G., Khowaja, K., … Nweke, H. F. (2019, February 1). Clinical text classification research trends: Systematic literature review and open issues. Expert Systems with Applications, Vol. 116, pp. 494–520. Elsevier Ltd. https://doi.org/10.1016/j.eswa.2018.09.034
Proietti, M., Lane, D. A., Boriani, G., & Lip, G. Y. H. (2019, May 1). Stroke Prevention, Evaluation of Bleeding Risk, and Anticoagulant Treatment Management in Atrial Fibrillation Contemporary International Guidelines. Canadian Journal of Cardiology, Vol. 35, pp. 619–633. Elsevier Inc. https://doi.org/10.1016/j.cjca.2019.02.009
Rumshisky, A., Ghassemi, M., Naumann, T., Szolovits, P., Castro, V. M., McCoy, T. H., & Perlis, R. H. (2016). Predicting early psychiatric readmission with natural language processing of narrative discharge summaries. Translational Psychiatry, 6(10), e921. https://doi.org/10.1038/tp.2015.182
Sposato, L. A., Cerasuolo, J. O., Cipriano, L. E., Fang, J., Fridman, S., Paquet, M., & Saposnik, G. (2018). Atrial fibrillation detected after stroke is related to a low risk of ischemic stroke recurrence. Neurology, 90(11), e924–e931. https://doi.org/10.1212/WNL.0000000000005126
Sposato, L. A., Cipriano, L. E., Saposnik, G., Vargas, E. R., Riccio, P. M., & Hachinski, V. (2015). Diagnosis of atrial fibrillation after stroke and transient ischaemic attack: A systematic review and meta-analysis. The Lancet Neurology, 14(4), 377–387. https://doi.org/10.1016/S1474-4422(15)70027-X
Sung, S. F., Lin, C. Y., & Hu, Y. H. (2020). EMR-Based Phenotyping of Ischemic Stroke Using Supervised Machine Learning and Text Mining Techniques. IEEE Journal of Biomedical and Health Informatics, 24(10), 2922–2931. https://doi.org/10.1109/JBHI.2020.2976931
The GBD 2016 Lifetime Risk of Stroke Collaborators. (2018). Global, Regional, and Country-Specific Lifetime Risks of Stroke, 1990 and 2016. New England Journal of Medicine, 379(25), 2429–2437. https://doi.org/10.1056/NEJMoa1804492
Uphaus, T., Weber-Krüger, M., Grond, M., Toenges, G., Jahn-Eimermacher, A., Jauss, M., … Groschel, K. (2019). Development and validation of a score to detect paroxysmal atrial fibrillation after stroke. Neurology, 92(2), E115–E124. https://doi.org/10.1212/WNL.0000000000006727
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 2017-December, 5999–6009. Neural information processing systems foundation. Retrieved from https://arxiv.org/abs/1706.03762v5
Yaghi, S., Bernstein, R. A., Passman, R., Okin, P. M., & Furie, K. L. (2017, February 3). Cryptogenic Stroke: Research and Practice. Circulation Research, Vol. 120, pp. 527–540. Lippincott Williams and Wilkins. https://doi.org/10.1161/CIRCRESAHA.116.308447
Yang, X. M., Rao, Z. Z., Gu, H. Q., Zhao, X. Q., Wang, C. J., Liu, L. P., … Wang, Y. J. (2019). Atrial Fibrillation Known Before or Detected After Stroke Share Similar Risk of Ischemic Stroke Recurrence and Death. Stroke, 50(5), 1124–1129. https://doi.org/10.1161/STROKEAHA.118.024176

Advisor

胡雅涵 (Ya-Han Hu)

Approval date

2021-8-4