Using Data Mining Techniques to Identify Relevant New Information in Electronic Health Records

Name

Chun-Feng Huang (黃濬灃)

Department

Department of Information Management

Thesis title

Using Data Mining Techniques to Identify Relevant New Information in Electronic Health Records

Files

Full text viewable in the system (open after 2026-7-20)

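Abstract (Chinese)

Electronic health records are now widely used in healthcare systems and have become the primary medium for recording patient information. They bring many benefits, such as faster dissemination of information, less physical storage space, and faster note writing. Their characteristics also cause problems, however. For example, the copy-and-paste function in medical information systems speeds up note writing for healthcare staff, but over time it produces redundant information that hinders reading and lowers the quality of care.
Text summarization can effectively shorten medical records, but as notes accumulate, the summarized records still contain redundant information. Identifying the new information in a record before summarizing it would reduce the proportion of redundancy. Previous new-information identification methods mostly work at the word level, processing individual words and string patterns; taking the semantic level into account should give better performance.
This study processes the text with a semantic-level method. Using the concepts present in the text, it applies a dichotomy method and a similarity-score method as the criteria for identifying new information, then compares the results with a gold standard annotated by physicians to measure annotation performance. Finally, the new-information annotations are presented in a clinical decision support system so that healthcare staff can consult records quickly and make decisions.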

Abstract (English)

Electronic health records are widely used in today's healthcare systems and have become the essential medium for keeping patients' health records. They have many advantages, such as accelerating the transmission of data, reducing the physical space needed to store notes, and making it more efficient for healthcare professionals to write notes. On the other hand, healthcare professionals can copy and paste while writing clinical notes, which creates redundant information. In the long run, this redundancy becomes a major obstacle for healthcare professionals reading the notes and lowers the quality of healthcare.
Ordinary text summarization can shorten medical records, but redundancy still remains. If the new information in each note could be identified beforehand, the proportion of redundancy would be lower.
Previous studies of new information identification mainly work at the word level; considering the semantic level as well may yield better results.
The purpose of this paper is to use text-mining techniques to identify relevant new information at the semantic level. We propose two methods for annotating new information, 1) concept occurrence and 2) concept similarity score, and evaluate their performance against a gold standard.
Finally, we visualize the results so that healthcare professionals can read clinical records more efficiently and make better decisions.
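
The two identification methods named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes concepts have already been extracted from each note (for example by a concept mapper such as MetaMap), and the function names, the toy similarity table, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from typing import Callable, Iterable, Set

# Hypothetical similarity table for illustration only; a real system would
# compute scores with a semantic similarity measure over an ontology.
TOY_SCORES = {frozenset({"fever", "pyrexia"}): 0.9}


def toy_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical concepts, a table score if listed, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset({a, b}), 0.0)


def new_by_occurrence(current: Iterable[str], previous: Set[str]) -> Set[str]:
    """Concept occurrence: a concept is new if no earlier note contains it."""
    return {c for c in current if c not in previous}


def new_by_similarity(
    current: Iterable[str],
    previous: Set[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed cut-off, not taken from the thesis
) -> Set[str]:
    """Concept similarity score: a concept is new if its best match among
    earlier concepts scores below the threshold."""
    new = set()
    for c in current:
        best = max((similarity(c, p) for p in previous), default=0.0)
        if best < threshold:
            new.add(c)
    return new


previous_concepts = {"fever", "cough"}
current_concepts = ["pyrexia", "cough", "stroke"]

by_occurrence = new_by_occurrence(current_concepts, previous_concepts)
by_similarity = new_by_similarity(current_concepts, previous_concepts, toy_similarity)
```

In this toy example the occurrence method flags both "pyrexia" and "stroke" as new, while the similarity method flags only "stroke", because "pyrexia" closely matches the earlier concept "fever". That difference is the kind of gain a semantic-level comparison is meant to capture.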

Keywords (Chinese)

★ Data mining
★ New information
★ Unified Medical Language System
★ Semantic similarity
★ Electronic health records

Keywords (English)

★ Data mining
★ New information
★ Semantic similarity
★ UMLS
★ Electronic health records

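Table of contents

Chapter 1 Introduction
1.1 Research background
1.2 Research motivation
1.3 Research objectives
Chapter 2 Literature review
2.1 Medical text summarization
Extractive summarization
2.2 New information identification in medical text
Chapter 3 Research method
3.1 Data source
3.2 Data preprocessing
3.3 Unified Medical Language System (UMLS)
3.4 Mapping medical terminology with MetaMap
3.5 Similarity calculation methods
3.6 New information identification methods
3.7 Experimental design
 Experiment 1: Identifying new information by concept dichotomy
 Experiment 2: Identifying new information by concept similarity score
3.8 Performance evaluation methods
Chapter 4 Experimental results and analysis
4.1 Experimental results and evaluation
4.1.1 Experiment 1
4.1.2 Experiment 2
4.2 Discussion
Chapter 5 Conclusions and recommendations
5.1 Conclusions
5.2 Research limitations
5.3 Future research directions and recommendations
References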

Advisor

Ya-Han Hu (胡雅涵)

Approval date

2021-7-20
