Disease NER of Medical Records and Analysis of Transfer Learning of Medical Records between Different Hospital Departments

Name

黃珏倪 (Jue-Ni Huang)

Department

Department of Computer Science and Information Engineering

Thesis Title

Disease NER of Medical Records and Analysis of Transfer Learning of Medical Records between Different Hospital Departments
(original title: 針對病歷之疾病命名實體標註以及醫院科別病歷轉移學習之分析)

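This electronic thesis is authorized for immediate open access.
Once the open-access date has been reached, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.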

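Abstract (Chinese)

With the rapid development of natural language processing (NLP) techniques, their cross-domain applications have also advanced considerably. Biomedical text mining is one of the important goals of research in the biomedical field, and as record keeping has shifted from paper toward electronic formats, more resources have become available for biomedical text mining research. We take hospital medical records as our research subject, and our main goal is transfer learning of medical records across different hospital departments. To achieve this, we use named entity recognition (NER) from the biomedical domain to predict the disease names in medical records, which can substantially help medical staff when consolidating and recording diagnoses. Previous studies fall broadly into two directions: rule-based NER for biomedical text and dictionary-based NER. Their shared weakness is textual ambiguity: neither distinguishes meaning well.
To address this problem, we use a machine learning approach; BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) is one of the most important techniques in biomedical natural language processing. In our experiments, we mine medical record text department by department, in order to analyze how well a model trained on the records of one department transfers to other departments, and how the text differs between departments.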

Abstract (English)

With the rapid development of natural language processing (NLP), its cross-domain applications have also advanced considerably. Biomedical text mining is one of the most important goals in biomedical research, and the move from paper records toward electronic records provides more resources for it. We use hospital medical records as the research data source, and our primary objective is transfer learning of medical records between different hospital departments. To achieve this goal, we use named entity recognition (NER) as applied in the biomedical field to predict the disease names in a patient's record, which helps medical experts consolidate diagnoses. Past studies fall roughly into two main approaches: rule-based biomedical text NER and dictionary-based NER. Their common disadvantage is textual ambiguity: neither approach distinguishes well between different meanings of the same text.
To resolve such ambiguity, we use machine learning; BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) is one of the most important technologies in biomedical natural language processing. In our experiments, we apply text mining to medical records department by department in order to analyze how well a model trained on the records of one department transfers to other departments, and how the text differs between departments.
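The ambiguity problem that the abstract attributes to dictionary-based NER can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the toy lexicon, the example sentences, and the helper names `tokenize` and `dictionary_tag` are all hypothetical. The tagger does greedy longest-match lookup and emits BIO tags, so it labels "cold" as a disease in both sentences, even though the second sentence describes a symptom rather than a disease.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would load a curated disease vocabulary.
DISEASE_LEXICON = {"cold", "diabetes", "heart failure"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_tag(tokens, lexicon, max_len=3):
    """Greedy longest-match dictionary lookup that emits BIO tags."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in lexicon:
                matched = n
                break
        if matched:
            tags[i] = "B-Disease"
            for j in range(i + 1, i + matched):
                tags[j] = "I-Disease"
            i += matched
        else:
            i += 1
    return tags


# "cold" is a disease mention in the first sentence but not in the second;
# the dictionary tagger cannot tell the two uses apart.
tags_a = dictionary_tag(tokenize("Patient has a cold and heart failure"), DISEASE_LEXICON)
tags_b = dictionary_tag(tokenize("Patient reports cold hands"), DISEASE_LEXICON)
print(tags_a)
print(tags_b)
```

A context-sensitive model such as a fine-tuned BioBERT tagger is one way to separate the two uses, since it scores each token in the context of its sentence instead of by surface form alone.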

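Keywords (Chinese)

★ Biomedical literature mining
★ Machine learning
★ Natural language processing
★ Transfer learning
★ Disease named entity recognition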

Keywords (English)

★ Biomedical text mining
★ Machine learning
★ Natural language processing
★ Transfer learning
★ Disease named entity recognition

Table of Contents

Chinese Abstract
English Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Problem Description
Chapter 2 Related Work
2.1 Named Entity Recognition
Chapter 3 Methodology
3.1 Data Annotation
3.1.1 Data Annotation for Machine Learning
3.1.2 Entity Annotation
3.1.3 Sensitivity and Specificity
3.2 System Flow
Chapter 4 Experiment
4.1 Datasets
4.2 Experimental Settings and Results
Chapter 5 Discussion
Chapter 6 Conclusion
References

Advisor

蔡宗翰 (Tzong-Han Tsai)

Approval Date

2020-8-12
