Using SpaCy NER to Label Chest Radiography Reports: Comparison with CheXpert Labeler

Name

張維辰 (Wei-Chen Chang)

Department

Graduate Institute of Biomedical Engineering

Thesis title

Using SpaCy NER to Label Chest Radiography Reports: Comparison with CheXpert Labeler
(使用SpaCy NER標記胸部放射檢查報告：與 CheXpert Labeler 的比較)

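This electronic thesis is authorized for immediate open access.
Open-access electronic full texts are licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.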

Abstract

Chest X-ray is one of the most commonly used examinations in clinical practice. X-ray imaging is simple, non-invasive, and low in radiation dose, and it quickly summarizes the status of the chest, lung tissue, blood vessels, heart, and other intrathoracic organs. Because a chest X-ray carries information on many clinical indications and can reveal many conditions and diseases, clinicians and radiologists often spend considerable time and effort on interpretation while trying not to miss chest lesions. Many artificial intelligence (AI)-assisted chest X-ray interpretation systems are currently under development. The biggest challenge for data scientists training these AI models is that producing high-quality labeled X-ray images is time-consuming: it requires personnel with radiology expertise to fully understand the image content, which is extremely challenging for people outside the medical imaging field. Since radiologists generally record a free-text report for each X-ray examination, natural language processing (NLP) has been proposed to capture the diagnostic findings in the original text reports. The extracted information can then be converted automatically into X-ray image labels that serve as ground truth. This saves considerable manpower and allows more images to be labeled quickly to train AI models and improve diagnostic accuracy. Named entity recognition (NER) is a popular NLP technique that helps extract the keywords used in X-ray examination reports. By training an NER model to recognize disease-specific terms and to interpret whether each is a positive or negative indication, free-text chest X-ray reports can be converted quickly and automatically into high-quality chest X-ray image labels for training AI classification models.
In this study, we implemented a Python NER program that recognizes the common keywords used in chest X-ray reports. These keywords are based on the 14-category lexicon of the CheXpert (Chest eXpert) Labeler, developed at Stanford University in the United States for labeling chest X-ray images. The automatic NER function was implemented with SpaCy, and the positive/negative decision was implemented with an sBERT (Sentence-BERT) model fine-tuned by our group. We used the U.S. National Library of Medicine (MeSH) dataset (3,955 chest X-ray reports labeled by radiologists) as the benchmark for evaluation and compared the labels produced by our software with those of the CheXpert Labeler. Our software achieved NER accuracy comparable to that of the CheXpert Labeler, improved execution speed sixfold, and rendered the NER labels within the free-text reports to highlight the detected keywords and their disease categories.
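
The labeling idea described above can be sketched in a few lines. This is a minimal, standard-library-only illustration and not the thesis implementation: a regular-expression lookup stands in for the SpaCy NER component, a short list of cue words stands in for the fine-tuned sBERT positive/negative classifier, and the three-category `LEXICON`, the cue list, and all function names are hypothetical.

```python
import re

# Hypothetical mini-lexicon (category -> phrases); NOT the CheXpert vocabulary.
LEXICON = {
    "Cardiomegaly": ["cardiomegaly", "enlarged heart"],
    "Pleural Effusion": ["pleural effusion", "effusion"],
    "Pneumothorax": ["pneumothorax"],
}

# Assumed cue words; a simple stand-in for a learned polarity classifier.
NEGATION_CUES = re.compile(r"\b(no|without|negative for|free of)\b", re.IGNORECASE)


def split_sentences(report):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def find_entities(sentence):
    """Return (category, matched text, start, end) for each category found."""
    hits = []
    for category, phrases in LEXICON.items():
        # Try longer phrases first so "pleural effusion" wins over "effusion".
        for phrase in sorted(phrases, key=len, reverse=True):
            match = re.search(r"\b" + re.escape(phrase) + r"\b", sentence, re.IGNORECASE)
            if match:
                hits.append((category, match.group(0), match.start(), match.end()))
                break  # one hit per category per sentence is enough for labeling
    return hits


def label_report(report):
    """Map each mentioned category to 1 (positive) or 0 (negated)."""
    labels = {}
    for sentence in split_sentences(report):
        for category, _text, start, _end in find_entities(sentence):
            # Treat the mention as negated if a cue precedes it in the sentence.
            negated = NEGATION_CUES.search(sentence[:start]) is not None
            polarity = 0 if negated else 1
            # A positive mention anywhere in the report overrides a negated one.
            labels[category] = max(labels.get(category, 0), polarity)
    return labels


report = "Mild cardiomegaly is present. No pleural effusion or pneumothorax."
print(label_report(report))
# {'Cardiomegaly': 1, 'Pleural Effusion': 0, 'Pneumothorax': 0}
```

Processing one sentence at a time keeps a negation cue from leaking into neighboring findings. A learned sentence classifier, as used in the thesis, would be expected to handle phrasings that a fixed cue list misses.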

Keywords

★ Natural Language Processing (NLP)
★ Chest X-Ray Reports (CXR)
★ Health Informatics
★ Named Entity Recognition
★ Transfer Learning
★ Transformer

Table of Contents

Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Named Entity Recognition
2.2 SBERT (Sentence BERT)
2.3 CheXpert Labeler
Chapter 3 Method and Methodology
3.1 Data
3.2 NER Phrases Basis and Categories
3.3 SpaCy NER
3.3.1 Sentence Tokenizer
3.3.2 Automatic NER Function
3.3.3 Negation Decision
3.4 NER Coloring Assignment
Chapter 4 Results
4.1 Vocabulary for SpaCy NER
4.2 The Effect of the Improved Vocabulary
4.3 Analysis of Most Frequent Keywords
4.4 Computation Efficiency
4.5 Color Display
4.6 PDF Display
Chapter 5 Discussion
Chapter 6 Conclusion
References

Advisor

黃輝揚

Approval date

2024-7-26
