NCU Institutional Repository (中大機構典藏): Item 987654321/89278


    Please use the permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89278


    Title: Using Sentence Bidirectional Encoder Representations from Transformers (SBERT) for Identification of Chest X-Ray Cardiomegaly Cases and Phrases from Free-text Reports
    Author: Huang, Cheng-Ching (黃政清)
    Contributor: Institute of Biomedical Engineering
    Keywords: Natural Language Processing (NLP); Electronic Health Records (EHR); Health Informatics; Named Entity Recognition; Transfer Learning; Transformer
    Date: 2022-08-17
    Upload time: 2022-10-04 11:08:39 (UTC+8)
    Publisher: National Central University
    Abstract: The diagnostic records of medical cases have very high research and clinical value. They contain not only information about patients' conditions but also the judgments physicians make from that information based on their domain knowledge and expertise. If these diagnostic documents were put to good use in computer-aided medical information management systems, the manpower and time costs of small clinics and large hospitals could be reduced, lightening physicians' administrative workload so that more resources can be devoted to patient care.

Building a fully automated analysis system for free-text diagnostic records is a difficult task for traditional rule-based natural language processing (NLP). Most existing clinical reports are written in an unstructured format, contain many technical notes and shorthand terms, and often use wording that excludes or merely suspects a finding. Professionals with a medical background must read an entire report before reaching a judgment, and the task is even harder for people outside the medical field.

To tackle this challenge, we use Transformer-based NLP models to help non-medical information scientists derive information from annotated clinical records. BERT is the current state of the art for producing embeddings that make it possible to measure the semantic distance between free-text documents, and Sentence-BERT (SBERT) is a collection of pretrained models built on BERT that lets users compare sentences through transfer learning. We trained SBERT to classify reports and sentences as normal or abnormal, assigning each sentence in a report a disease-relevance score so that users can quickly locate the key sentences. A separate model was trained to find cardiomegaly sentences, which can be highlighted and scored so that a report can be grasped at a glance.
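As a rough illustration of the per-sentence scoring idea described above, the sketch below embeds each report sentence with a generic pretrained SBERT checkpoint and scores it by cosine similarity to a few hand-written reference phrases. The all-MiniLM-L6-v2 checkpoint, the reference phrases, and the example report are illustrative assumptions, not the fine-tuned clinical model or data used in the thesis.

```python
# Minimal sketch of per-sentence relevance scoring with SBERT embeddings.
# Assumes the open-source sentence-transformers package and a generic
# pretrained checkpoint; the thesis's fine-tuned clinical model and MeSH
# training data are not reproduced here.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder checkpoint

# Hypothetical reference phrases describing the target finding.
reference_phrases = [
    "the cardiac silhouette is enlarged",
    "heart size is increased, consistent with cardiomegaly",
]

# Sentences from one free-text chest X-ray report (illustrative only).
report_sentences = [
    "The lungs are clear without focal consolidation.",
    "The cardiac silhouette is mildly enlarged.",
    "No pleural effusion or pneumothorax.",
]

ref_emb = model.encode(reference_phrases, convert_to_tensor=True)
sent_emb = model.encode(report_sentences, convert_to_tensor=True)

# Each sentence's score is its best cosine similarity to any reference phrase.
scores = util.cos_sim(sent_emb, ref_emb).max(dim=1).values
for sentence, score in zip(report_sentences, scores):
    print(f"{score.item():.3f}  {sentence}")
```

Sentences with high scores can then be highlighted for the reader, which is the "key sentence" behaviour the abstract describes; the thresholds and training procedure in the thesis itself are not shown here.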
In this study, we used the Medical Subject Headings (MeSH) dataset (3,955 labeled medical record reports) to train the SBERT classifier, splitting the data 7:2:1: 2,768 (70%) reports for training (975 normal, 1,793 abnormal), 783 (19.79%) for validation (276 normal, 507 abnormal), and 404 (10.21%) for testing (142 normal, 262 abnormal). Document classification of normal reports reached an accuracy of 97.2% (393/404), with a positive predictive value (PPV) of 95.8% (137/143) and a negative predictive value (NPV) of 98.1% (256/261); sentence classification over the 6,150 sentences of all 1,393 normal reports reached 96.8% (5,951/6,150). A large share of the 199 misclassified sentences involved the word "inflate", so we added 100 short documents containing "inflate"-related sentences and retrained. Sentence accuracy improved to 98.0% (6,029/6,150) with a small drop in document accuracy to 96.8% (391/404) (PPV 95.7%, 135/141; NPV 97.3%, 256/263). We also trained a cardiomegaly document classifier on the same 2,768 training reports (250 cardiomegaly, 2,518 non-cardiomegaly), selected the best model on the 783 validation reports (89 cardiomegaly, 694 non-cardiomegaly), and tested it on the 262 abnormal test reports (36 cardiomegaly, 226 non-cardiomegaly), where it reached 100% accuracy. By comparison, NIH NegBio achieved 69.1% (279/404) accuracy on normal classification (PPV 53.8%, 121/225; NPV 88.3%, 158/179) and 94.1% (380/404) on cardiomegaly (PPV 60.3%, 35/58; NPV 99.7%, 345/346), so SBERT outperformed NegBio on both tasks.
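For reference, the document-level figures quoted above follow directly from the reported confusion counts. The short check below recomputes accuracy, PPV, and NPV for the normal-vs-abnormal classifier, treating "normal" as the positive class; the abstract reports the same quantities rounded or truncated to one decimal place.

```python
# Recompute the document-level metrics for the normal-vs-abnormal classifier
# from the counts quoted in the abstract ("normal" is the positive class).
tp, fp = 137, 143 - 137   # 143 reports predicted normal, 137 truly normal
tn, fn = 256, 261 - 256   # 261 reports predicted abnormal, 256 truly abnormal

accuracy = (tp + tn) / (tp + fp + tn + fn)   # 393 / 404
ppv = tp / (tp + fp)                         # 137 / 143
npv = tn / (tn + fn)                         # 256 / 261

print(f"accuracy {accuracy:.2%}, PPV {ppv:.2%}, NPV {npv:.2%}")
# prints: accuracy 97.28%, PPV 95.80%, NPV 98.08%
```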
    Appears in Collections: [Institute of Biomedical Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File          Description    Size    Format    Views
    index.html                   0Kb     HTML      58


    All items in NCUIR are protected by copyright, with all rights reserved.

