基於主診斷的訓練目標修改用於出院病摘之十代國際疾病分類任務;ICD-10 Medical Coding for Discharge Summaries Experiment with Training Target Modification Based on Primary Diagnosis

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/92984

Title:	基於主診斷的訓練目標修改用於出院病摘之十代國際疾病分類任務;ICD-10 Medical Coding for Discharge Summaries Experiment with Training Target Modification Based on Primary Diagnosis
Author:	李仲剛;Lee, Jong-Kang
Contributor:	Department of Computer Science and Information Engineering
Keywords:	疾病分類;電子病歷;自然語言處理;機器學習;medical coding;electronic health record;natural language processing;machine learning
Date:	2023-02-01
Upload time:	2024-09-19 16:36:42 (UTC+8)
Publisher:	National Central University
Abstract:	Medical coding is the process of assigning medical codes to electronic health records (EHRs). The most widely used code set is the International Classification of Diseases (ICD), so medical coding is usually referred to as ICD coding. ICD coding is a practical and essential need for hospitals around the world, but it is usually done by physicians and checked by clinical coders (nosologists), who must read the whole health record in detail and assign every applicable code. The process is time-consuming and requires significant human resources, which has motivated much research on automatic coding systems. Medical coding can be viewed as a multi-label text classification problem in natural language processing (NLP).

Recent automatic coding systems face two common challenges. The first is long free text. Clinical records are mostly unstructured free text written by physicians or caregivers, and their length often exceeds the maximum input length of pre-trained language models (PLMs), so PLMs cannot be applied to them directly. The second is the huge label set and severe label imbalance: there are around 70,000 ICD codes, and the code set varies between countries. Both traits make models harder to train and lead to lower performance.

For this study, we collaborated with Landseed Hospital and used its discharge summary records as our dataset. We ran two kinds of model experiments. First, we compare vanilla multi-label classification with PLMs under different input strategies: the first maximum-length tokens, the first and last maximum-length tokens, and a slice-and-merge method. Second, we use PLM-ICD, the state-of-the-art system on the MIMIC-III dataset, together with our improvements to it. PLM-ICD slices a clinical record into multiple sequences, none of which exceeds the PLM's maximum token limit, and concatenates their embeddings, which works around the limit.

Our first contribution is a training-weight modification based on the primary diagnosis code. The primary diagnosis code is the main reason for admission to the hospital, the first medical code presented in an EHR, and the most important ICD code for reimbursement systems. No previous work on medical code prediction has focused on it. We calculate the accuracy of the primary diagnosis code using the code with the highest probability predicted by the model. Accordingly, during training we lower the target weight of every code except the primary diagnosis code. We validated the method with experiments over various weights on the Landseed dataset. The original PLM-ICD scores 0.52123 on F1-micro and 0.2736 on the primary diagnosis code. The weight-modified model, which is designed to improve primary diagnosis performance, scores 0.2116 on F1-micro and 0.4493 on the primary diagnosis code, raising the primary diagnosis score from 0.2736 to 0.4493. Aggregating the results from different weight configurations further yields a better F1-micro of 0.5260.

As part of our work, we also evaluated which columns of the discharge summaries to use. Previous work on MIMIC-III uses only the single text column that MIMIC-III provides, whereas the Landseed dataset is divided into many columns. Based on physicians' knowledge, we chose seven of them: chief complaint (主訴), medical history (病史), special examinations (特殊檢查), medical imaging examinations (醫療影像檢查), pathology report (病理報告), surgery date and method (手術日期及方法), and hospital course (住院治療經過). We investigate which columns should be used as training input to perform best, treating the seven columns as features and selecting them one by one with sequential forward feature selection. Because we showed that the order in which the columns are used affects performance, we report the ordering of all seven columns as the result. The best order is medical history + surgery date and method + hospital course + pathology report + chief complaint + medical imaging examinations + special examinations, which gives an F1-micro of 0.5409 and a primary diagnosis code score of 0.2751.

Finally, combining weight modification with the best training input order achieves an F1-micro of 0.5415 and a primary diagnosis code score of 0.4569. Although these results have been verified only on the Landseed dataset, they can encourage researchers to treat the ICD coding problem more carefully and to follow up with a deeper investigation on the MIMIC-III dataset.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations
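
The three long-text strategies named in the abstract (first maximum-length tokens, first/last maximum-length tokens, and slicing into sequences within the token limit) can be illustrated with a minimal sketch. The function names are hypothetical, and the even split between head and tail in `head_tail` is an assumption; the record does not say how the thesis divides the budget, and the embedding concatenation step of PLM-ICD is not shown.

```python
def head(tokens, max_len):
    """Keep only the first max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return list(tokens[:max_len])


def head_tail(tokens, max_len):
    """Keep tokens from the start and the end of the record.

    Assumption: the budget is split roughly in half between head and tail.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(tokens) <= max_len:
        return list(tokens)
    head_len = max_len // 2
    tail_len = max_len - head_len  # always >= 1, so the negative slice is safe
    return list(tokens[:head_len]) + list(tokens[-tail_len:])


def slice_chunks(tokens, max_len):
    """Slice a record into consecutive chunks, none longer than max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [list(tokens[i:i + max_len]) for i in range(0, len(tokens), max_len)]
```

In a slice-based pipeline, each chunk would be encoded separately by the PLM and the per-chunk embeddings combined afterwards, so no token of the record is discarded.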
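
The primary-diagnosis weighting described in the abstract could look roughly like the sketch below. It is an illustration under stated assumptions, not the thesis implementation: it assumes the modification is applied as soft targets (the primary code keeps target 1.0 and every other assigned code gets a lower value `other_weight`), that the loss is binary cross-entropy on logits, and that the primary code is listed first. All function names and the example codes are hypothetical.

```python
import math


def build_targets(codes, label_index, other_weight=0.5):
    """Build a multi-label target vector for one record.

    codes: ICD codes of the record, primary diagnosis first (assumption).
    label_index: mapping from ICD code to position in the label vector.
    other_weight: assumed target value for non-primary codes (hypothetical).
    """
    target = [0.0] * len(label_index)
    for position, code in enumerate(codes):
        value = 1.0 if position == 0 else other_weight
        i = label_index[code]
        target[i] = max(target[i], value)  # never lower the primary code
    return target


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on raw logits, in a numerically stable form."""
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def primary_hit(logits, codes, label_index):
    """True if the highest-scoring label is the primary diagnosis code."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return best == label_index[codes[0]]
```

Averaging `primary_hit` over a test set gives a primary diagnosis accuracy in the sense used by the abstract, where the top-probability prediction is compared with the first code of the record.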
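
Sequential forward selection over the discharge-summary columns, where the selection order also becomes the concatenation order, can be sketched as follows. The `score` callback is a stand-in: in the setting the abstract describes it would train and evaluate a model on the given column order, which is not reproduced here. The function names and the newline separator are assumptions.

```python
def forward_select(columns, score):
    """Greedy sequential forward selection that also fixes column order.

    columns: candidate column names.
    score: callable taking an ordered list of columns and returning a number
           to maximize (hypothetical; e.g. validation F1-micro).
    Returns the selected order and the score reached after each step.
    """
    selected = []
    remaining = list(columns)
    history = []
    while remaining:
        # Score each candidate once when appended to the current order.
        scored = [(score(selected + [c]), c) for c in remaining]
        best_score, best = max(scored, key=lambda pair: pair[0])
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), best_score))
    return selected, history


def build_input(record, order, sep="\n"):
    """Concatenate the non-empty columns of one record in the given order."""
    return sep.join(record[c] for c in order if record.get(c))
```

Because every step re-evaluates all remaining columns, seven columns need 7 + 6 + ... + 1 = 28 evaluations, which matters when each evaluation is a full training run.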

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	10	View/Open

All items in NCUIR are protected by original copyright.
