Abstract (English)
Medical coding is the process of assigning medical codes to electronic health
records (EHRs). The most widely used medical code set is the ICD (International
Classification of Diseases), so medical coding is usually referred to as ICD coding.
ICD coding is a practical and essential need for hospitals around the world, but it is
usually done by physicians and checked by clinical coders (nosologists). They need to
read the whole health record in detail and assign every clinical code to it, a process
that requires significant human resources and is time-consuming. With NLP (natural
language processing), the medical coding problem can be viewed as a multi-label text
classification problem.
Recent automatic coding systems face two common challenges. The first is
long free text. Clinical health records are mostly in free-text form, and their length
often exceeds the maximum input limit of modern pre-trained language models (PLMs),
which forfeits the opportunity to exploit the language understanding of PLMs. The second
problem is the huge label set and label imbalance. There are around 70,000 ICD codes,
and the code set varies between countries. These two traits make models harder to train
and lead to lower performance.
For this study, we had the good fortune to collaborate with Landseed Hospital and
to use the hospital's discharge summary records as our dataset. We compare the
performance of vanilla multi-label classification under different input strategies: the
first maximum-length tokens, the first/last maximum-length tokens, and the slice-and-merge
method. Furthermore, we adopt PLM-ICD, the state-of-the-art method on the MIMIC-III
dataset. PLM-ICD slices a clinical record into multiple sequences, each within the
maximum token limit of the PLM, and concatenates their embeddings, which circumvents
the token limit of PLMs.
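To make the slice-and-encode idea concrete, the following is a minimal sketch rather than the official PLM-ICD implementation: it assumes a generic Hugging Face encoder (`roberta-base`) and an illustrative label count, and it replaces PLM-ICD's label-wise attention with simple mean pooling.

```python
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# Illustrative settings; the real system uses a biomedical PLM and the full ICD label set.
MODEL_NAME = "roberta-base"
NUM_LABELS = 50
MAX_LEN = 512      # PLM token limit per segment
NUM_SEGMENTS = 8   # cover up to 8 * 512 tokens of a long record

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
classifier = nn.Linear(encoder.config.hidden_size, NUM_LABELS)

def encode_long_record(text: str) -> torch.Tensor:
    """Slice a long record into <=MAX_LEN-token segments, encode each segment,
    and concatenate the token embeddings before classification."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    segments = [ids[i:i + MAX_LEN] for i in range(0, len(ids), MAX_LEN)][:NUM_SEGMENTS]
    seg_embeddings = []
    for seg in segments:
        inputs = torch.tensor([seg])
        with torch.no_grad():
            out = encoder(input_ids=inputs).last_hidden_state  # (1, seg_len, hidden)
        seg_embeddings.append(out.squeeze(0))
    tokens = torch.cat(seg_embeddings, dim=0)   # (total_len, hidden)
    # Mean pooling stands in here for PLM-ICD's label attention layer.
    logits = classifier(tokens.mean(dim=0))
    return torch.sigmoid(logits)                # per-code probabilities
```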
Weight modification for the primary diagnosis code is another contribution of ours.
The weight modification strategy is built around the primary medical code. To our
knowledge, no previous work on medical code prediction has focused on the primary
diagnosis code. The primary medical code captures the reason for admission to the
hospital and is the first medical code listed in an EHR; it is also the most important
ICD code for reimbursement systems. We take the code with the highest probability
predicted by our model as the primary diagnosis prediction and use it to compute primary
diagnosis accuracy. To improve this metric, we lower the training target weight of every
code except the primary diagnosis code. We demonstrate the effectiveness of this method
through experiments with various weights on the Landseed dataset.
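A minimal sketch of this weighting idea follows; the helper names, tensor shapes, and the example down-weighting factor are illustrative assumptions rather than the exact training code.

```python
import torch
import torch.nn.functional as F

def weighted_bce_loss(logits: torch.Tensor,
                      targets: torch.Tensor,
                      primary_idx: torch.Tensor,
                      secondary_weight: float = 0.5) -> torch.Tensor:
    """Binary cross-entropy where every code except the primary diagnosis code
    is down-weighted by `secondary_weight` (illustrative value).

    logits, targets: (batch, num_labels); primary_idx: (batch,) index of the
    primary diagnosis code of each record.
    """
    weights = torch.full_like(targets, secondary_weight)
    weights.scatter_(1, primary_idx.unsqueeze(1), 1.0)  # keep full weight on the primary code
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)

def primary_code_accuracy(logits: torch.Tensor, primary_idx: torch.Tensor) -> float:
    """The primary diagnosis is counted correct when it is the code with the
    highest predicted probability."""
    top1 = logits.argmax(dim=1)
    return (top1 == primary_idx).float().mean().item()
```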
The original PLM-ICD baseline achieves 0.52123 F1-micro and 0.2736 on the
primary diagnosis code score. The weight-modified training is designed to increase
performance on the primary diagnosis: with it, the model achieves 0.2116 F1-micro and
0.4493 on the primary diagnosis code score, an improvement in the primary diagnosis
code score from 0.2736 to 0.4493. We can further aggregate the results from different
weight configurations and achieve a better F1-micro of 0.5260.
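One plausible way to aggregate predictions from models trained with different weight configurations is sketched below (averaging per-code probabilities and thresholding); this aggregation rule is an illustrative assumption, not necessarily the exact rule used in our experiments.

```python
import numpy as np

def aggregate_predictions(prob_sets: list[np.ndarray], threshold: float = 0.5) -> np.ndarray:
    """Average the per-code probabilities produced by models trained with
    different weight configurations, then threshold.

    prob_sets: list of (num_records, num_labels) probability arrays.
    """
    mean_probs = np.mean(np.stack(prob_sets, axis=0), axis=0)
    return (mean_probs >= threshold).astype(int)  # multi-hot code predictions
```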
As part of our work, we evaluate the effectiveness of the columns used in the
discharge summaries. Previous work on MIMIC-III utilizes only the single text column
provided in that dataset. In the Landseed dataset, we choose the following seven
important columns based on physicians' knowledge: chief complaint (主訴), history
(病史), special examinations (特殊檢查), medical imaging (醫療影像檢查), pathology
report (病理報告), operation date and method (手術日期及方法), and hospital course
(住院治療經過). We investigate which columns should be used as training input to
perform best. We treat the seven columns as features and select them one by one with
sequential forward feature selection, as sketched below. Because we show that the order
in which the columns are used affects performance, we report the result of using all
seven columns in the selected order. The best result uses the order history + operation
date and method + hospital course + pathology report + chief complaint + medical
imaging + special examinations, with an F1-micro of 0.5409 and a primary diagnosis
code score of 0.2751.
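The greedy column search can be illustrated as follows; `train_and_score` is a hypothetical helper standing in for the full training-and-evaluation pipeline, and the English column names mirror the seven Landseed columns listed above.

```python
# Sequential forward selection over discharge-summary columns.
# `train_and_score(order)` is assumed to train the model on the columns
# concatenated in the given order and return the validation F1-micro.
COLUMNS = ["chief complaint", "history", "special examinations",
           "medical imaging", "pathology report",
           "operation date and method", "hospital course"]

def sequential_forward_selection(train_and_score, columns=COLUMNS):
    selected = []
    remaining = list(columns)
    while remaining:
        # Try appending each remaining column and keep the one that helps most.
        scores = {col: train_and_score(selected + [col]) for col in remaining}
        best_col = max(scores, key=scores.get)
        selected.append(best_col)
        remaining.remove(best_col)
    return selected  # column order ranked by greedy contribution
```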
Finally, we combine weight modification and the best training input order, achieving
an F1-micro of 0.5415 and a primary diagnosis code score of 0.4569. Even though this
result is verified only on the Landseed dataset, it may encourage researchers to follow
up with a deeper investigation of ICD coding on the MIMIC-III dataset.
References
1. Cartwright, D.J., ICD-9-CM to ICD-10-CM codes: what? why? how? 2013, Mary Ann Liebert, Inc.
2. Mullenbach, J., et al., Explainable prediction of medical codes from clinical text. arXiv preprint arXiv:1802.05695, 2018.
3. Vu, T., D.Q. Nguyen, and A. Nguyen, A label attention model for icd coding from clinical text. arXiv preprint arXiv:2007.06351, 2020.
4. Yuan, Z., C. Tan, and S. Huang, Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding. arXiv preprint arXiv:2203.01515, 2022.
5. Mikolov, T., et al., Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
6. Bodenreider, O., The unified medical language system (UMLS): integrating biomedical terminology. Nucleic acids research, 2004. 32(suppl_1): p. D267-D270.
7. Huang, C.-W., S.-C. Tsai, and Y.-N. Chen. PLM-ICD: Automatic ICD Coding with Pretrained Language Models. 2022. Seattle, WA: Association for Computational Linguistics.
8. Devlin, J., et al., Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
9. Liu, Y., et al., Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
10. Johnson, A.E., et al., MIMIC-III, a freely accessible critical care database. Scientific data, 2016. 3(1): p. 1-9.
11. Gao, S., et al., Limitations of transformers on clinical text classification. IEEE journal of biomedical and health informatics, 2021. 25(9): p. 3596-3607.
12. Gu, Y., et al., Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH), 2021. 3(1): p. 1-23.
13. Alsentzer, E., et al., Publicly available clinical BERT embeddings. arXiv preprint arXiv:1904.03323, 2019.
14. Lee, J., et al., BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 2020. 36(4): p. 1234-1240.
15. Tanihara, S., Z. Yamagata, and H. Une, Reliability of health insurance claim statistical data based on the principal diagnosis method. Nihon Eiseigaku zasshi. Japanese Journal of Hygiene, 2008. 63(1): p. 29-35.
16. Tanihara, S., E. Okamoto, and H. Une, A comparison of disease‐specific medical expenditures in Japan using the principal diagnosis method and the proportional distribution method. Journal of Evaluation in Clinical Practice, 2012. 18(3): p. 616-622.
17. Fiori, W., et al., The significance of the principal diagnosis in Germany's new payment system for inpatient treatment of mental disorders. Zeitschrift fur Psychosomatische Medizin und Psychotherapie, 2014. 60(1): p. 25-38.
18. Connell, F.A., L.A. Blide, and M.A. Hanken, Ambiguities in the selection of the principal diagnosis: impact on data quality, hospital statistics and DRGs. Journal (American Medical Record Association), 1984. 55(2): p. 18-23.
19. MacIntyre, C.R., et al., Accuracy of ICD–9–CM codes in hospital morbidity data, Victoria: implications for public health research. Australian and New Zealand journal of public health, 1997. 21(5): p. 477-482.
20. Farzandipour, M., A. Sheikhtaheri, and F. Sadoughi, Effective factors on accuracy of principal diagnosis coding based on International Classification of Diseases, the 10th revision (ICD-10). International Journal of Information Management, 2010. 30(1): p. 78-84.
21. Sánchez-Maroño, N., A. Alonso-Betanzos, and M. Tombilla-Sanromán, Filter methods for feature selection--a comparative study. Lecture notes in computer science, 2007. 4881: p. 178-187.