Abstract (English)
Medical coding is the process of assigning medical codes to electronic health
records (EHRs). The most widely used medical code set is the ICD (International
Classification of Diseases), so medical coding is usually referred to as ICD coding.
ICD coding is a practical and essential need for hospitals around the world, but it is
usually done by physicians and checked by clinical coders (nosologists). They need to
read the whole health record in detail and assign every clinical code to it, a process
that requires significant human resources and is time-consuming. With NLP (natural
language processing), the medical coding problem can be viewed as a multi-label text
classification problem.
Recent automatic coding systems face two common challenges. The first is
long free text. Clinical health records are mostly in free-text form, and their length
often exceeds the maximum input limit of modern pre-trained language models (PLMs),
which forfeits the opportunity to exploit the language understanding of PLMs. The second
problem is the huge label set and label imbalance. There are around 70,000 ICD codes,
and the code set varies between countries. These two traits make models harder to train
and lead to lower performance.
For this study, we had the good fortune to collaborate with Landseed Hospital and
to use the hospital's discharge summary records as our dataset. We compare the
performance of vanilla multi-label classification under different input strategies: the
first maximum-length tokens, the first/last maximum-length tokens, and the slice-and-merge
method. Furthermore, we adopt PLM-ICD, the state-of-the-art method on the MIMIC-III
dataset. PLM-ICD slices a clinical record into multiple sequences, each within the
maximum token limit of the PLM, and concatenates their embeddings, which circumvents
the token limit of PLMs.
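To make the slice-and-encode idea concrete, the following is a minimal sketch rather than the official PLM-ICD implementation: it assumes a generic Hugging Face encoder (`roberta-base`) and an illustrative label count, and it replaces PLM-ICD's label-wise attention with simple mean pooling.

```python
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# Illustrative settings; the real system uses a biomedical PLM and the full ICD label set.
MODEL_NAME = "roberta-base"
NUM_LABELS = 50
MAX_LEN = 512      # PLM token limit per segment
NUM_SEGMENTS = 8   # cover up to 8 * 512 tokens of a long record

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
classifier = nn.Linear(encoder.config.hidden_size, NUM_LABELS)

def encode_long_record(text: str) -> torch.Tensor:
    """Slice a long record into <=MAX_LEN-token segments, encode each segment,
    and concatenate the token embeddings before classification."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    segments = [ids[i:i + MAX_LEN] for i in range(0, len(ids), MAX_LEN)][:NUM_SEGMENTS]
    seg_embeddings = []
    for seg in segments:
        inputs = torch.tensor([seg])
        with torch.no_grad():
            out = encoder(input_ids=inputs).last_hidden_state  # (1, seg_len, hidden)
        seg_embeddings.append(out.squeeze(0))
    tokens = torch.cat(seg_embeddings, dim=0)   # (total_len, hidden)
    # Mean pooling stands in here for PLM-ICD's label attention layer.
    logits = classifier(tokens.mean(dim=0))
    return torch.sigmoid(logits)                # per-code probabilities
```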
Weight modification for the primary diagnosis code is another contribution of ours.
The weight modification strategy is built around the primary medical code. To our
knowledge, no previous work on medical code prediction has focused on the primary
diagnosis code. The primary medical code captures the reason for admission to the
hospital and is the first medical code listed in an EHR; it is also the most important
ICD code for reimbursement systems. We take the code with the highest probability
predicted by our model as the primary diagnosis prediction and use it to compute primary
diagnosis accuracy. To improve this metric, we lower the training target weight of every
code except the primary diagnosis code. We demonstrate the effectiveness of this method
through experiments with various weights on the Landseed dataset.
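A minimal sketch of this weighting idea follows; the helper names, tensor shapes, and the example down-weighting factor are illustrative assumptions rather than the exact training code.

```python
import torch
import torch.nn.functional as F

def weighted_bce_loss(logits: torch.Tensor,
                      targets: torch.Tensor,
                      primary_idx: torch.Tensor,
                      secondary_weight: float = 0.5) -> torch.Tensor:
    """Binary cross-entropy where every code except the primary diagnosis code
    is down-weighted by `secondary_weight` (illustrative value).

    logits, targets: (batch, num_labels); primary_idx: (batch,) index of the
    primary diagnosis code of each record.
    """
    weights = torch.full_like(targets, secondary_weight)
    weights.scatter_(1, primary_idx.unsqueeze(1), 1.0)  # keep full weight on the primary code
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)

def primary_code_accuracy(logits: torch.Tensor, primary_idx: torch.Tensor) -> float:
    """The primary diagnosis is counted correct when it is the code with the
    highest predicted probability."""
    top1 = logits.argmax(dim=1)
    return (top1 == primary_idx).float().mean().item()
```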
The original PLM-ICD baseline achieves 0.52123 F1-micro and 0.2736 on the
primary diagnosis code score. The weight-modified training is designed to increase
performance on the primary diagnosis: with it, the model achieves 0.2116 F1-micro and
0.4493 on the primary diagnosis code score, an improvement in the primary diagnosis
code score from 0.2736 to 0.4493. We can further aggregate the results from different
weight configurations and achieve a better F1-micro of 0.5260.
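One plausible way to aggregate predictions from models trained with different weight configurations is sketched below (averaging per-code probabilities and thresholding); this aggregation rule is an illustrative assumption, not necessarily the exact rule used in our experiments.

```python
import numpy as np

def aggregate_predictions(prob_sets: list[np.ndarray], threshold: float = 0.5) -> np.ndarray:
    """Average the per-code probabilities produced by models trained with
    different weight configurations, then threshold.

    prob_sets: list of (num_records, num_labels) probability arrays.
    """
    mean_probs = np.mean(np.stack(prob_sets, axis=0), axis=0)
    return (mean_probs >= threshold).astype(int)  # multi-hot code predictions
```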
As part of our work, we evaluate the effectiveness of the columns used in the
discharge summaries. Previous work on MIMIC-III utilizes only the single text column
provided in that dataset. In the Landseed dataset, we choose the following seven
important columns based on physicians' knowledge: chief complaint (主訴), history
(病史), special examinations (特殊檢查), medical imaging (醫療影像檢查), pathology
report (病理報告), operation date and method (手術日期及方法), and hospital course
(住院治療經過). We investigate which columns should be used as training input to
perform best. We treat the seven columns as features and select them one by one with
sequential forward feature selection, as sketched below. Because we show that the order
in which the columns are used affects performance, we report the result of using all
seven columns in the selected order. The best result uses the order history + operation
date and method + hospital course + pathology report + chief complaint + medical
imaging + special examinations, with an F1-micro of 0.5409 and a primary diagnosis
code score of 0.2751.
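The greedy column search can be illustrated as follows; `train_and_score` is a hypothetical helper standing in for the full training-and-evaluation pipeline, and the English column names mirror the seven Landseed columns listed above.

```python
# Sequential forward selection over discharge-summary columns.
# `train_and_score(order)` is assumed to train the model on the columns
# concatenated in the given order and return the validation F1-micro.
COLUMNS = ["chief complaint", "history", "special examinations",
           "medical imaging", "pathology report",
           "operation date and method", "hospital course"]

def sequential_forward_selection(train_and_score, columns=COLUMNS):
    selected = []
    remaining = list(columns)
    while remaining:
        # Try appending each remaining column and keep the one that helps most.
        scores = {col: train_and_score(selected + [col]) for col in remaining}
        best_col = max(scores, key=scores.get)
        selected.append(best_col)
        remaining.remove(best_col)
    return selected  # column order ranked by greedy contribution
```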
Finally, we combine weight modification and the best training input order, achieving
an F1-micro of 0.5415 and a primary diagnosis code score of 0.4569. Even though this
result is verified only on the Landseed dataset, it may encourage researchers to follow
up with a deeper investigation of ICD coding on the MIMIC-III dataset.
References
1. Cartwright, D.J., ICD-9-CM to ICD-10-CM codes: what? why? how? 2013, Mary Ann Liebert, Inc.
2. Mullenbach, J., et al., Explainable prediction of medical codes from clinical text. arXiv preprint arXiv:1802.05695, 2018.
3. Vu, T., D.Q. Nguyen, and A. Nguyen, A label attention model for icd coding from clinical text. arXiv preprint arXiv:2007.06351, 2020.
4. Yuan, Z., C. Tan, and S. Huang, Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding. arXiv preprint arXiv:2203.01515, 2022.
5. Mikolov, T., et al., Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
6. Bodenreider, O., The unified medical language system (UMLS): integrating biomedical terminology. Nucleic acids research, 2004. 32(suppl_1): p. D267-D270.
7. Huang, C.-W., S.-C. Tsai, and Y.-N. Chen. PLM-ICD: Automatic ICD Coding with Pretrained Language Models. 2022. Seattle, WA: Association for Computational Linguistics.
8. Devlin, J., et al., Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
9. Liu, Y., et al., Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
10. Johnson, A.E., et al., MIMIC-III, a freely accessible critical care database. Scientific data, 2016. 3(1): p. 1-9.
11. Gao, S., et al., Limitations of transformers on clinical text classification. IEEE journal of biomedical and health informatics, 2021. 25(9): p. 3596-3607.
12. Gu, Y., et al., Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH), 2021. 3(1): p. 1-23.
13. Alsentzer, E., et al., Publicly available clinical BERT embeddings. arXiv preprint arXiv:1904.03323, 2019.
14. Lee, J., et al., BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 2020. 36(4): p. 1234-1240.
15. Tanihara, S., Z. Yamagata, and H. Une, Reliability of health insurance claim statistical data based on the principal diagnosis method. Nihon Eiseigaku zasshi. Japanese Journal of Hygiene, 2008. 63(1): p. 29-35.
16. Tanihara, S., E. Okamoto, and H. Une, A comparison of disease‐specific medical expenditures in Japan using the principal diagnosis method and the proportional distribution method. Journal of Evaluation in Clinical Practice, 2012. 18(3): p. 616-622.
17. Fiori, W., et al., The significance of the principal diagnosis in Germany's new payment system for inpatient treatment of mental disorders. Zeitschrift fur Psychosomatische Medizin und Psychotherapie, 2014. 60(1): p. 25-38.
18. Connell, F.A., L.A. Blide, and M.A. Hanken, Ambiguities in the selection of the principal diagnosis: impact on data quality, hospital statistics and DRGs. Journal (American Medical Record Association), 1984. 55(2): p. 18-23.
19. MacIntyre, C.R., et al., Accuracy of ICD–9–CM codes in hospital morbidity data, Victoria: implications for public health research. Australian and New Zealand journal of public health, 1997. 21(5): p. 477-482.
20. Farzandipour, M., A. Sheikhtaheri, and F. Sadoughi, Effective factors on accuracy of principal diagnosis coding based on International Classification of Diseases, the 10th revision (ICD-10). International Journal of Information Management, 2010. 30(1): p. 78-84.
21. Sánchez-Maroño, N., A. Alonso-Betanzos, and M. Tombilla-Sanromán, Filter methods for feature selection--a comparative study. Lecture notes in computer science, 2007. 4881: p. 178-187.