dc.description.abstract | Medical coding is the process of assigning medical codes to electronic health
records (EHRs). The most widely used medical code set is the ICD (International
Classification of Diseases), so medical coding is usually referred to as ICD coding.
ICD coding is a practical and essential need for hospitals around the world, but it
is usually done by physicians and checked by clinical coders (nosologists). They
need to read the whole health record in detail and assign every applicable clinical code
to it, a process that is time-consuming and requires significant human resources.
Medical coding can be framed as a multi-label text classification problem in
NLP (natural language processing).
Recent automatic coding systems face two common challenges. The first is
long free text. Clinical health records are mostly free text, and their length often
exceeds the maximum input length of modern pre-trained language models (PLMs), which
makes it difficult to exploit the language understanding of PLMs. The second
is the huge label set and label imbalance. There are around 70,000 ICD codes,
and the code set varies between countries. These two traits make the model harder to
train and lead to lower performance.
For this study, we have the good fortune to collaborate with Landseed hospital and
to use the hospital's discharge summary records as our dataset. We compare the
performance of vanilla multi-label classification under three input strategies: the first
maximum-length tokens, the first/last maximum-length tokens, and the slice-and-merge
method. Furthermore, we adopt PLM-ICD, the state-of-the-art method on the MIMIC-III
dataset. PLM-ICD slices each clinical record into multiple sequences, none exceeding
the maximum token limit of PLMs, and concatenates their embeddings, which works
around the maximum token limit.
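
The three input strategies compared above can be sketched on a plain list of token
ids. This is a minimal illustration, not the implementation used in the study: the
function names are hypothetical, the even head/tail split in the first/last strategy is
an assumption, and the merge step (combining per-slice embeddings inside the model) is
not shown.

```python
from typing import List


def first_tokens(tokens: List[int], max_len: int) -> List[int]:
    """Keep only the first max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return tokens[:max_len]


def first_last_tokens(tokens: List[int], max_len: int) -> List[int]:
    """Keep the head and the tail of the record, max_len tokens in total.

    Assumption: the budget is split evenly between head and tail.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(tokens) <= max_len:
        return list(tokens)
    head = max_len // 2
    tail = max_len - head  # tail >= 1 because max_len >= 1
    return tokens[:head] + tokens[-tail:]


def slice_tokens(tokens: List[int], max_len: int) -> List[List[int]]:
    """Cut the record into consecutive slices of at most max_len tokens.

    Each slice fits the PLM input limit; the model would encode every slice
    and merge the resulting embeddings.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]
```

Only the slicing strategy keeps every token of the record, at the cost of one encoder
pass per slice.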
Another contribution of ours is a weight modification strategy based on the primary
diagnosis code. There is no
previous work that focuses on the primary diagnosis code in medical code prediction.
The primary diagnosis code is the reason for admission to the hospital and is also the
first medical code presented in an EHR. It is also the most
important ICD code for reimbursement systems. We take the code with the highest
predicted probability as the model's primary diagnosis prediction and measure its
accuracy. To improve this score, we lower the training target weight of every code
except the primary diagnosis code. We demonstrate the effectiveness of this method
through experiments with various weights on the Landseed dataset.
The original PLM-ICD achieves an F1-micro of 0.52123 and a
primary diagnosis code score of 0.2736. The training weight modification is designed to
increase performance on the primary diagnosis code. With it, the model achieves an
F1-micro of 0.2116 and a primary diagnosis code score of 0.4493, improving the primary
diagnosis code score from 0.2736 to 0.4493. By aggregating the results from different
weight configurations, we further achieve a better F1-micro of 0.5260.
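
One possible reading of the weight modification is a per-label weight on the binary
cross-entropy loss, where every code except the primary diagnosis code is
down-weighted. The sketch below follows that assumption; the names `weighted_bce`,
`primary_hit`, and `other_weight` are hypothetical and the abstract does not specify
the exact formulation.

```python
import math
from typing import Sequence


def weighted_bce(probs: Sequence[float], targets: Sequence[int],
                 primary_index: int, other_weight: float) -> float:
    """Mean binary cross-entropy over all labels of one record.

    The primary diagnosis label keeps weight 1.0; every other label is
    scaled by other_weight (assumed to lie in [0, 1]).
    """
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for j, (p, y) in enumerate(zip(probs, targets)):
        w = 1.0 if j == primary_index else other_weight
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def primary_hit(probs: Sequence[float], primary_index: int) -> bool:
    """True if the highest-probability label is the primary diagnosis code."""
    top = max(range(len(probs)), key=lambda j: probs[j])
    return top == primary_index
```

Averaging `primary_hit` over all test records gives a primary diagnosis code score in
the sense described above, under the same assumption.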
As part of our work, we evaluated the effectiveness of the columns used in the
discharge summaries. Previous work on MIMIC-III uses only the single text column
provided in MIMIC-III. In the Landseed dataset, we choose the following seven important
columns based on physicians' knowledge: chief complaint (主訴), medical history (病史),
special examinations (特殊檢查), medical imaging examinations (醫療影像檢查),
pathology report (病理報告), surgery date and method (手術日期及方法), and course of
hospital treatment (住院治療經過). We investigate which
columns should be used as training input to achieve the best performance. We treat the
seven columns as features and select them one by one with the sequential forward feature
selection method. Because we found that the order in which columns are used
affects performance, we report the seven-column configuration as the test result. The
best result uses the order "medical history + surgery date and method + course of
hospital treatment + pathology report + chief complaint + medical imaging examinations +
special examinations." The F1-micro score for this order is 0.5409 and the primary
diagnosis code score is 0.2751.
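
Sequential forward selection over columns can be sketched as a greedy loop that, at
each step, appends the column giving the best score. This is a generic sketch, not the
study's code: `score` is a hypothetical callable standing in for "train and evaluate
the model on these columns in this order", which in practice is one full training run
per call.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    columns: Sequence[str],
    score: Callable[[List[str]], float],
) -> List[Tuple[List[str], float]]:
    """Greedily build an ordered column list, one column per step.

    Returns the history of (selected columns so far, score) after each step,
    so the best prefix or the full ordering can be picked afterwards.
    """
    selected: List[str] = []
    remaining = list(columns)
    history: List[Tuple[List[str], float]] = []
    while remaining:
        # Evaluate each candidate once when appended to the current selection.
        scored = [(score(selected + [c]), c) for c in remaining]
        best_score, best_col = max(scored, key=lambda t: t[0])
        selected.append(best_col)
        remaining.remove(best_col)
        history.append((list(selected), best_score))
    return history
```

With seven columns the loop needs 7 + 6 + ... + 1 = 28 evaluations, far fewer than
trying every possible ordering.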
Finally, we combine weight modification with the best training input order. We
achieve an F1-micro of 0.5415 and a primary diagnosis code score of 0.4569. Although
this result has been verified only on the Landseed dataset, it may encourage
researchers to follow up with a deeper investigation of ICD coding on the MIMIC-III
dataset.
| en_US |