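Abstract: | Chinese abstract (translated) Reviewing the medical diagnosis records of health care providers enables fine-grained analysis that yields valuable insights into logistics management, quality control, and cost effectiveness for both small clinics and large medical institutions. However, most physicians still use paper records, and existing electronic medical records such as radiology reports are written mainly in an unstructured format. Reviewing them is time-consuming, because medical personnel with domain knowledge must read the entire document to fully understand the unstructured, free-form text, whether handwritten or digital. For non-medical reviewers it is an impossible task, because medical reports today tend to record the situation in detail, with many technical notes and shorthand terms. In addition, ambiguous terms such as "rule out", "suspected diagnosis", or "differential diagnosis" are commonly used to describe the condition of an organ or a disease. Surgical outcome data, however, are usually reported as raw morbidity and mortality, which do not necessarily reflect the effectiveness of surgical treatment in an effective or intuitive way. A tool that can distinguish normal from abnormal cases and identify the key sentences supporting the conclusion might reduce the time cost of reviewing medical records. It could also help general reviewers without sufficient medical knowledge understand case reports more precisely and correctly. NLP, short for natural language processing, is a new technology that can be used for automatic document classification. It converts unstructured documents into a structured format to support numerical analysis for subsequent information extraction. Our goals are therefore twofold: 1) to use NLP to mine the information in diagnosis reports and distinguish whether a reported case is normal or abnormal; 2) to find which sentences are most critical and may lead to the result. Many state-of-the-art methods are already available for NLP, such as Naïve Bayes, Long Short-Term Memory (LSTM), and BERT. Using a BERT classifier model on the Medical Subject Headings (MeSH) dataset (n = 3,955 labeled reports) for the task of identifying whether a case is abnormal, we obtained validation scores (n = 792 manually labeled; F1-score for normal cases = 0.97, for abnormal cases = 0.94) and test scores (n = 395 manually labeled; F1-score for normal cases = 0.98, for abnormal cases = 0.96). Furthermore, with sentence-BERT we can obtain the similarity between each sentence and the whole report, which helps identify which sentences may belong to the normal or the abnormal class.;Abstract Reviewing the medical diagnosis records of a health care provider allows meticulous analysis that generates invaluable insights into logistics management, quality control, and cost effectiveness for both small clinics and large health organizations. However, the majority of physicians still use paper medical records, and existing electronic medical records such as radiology reports are written mainly in an unstructured format. Reviewing them is a time-consuming process, because a medical examiner with domain knowledge has to read the entire document to fully understand the unstructured, free-form text, whether it is handwritten or digital. It is an impossible task for non-medical examiners, since medical reports nowadays tend to record the situation in detail, with many technical notes and shorthand terms. In addition, ambiguous terms such as “rule out”, “suspected diagnosis”, or “differential diagnosis” are commonly used to denote the condition of an organ or a disease. 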
Surgical outcome data, however, are generally reported as raw morbidity and mortality, which do not necessarily reflect the effectiveness of surgical treatment. A tool that can distinguish normal from abnormal cases and identify the key sentences supporting the conclusion could reduce the time cost of reviewing medical records. It could also help general examiners without sufficient medical knowledge understand the reports more precisely and correctly. NLP, which stands for natural language processing, is a novel technology for automatic document classification. It is the workhorse for transforming unstructured documents into a structured format that enables numerical analysis for information extraction. Our goals are therefore twofold: 1) to use NLP to mine the information in diagnosis reports and distinguish whether a reported case is normal or abnormal; 2) to find which sentences may lead to that outcome. Many state-of-the-art methods are available for NLP, such as Naïve Bayes, Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT). Using a BERT classifier model on the Medical Subject Headings (MeSH) dataset (n = 3,955 annotated reports) for the task of identifying whether a case is abnormal, we obtained validation scores (n = 792 manually labeled; F1-score for normal cases = 0.97, F1-score for abnormal cases = 0.94) and test scores (n = 395 manually labeled; F1-score for normal cases = 0.98, F1-score for abnormal cases = 0.96). With sentence-BERT, we can also obtain the similarity between each sentence and the whole report, which helps identify which sentences may belong to the normal or the abnormal class. |
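
The sentence-to-report similarity step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function below is a toy bag-of-words stand-in for a sentence-BERT encoder (an assumption made so that the example runs without downloading a model), and `rank_sentences` and the example report are hypothetical names and data introduced only for illustration. Only the general idea follows the abstract: embed each sentence and the whole report, then compare them by cosine similarity.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence-BERT encoder (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(report):
    """Return (similarity, sentence) pairs, most similar to the whole report first."""
    # Naive split on sentence-ending punctuation; real reports may need more care.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]
    report_vec = embed(report)
    scored = [(cosine(embed(s), report_vec), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical report text, invented for illustration only.
example_report = (
    "The heart size is normal. "
    "There is a small opacity in the right lower lobe. "
    "The opacity in the right lower lobe may represent pneumonia."
)

ranked = rank_sentences(example_report)
for score, sentence in ranked:
    print(f"{score:.2f}  {sentence}")
```

In practice the toy `embed` would be replaced by a real sentence encoder, and the ranking would be read alongside the BERT classifier's normal/abnormal label; a high similarity score suggests, but does not prove, that a sentence drives the report-level conclusion.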