Thesis 109423018: Detailed Record




Author: Chong-Yan Chen (陳重諺)    Department: Information Management
Thesis Title: 電子病歷縮寫消歧與一對多分類任務 (Disambiguate clinical abbreviation by one-to-all classification)
Related Theses
★ Building a Prediction Model for Post-Stroke Pneumonia Using Text Mining and Deep Learning Techniques (使用文字探勘與深度學習技術建置中風後肺炎之預測模型)
Full text: available in the thesis system after 2027-6-15.
Abstract (Chinese) With the development of artificial intelligence in medicine, a growing number of researchers have proposed machine learning studies for the medical domain, and natural language processing is among the most active of these research problems. Text mining models built for this purpose can support applications such as computer-aided diagnosis, prognosis tracking, and medical customer service.
However, the medical text these studies rely on typically contains large numbers of abbreviations, and if the abbreviations are not first disambiguated, the possibilities for downstream applications of the text are limited. This study therefore focuses on expanding abbreviations in clinical text.
Previous work restored abbreviations to their expanded forms with a separate word-level classifier for each term, an approach that indirectly increases the complexity of updating, maintaining, and even using the resulting models. This study instead lets multiple terms share a single classifier, incorporating pre-trained BERT in a more generalizable architecture and algorithm, with the goal of improving the model's clinical usability.
The simplified architecture proposed here reduces deployment complexity, improves accuracy by roughly 3% over the traditional approach, offers greater flexibility and maintainability, and removes the traditional architecture's need for retraining.
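The record does not reproduce the thesis's data format, but the shared-classifier idea described in the abstract can be illustrated with a minimal sketch: each abbreviation occurrence is paired with every candidate expansion from a sense inventory, so that a single binary classifier can score (context, candidate) pairs for any term. The toy inventory and the build_pairs helper below are hypothetical illustrations, not artifacts of the thesis.

# Minimal sketch (hypothetical data and names): one annotated abbreviation
# occurrence is expanded into (context, candidate, label) triples, where the
# candidate matching the annotated sense is labeled 1 and all others 0.
from typing import Dict, List, Tuple

# Toy sense inventory: abbreviation -> possible expansions (not the UMN inventory).
SENSE_INVENTORY: Dict[str, List[str]] = {
    "pt": ["patient", "physical therapy", "prothrombin time"],
    "ivf": ["intravenous fluids", "in vitro fertilization"],
}

def build_pairs(context: str, abbrev: str, gold_sense: str) -> List[Tuple[str, str, int]]:
    """Turn one annotated occurrence into context-candidate pairs with binary labels."""
    return [
        (context, candidate, int(candidate == gold_sense))
        for candidate in SENSE_INVENTORY[abbrev]
    ]

pairs = build_pairs(
    context="pt was given 2L ns and discharged home",
    abbrev="pt",
    gold_sense="patient",
)
for context, candidate, label in pairs:
    print(label, "|", candidate, "|", context)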
Abstract (English) With the growth of artificial intelligence, more researchers are pursuing machine learning topics in the medical field. Natural language processing is among the most active areas, and many applications, such as computer-aided diagnosis, prognosis tracking, and medical service chatbots, rely on it.
To support these applications, a clean dataset for model building is necessary; however, electronic health records contain a great many ambiguous abbreviations. If researchers do not disambiguate them into their intended senses, downstream performance suffers.
This thesis therefore examines how to expand abbreviations in clinical text. In previous approaches, most researchers built a separate classifier for every single term, which makes the resulting models difficult to deploy and maintain. We instead use a pre-trained BERT architecture to build a single model for all terms, aiming for higher usability in real-world settings.
In conclusion, our method achieves accuracy 1 to 3 percentage points higher than previous multi-model approaches, while also offering greater flexibility and maintainability and avoiding the need for retraining.
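The thesis's own code is not part of this record; the following is a minimal inference sketch under the assumption that the one-to-all model is a BERT sequence-pair classifier (sentence A = clinical context, sentence B = candidate expansion) fine-tuned with two labels, in the spirit of the GlossBERT-style work cited in the references. The checkpoint name "bert-base-uncased" and the rank_candidates helper are placeholders, not the thesis's actual artifacts.

# Sketch of one-to-all inference with a BERT pair classifier (HuggingFace
# transformers): every candidate expansion is scored against the same context,
# and candidates are ranked by the probability of the "correct sense" label.
# Assumes a model fine-tuned with two labels; "bert-base-uncased" stands in
# for such a fine-tuned checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # placeholder for a fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def rank_candidates(context, candidates):
    """Score each (context, candidate) pair and sort by P(correct sense)."""
    inputs = tokenizer(
        [context] * len(candidates),  # sentence A: the clinical context
        candidates,                   # sentence B: a candidate expansion
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits          # shape: (num_candidates, 2)
    probs = torch.softmax(logits, dim=-1)[:, 1]  # probability of label 1 ("correct")
    return sorted(zip(candidates, probs.tolist()), key=lambda x: -x[1])

print(rank_candidates(
    "pt was given 2L ns and discharged home",
    ["patient", "physical therapy", "prothrombin time"],
))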
Keywords (Chinese) ★ 縮寫還原 (abbreviation expansion)
★ 文字探勘 (text mining)
★ 詞義消歧 (word sense disambiguation)
Keywords (English) ★ abbreviation expansion
★ text mining
★ word sense disambiguation
Table of Contents Abstract (Chinese) i
Abstract (English) ii
Table of Contents iv
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 3
1.3 Research Objectives 5
Chapter 2 Literature Review 6
2.1 Word Sense Disambiguation 6
2.2 Abbreviation Expansion 8
2.3 Contextualized Word Embeddings 9
Chapter 3 Research Methods 11
3.1 Datasets 12
3.1.1 MSH WSD Dataset 12
3.1.2 UMN Dataset 13
3.2 Data Preprocessing 13
3.2.1 Context-candidate pair generation 13
3.2.2 Converting text into BERT-compatible input 14
3.3 Fine-tuning BERT 19
3.4 Experimental Design 21
3.4.1 Word sense disambiguation accuracy test 21
3.4.2 Clinical abbreviation expansion performance test 22
3.4.3 OOV (Out-of-Vocabulary) test 23
3.5 Evaluation Metrics 23
Chapter 4 Experimental Results and Analysis 25
4.1 Accuracy evaluation of word sense disambiguation on medical journal text 25
4.2 Accuracy evaluation of clinical abbreviation expansion 26
4.3 OOV experiment 27
Chapter 5 Conclusions and Recommendations 31
5.1 Conclusions 31
5.2 Research Limitations 32
5.2.1 Model limitations 32
5.2.2 Limitations of the pruning algorithm 32
5.2.3 Usage conventions across medical specialties 33
5.3 Future research directions and suggestions 33
References 35
References 262588213843476. (n.d.). Word_piece_example.md. Gist. Retrieved April 26, 2022, from https://gist.github.com/jamescalam/7e3f69d6a68d6f3ad7fd8bb58bf87a5f
Bevilacqua, M., Pasini, T., Raganato, A., & Navigli, R. (2021). Recent Trends in Word Sense Disambiguation: A Survey. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 4330–4338. https://doi.org/10.24963/ijcai.2021/593
Bodenreider, O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research, 32(Database issue), D267–D270. https://doi.org/10.1093/nar/gkh061
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805). arXiv. http://arxiv.org/abs/1810.04805
Finley, G. P., Pakhomov, S. V. S., McEwan, R., & Melton, G. B. (2017). Towards Comprehensive Clinical Abbreviation Disambiguation Using Machine-Labeled Training Data. AMIA Annual Symposium Proceedings, 2016, 560–569.
Grossman Liu, L., Grossman, R. H., Mitchell, E. G., Weng, C., Natarajan, K., Hripcsak, G., & Vawdrey, D. K. (2021). A deep database of medical abbreviations and acronyms for natural language processing. Scientific Data, 8(1), 149. https://doi.org/10.1038/s41597-021-00929-4
Huang, L., Sun, C., Qiu, X., & Huang, X. (2020). GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge. ArXiv:1908.07245 [Cs]. http://arxiv.org/abs/1908.07245
Iacobacci, I., Pilehvar, M. T., & Navigli, R. (2016). Embeddings for Word Sense Disambiguation: An Evaluation Study. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 897–907. https://doi.org/10.18653/v1/P16-1085
Jimeno Yepes, A. (2017). Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation. Journal of Biomedical Informatics, 73, 137–147. https://doi.org/10.1016/j.jbi.2017.08.001
Jimeno-Yepes, A. J., & Aronson, A. R. (2010). Knowledge-based biomedical word sense disambiguation: Comparison of approaches. BMC Bioinformatics, 11(1), 569. https://doi.org/10.1186/1471-2105-11-569
Jimeno-Yepes, A. J., McInnes, B. T., & Aronson, A. R. (2011). Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation. BMC Bioinformatics, 12(1), 223. https://doi.org/10.1186/1471-2105-12-223
Jin, Q., Liu, J., & Lu, X. (2019). Deep Contextualized Biomedical Abbreviation Expansion. ArXiv:1906.03360 [Cs, q-Bio]. http://arxiv.org/abs/1906.03360
Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Anthony Celi, L., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3(1), 160035. https://doi.org/10.1038/sdata.2016.35
Joopudi, V., Dandala, B., & Devarakonda, M. (2018). A convolutional route to abbreviation disambiguation in clinical text. Journal of Biomedical Informatics, 86, 71–78. https://doi.org/10.1016/j.jbi.2018.07.025
Kim, J., Gong, L., Khim, J., Weiss, J. C., & Ravikumar, P. (2020). Improved Clinical Abbreviation Expansion via Non-Sense-Based Approaches. Proceedings of the Machine Learning for Health NeurIPS Workshop, 161–178. https://proceedings.mlr.press/v136/kim20a.html
Komeda, Y., Handa, H., Watanabe, T., Nomura, T., Kitahashi, M., Sakurai, T., Okamoto, A., Minami, T., Kono, M., Arizumi, T., Takenaka, M., Hagiwara, S., Matsui, S., Nishida, N., Kashida, H., & Kudo, M. (2017). Computer-Aided Diagnosis Based on Convolutional Neural Network System for Colorectal Polyp Classification: Preliminary Experience. Oncology, 93(1), 30–34. https://doi.org/10.1159/000481227
Li, I., Yasunaga, M., Nuzumlalı, M. Y., Caraballo, C., Mahajan, S., Krumholz, H., & Radev, D. (2019). A Neural Topic-Attention Model for Medical Term Abbreviation Disambiguation. ArXiv:1910.14076 [Cs]. http://arxiv.org/abs/1910.14076
Lin, G.-T., & Giambi, M. (2021). Context-gloss Augmentation for Improving Word Sense Disambiguation (arXiv:2110.07174). arXiv. http://arxiv.org/abs/2110.07174
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. ArXiv:1301.3781 [Cs]. http://arxiv.org/abs/1301.3781
Moon, S., Pakhomov, S., Liu, N., Ryan, J. O., & Melton, G. B. (2014). A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources. Journal of the American Medical Informatics Association: JAMIA, 21(2), 299–307. https://doi.org/10.1136/amiajnl-2012-001506
Moon, S., Pakhomov, S., & Melton, G. B. (2012). Automated Disambiguation of Acronyms and Abbreviations in Clinical Texts: Window and Training Size Considerations. AMIA Annual Symposium Proceedings, 2012, 1310–1319.
Neumann, M., King, D., Beltagy, I., & Ammar, W. (2019). ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing. Proceedings of the 18th BioNLP Workshop and Shared Task, 319–327. https://doi.org/10.18653/v1/W19-5034
Oleynik, M., Kreuzthaler, M., & Schulz, S. (2017). Unsupervised Abbreviation Expansion in Clinical Narratives. Studies in Health Technology and Informatics, 245, 539–543.
Pal, A. R., & Saha, D. (2015). Word sense disambiguation: A survey. International Journal of Control Theory and Computer Modeling, 5(3), 1–16. https://doi.org/10.5121/ijctcm.2015.5301
Park, H. J., Kim, S. M., La Yun, B., Jang, M., Kim, B., Jang, J. Y., Lee, J. Y., & Lee, S. H. (2019). A computer-aided diagnosis system using artificial intelligence for the diagnosis and characterization of breast masses on ultrasound: Added value for the inexperienced breast radiologist. Medicine, 98(3), e14146. https://doi.org/10.1097/MD.0000000000014146
Peng, Y., Yan, S., & Lu, Z. (2019). Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. ArXiv:1906.05474 [Cs]. http://arxiv.org/abs/1906.05474
Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. https://doi.org/10.3115/v1/D14-1162
Pesaranghader, A., Matwin, S., Sokolova, M., & Pesaranghader, A. (2019). deepBioWSD: Effective deep neural word sense disambiguation of biomedical text data. Journal of the American Medical Informatics Association: JAMIA, 26(5), 438–446. https://doi.org/10.1093/jamia/ocy189
Raganato, A., Delli Bovi, C., & Navigli, R. (2017). Neural Sequence Learning Models for Word Sense Disambiguation. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 1156–1167. https://doi.org/10.18653/v1/D17-1120
Rosruen, N., & Samanchuen, T. (2018). Chatbot Utilization for Medical Consultant System. 2018 3rd Technology Innovation Management and Engineering Science International Conference (TIMES-iCON), 1–5. https://doi.org/10.1109/TIMES-iCON.2018.8621678
Sabbir, A., Jimeno-Yepes, A., & Kavuluru, R. (2016). Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings and Distant Supervision.
Sabbir, A., Jimeno-Yepes, A., & Kavuluru, R. (2017). Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings. Proceedings. IEEE International Symposium on Bioinformatics and Bioengineering, 2017, 163–170. https://doi.org/10.1109/BIBE.2017.00-61
Sato, Y., Takegami, Y., Asamoto, T., Ono, Y., Hidetoshi, T., Goto, R., Kitamura, A., & Honda, S. (2020). A Computer-Aided Diagnosis System Using Artificial Intelligence for Hip Fractures -Multi-Institutional Joint Development Research-. ArXiv:2003.12443 [Physics, q-Bio]. http://arxiv.org/abs/2003.12443
Schuster, M., & Nakajima, K. (2012). Japanese and Korean voice search. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5149–5152. https://doi.org/10.1109/ICASSP.2012.6289079
Skreta, M., Arbabi, A., Wang, J., Drysdale, E., Kelly, J., Singh, D., & Brudno, M. (2021). Automatically disambiguating medical acronyms with ontology-aware deep learning. Nature Communications, 12, 5319. https://doi.org/10.1038/s41467-021-25578-4
Tariq, R. A., & Sharma, S. (2021). Inappropriate Medical Abbreviations. In StatPearls. StatPearls Publishing. http://www.ncbi.nlm.nih.gov/books/NBK519006/
Wang, Y., Zheng, K., Xu, H., & Mei, Q. (2017). Clinical Word Sense Disambiguation with Interactive Search and Classification. AMIA Annual Symposium Proceedings, 2016, 2062–2071.
Wu, Y., Denny, J. C., Rosenbloom, S. T., Miller, R. A., Giuse, D. A., Song, M., & Xu, H. (2015). A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time. Applied Clinical Informatics, 6(2), 364–374. https://doi.org/10.4338/ACI-2014-10-RA-0088
Wu, Y., Denny, J. C., Trent Rosenbloom, S., Miller, R. A., Giuse, D. A., Wang, L., Blanquicett, C., Soysal, E., Xu, J., & Xu, H. (2017). A long journey to short abbreviations: Developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD). Journal of the American Medical Informatics Association: JAMIA, 24(e1), e79–e86. https://doi.org/10.1093/jamia/ocw109
Wu, Y., Tang, B., Jiang, M., Moon, S., Denny, J. C., & Xu, H. (2013). Clinical Acronym/Abbreviation Normalization using a Hybrid Approach. Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013. http://ceur-ws.org/Vol-1179/CLEF2013wn-CLEFeHealth-WuEt2013.pdf
Wu, Y., Xu, J., Zhang, Y., & Xu, H. (2015). Clinical Abbreviation Disambiguation Using Neural Word Embeddings. Proceedings of BioNLP 15, 171–176. https://doi.org/10.18653/v1/W15-3822
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2020). XLNet: Generalized Autoregressive Pretraining for Language Understanding. ArXiv:1906.08237 [Cs]. http://arxiv.org/abs/1906.08237
Yap, B. P., Koh, A., & Chng, E. S. (2020). Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences (arXiv:2009.11795). arXiv. http://arxiv.org/abs/2009.11795
Zhang, Y., Chen, Q., Yang, Z., Lin, H., & Lu, Z. (2019). BioWordVec, improving biomedical word embeddings with subword information and MeSH. Scientific Data, 6, 52. https://doi.org/10.1038/s41597-019-0055-0
Advisors: Ya-Han Hu, Hsiao-Ting Tseng (胡雅涵、曾筱珽)    Date of Approval: 2022-7-13
