Learning User Intents for Abstractive Summarization of Chinese Medical Questions

DC Field	Value	Language
DC.contributor	Department of Electrical Engineering	zh_TW
DC.creator	鄭元皓	zh_TW
DC.creator	Yuan-Hao Cheng	en_US
dc.date.accessioned	2023-10-13T07:39:07Z
dc.date.available	2023-10-13T07:39:07Z
dc.date.issued	2023
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=110521079
dc.contributor.department	Department of Electrical Engineering	zh_TW
DC.description	National Central University	zh_TW
DC.description	National Central University	en_US
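dc.description.abstract	The goal of abstractive summarization is to condense a long text into a concise summary that preserves its meaning and main information. It applies to many scenarios, such as generating news headlines, summarizing academic papers, automating report generation, and question-answering chatbots. The main objective of this research is question understanding for retrieval-based medical question-answering systems. Users’ medical questions often contain too much unnecessary information, which lowers the precision of question-answer matching in the retrieval system. We therefore develop abstractive summarization as a solution for question understanding: the model generates a summary question for each user medical question, which is then fed into the retrieval-based medical question-answering system to improve the matching of relevant answers. We propose an Intent-based Medical Question Summarization (IMQS) model. An entity recognizer extracts the medical entities of the original question, and these are added to the original question as an entity prompt to form the input to the summarization model. Question intent classification and summarization are learned jointly to fine-tune the encoder and decoder of the summarization language model, so that the generated summary attends more closely to entities and preserves the intent of the original question. We used a web crawler to collect questions posted by the public on the MedNet physician consultation platform and selected suitable questions for medical entity annotation, question intent annotation, and question summary annotation, resulting in a medical question summarization dataset, Med-QueSumm. It contains 2,468 Chinese medical questions; each original question averages about 110 characters and 7.75 entities and is labeled with one of six predefined intent categories (symptoms, drugs, departments, treatments, examinations, information). The summary questions average about 45 characters, about 40% of the length of the original questions. Experimental results and analysis of the IMQS model show that it achieves the best summarization scores, with ROUGE-1 69.59%, ROUGE-2 51.32%, ROUGE-L 61.69%, and BERTScore 64.08%, outperforming related models (BERTSum-abs, PEGASUS, ProphetNet, CPT, BART, GSum, SpanCopy), and reaches a Micro-F1 of 85.54% on intent classification. Overall, the IMQS model is a Chinese medical question summarization method that delivers both summary quality and intent analysis.	zh_TW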
dc.description.abstract	Abstractive summarization aims to condense a long text into a shorter summary that retains the main information and key content. The main objective of this research is question understanding for retrieval-based medical question-answering systems, where users’ medical questions often contain unnecessary information that hinders retrieval performance. We therefore develop an abstractive summarization model called IMQS (Intent-based Medical Question Summarization) to generate a summary of each question. First, we use an entity recognizer to extract the medical entities of the original question and design an entity prompt that is combined with the question to form the input to our summarization model. Then, we jointly learn question intents and summaries to fine-tune the encoder and decoder of the language model. As a result, the generated summary pays more attention to medical entities and retains the intent of the original question. We collected users’ questions from MedNet, a physician consultation platform, and selected suitable ones for entity tagging, intent labeling, and question summarization, resulting in a dataset called Med-QueSumm. It contains 2,468 Chinese medical questions, each averaging about 110 characters and 7.75 entities, while the summarized questions average about 45 characters, roughly 40% of the original length. In addition, each question is annotated with one of six intent categories: symptoms, drugs, departments, treatments, examinations, and information. Experimental results and model analysis show that our IMQS model achieves the best ROUGE-1/-2/-L scores of 69.59/51.32/61.69 and a BERTScore of 64.08 on the summarization task, outperforming related models including BERTSum-abs, PEGASUS, ProphetNet, CPT, BART, GSum, and SpanCopy. IMQS also obtains the best micro-F1 score of 85.54 in intent classification. Overall, IMQS is an effective summarization method for Chinese medical questions.	en_US
DC.subject	abstractive summarization	zh_TW
DC.subject	sequence to sequence	zh_TW
DC.subject	pre-trained language model	zh_TW
DC.subject	abstractive summarization	en_US
DC.subject	sequence to sequence	en_US
DC.subject	pre-trained language model	en_US
DC.title	Learning User Intents for Abstractive Summarization of Chinese Medical Questions	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Learning User Intents for Abstractive Summarization of Chinese Medical Questions	en_US
DC.type	Master's and doctoral thesis	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US
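
The abstract describes two steps that a small example may make more concrete: adding extracted medical entities to the question as an entity prompt, and training intent classification jointly with summarization. The following is a minimal sketch under stated assumptions, not the thesis implementation. The record does not specify the prompt template or how the two losses are combined, so the separators `ENTITY_SEP` and `PROMPT_SEP`, the function names, the `intent_weight` parameter, and the sample question are all hypothetical.

```python
from typing import List

# Hypothetical formatting choices; the record does not give the real template.
ENTITY_SEP = "、"      # separator between entities inside the prompt
PROMPT_SEP = "[SEP]"   # marker between the entity prompt and the question


def build_entity_prompt(question: str, entities: List[str]) -> str:
    """Prepend de-duplicated entities (first-seen order) to the question."""
    seen: List[str] = []
    for ent in entities:
        if ent and ent not in seen:
            seen.append(ent)
    if not seen:
        # No entities recognized: fall back to the plain question.
        return question
    return ENTITY_SEP.join(seen) + PROMPT_SEP + question


def joint_loss(summary_loss: float, intent_loss: float,
               intent_weight: float = 1.0) -> float:
    """Combine the two task losses as a weighted sum (an assumed scheme)."""
    return summary_loss + intent_weight * intent_loss


# Invented sample question: "I often have headaches lately and Panadol
# does not help; which department should I visit?"
question = "最近常常頭痛，吃了普拿疼也沒用，請問要看哪一科？"
entities = ["頭痛", "普拿疼", "頭痛"]  # as if returned by an entity recognizer

model_input = build_entity_prompt(question, entities)
print(model_input)
print(joint_loss(summary_loss=1.0, intent_loss=2.0, intent_weight=0.5))
```

In this sketch the prompted string would be tokenized and passed to the encoder, with an intent classification head and the decoder trained together on the combined loss. A weighted sum is one common way to balance two objectives, and the weight would need tuning on held-out data.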

Thesis 110521079: Full Metadata Record