

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93566


    Title: Learning User Intents for Abstractive Summarization of Chinese Medical Questions (學習使用者意圖於中文醫療問題生成式摘要)
    Author: Cheng, Yuan-Hao (鄭元皓)
    Contributor: Department of Electrical Engineering
    Keywords: abstractive summarization; sequence-to-sequence; pre-trained language model
    Date: 2023-10-13
    Upload time: 2024-03-05 17:51:45 (UTC+8)
    Publisher: National Central University
    Abstract: The goal of abstractive summarization is to condense a long text into a concise summary that preserves the original meaning and main information. It can be applied in many scenarios, such as news headline generation, academic paper abstracts, automated report generation, and question-answering chatbots. The main objective of this research is question understanding for retrieval-based medical question-answering systems: users' medical questions often contain excessive unnecessary information, which lowers the question-answer matching precision of the retrieval system. We therefore develop an abstractive summarization technique as a question-understanding solution, generating a summary question for each user medical question that is fed into the retrieval-based medical QA system to improve the matching of relevant answers. We propose an Intent-based Medical Question Summarization (IMQS) model. An entity recognizer first extracts the medical entities of the original question, and an entity prompt is added to the original question to form the input to the summarization model. The model then jointly learns question intent classification and summarization, fine-tuning the encoder and decoder of the pre-trained summarization language model, so that the generated summary attends more closely to the medical entities and retains the intent of the original question.
    We collected users' questions from the MedNet physician consultation platform by web crawling and selected suitable questions for medical entity tagging, intent labeling, and question summarization, resulting in a medical question summarization dataset, Med-QueSumm. It contains 2,468 Chinese medical questions; each original question averages about 110 characters and 7.75 entities and is annotated with one of six predefined intent categories (symptom, drug, department, treatment, examination, information), while the summary questions average about 45 characters, roughly 40% of the original length. Experimental results and analysis of the IMQS model show that it achieves the best ROUGE-1 of 69.59%, ROUGE-2 of 51.32%, ROUGE-L of 61.69%, and BERTScore of 64.08% on the summarization task, outperforming related models (BERTSum-abs, PEGASUS, ProphetNet, CPT, BART, GSum, SpanCopy), and it also reaches a micro-F1 of 85.54% on intent classification. Overall, the IMQS model is a Chinese medical question summarization method that combines summary quality with intent analysis.
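    The abstract describes the IMQS design only at a high level. Below is a minimal, illustrative sketch of that idea rather than the thesis's actual implementation: the backbone checkpoint (fnlp/bart-base-chinese), the entity-prompt format, the mean-pooled intent classifier, and the loss weight alpha are all assumptions introduced here for illustration.

```python
# Sketch of the setup described in the abstract: an entity prompt is
# prepended to the original question, and an encoder-decoder summarizer
# is fine-tuned jointly with an intent classifier on the encoder output.
# The prompt format, backbone checkpoint, pooling, and loss weight are
# assumptions, not details taken from the thesis.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

INTENTS = ["symptom", "drug", "department", "treatment", "examination", "information"]

tokenizer = AutoTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = AutoModelForSeq2SeqLM.from_pretrained("fnlp/bart-base-chinese")
intent_head = nn.Linear(model.config.d_model, len(INTENTS))  # classifier on encoder states


def build_input(question: str, entities: list[str]) -> str:
    # Hypothetical entity-prompt format: list the recognized medical
    # entities before the original question text.
    return "實體:" + "、".join(entities) + " 問題:" + question


def joint_loss(question, entities, summary, intent_id, alpha=0.5):
    src = tokenizer(build_input(question, entities), return_tensors="pt",
                    truncation=True, max_length=512)
    tgt = tokenizer(summary, return_tensors="pt", truncation=True, max_length=128)

    # Summarization loss from the full encoder-decoder.
    enc = model.get_encoder()(input_ids=src.input_ids, attention_mask=src.attention_mask)
    seq2seq_out = model(encoder_outputs=enc, attention_mask=src.attention_mask,
                        labels=tgt.input_ids)

    # Intent loss from a mean-pooled encoder representation.
    pooled = enc.last_hidden_state.mean(dim=1)
    intent_logits = intent_head(pooled)
    intent_loss = nn.functional.cross_entropy(intent_logits, torch.tensor([intent_id]))

    # Joint objective: summarization plus weighted intent classification.
    return seq2seq_out.loss + alpha * intent_loss
```

    In a full training loop this joint loss would be backpropagated through both the summarizer and the intent head; at inference time model.generate(...) would produce the summary question while the intent head predicts the intent category.

    The reported ROUGE figures could be approximated, under the assumption of character-level matching for Chinese text (the thesis does not state its exact tokenization), with a simple n-gram overlap computation; BERTScore is available from the bert_score package.

```python
# Character-level ROUGE-N F1 for Chinese summaries (an assumption about
# how the metric is tokenized, kept self-contained for illustration).
from collections import Counter


def char_ngrams(text, n):
    chars = [c for c in text if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    ref, cand = char_ngrams(reference, n), char_ngrams(candidate, n)
    overlap = sum((ref & cand).values())
    if not ref or not cand or overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# BERTScore can be computed with the bert_score package, e.g.:
#   from bert_score import score
#   P, R, F1 = score(candidates, references, lang="zh")

print(rouge_n_f1("頭痛應該看哪一科", "頭痛要看哪一科", n=1))  # toy example, ~0.8
```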
    Appears in Collections: [Graduate Institute of Electrical Engineering] Theses & Dissertations

    Files in This Item:

    File: index.html (HTML, 0 KB, 53 views)


