Heterogeneous Graph Attention Networks for Extractive Summarization of Chinese Medical Answers (運用異質圖注意力網路於中文醫療答案擷取式摘要)


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/93565

Title:	Heterogeneous Graph Attention Networks for Extractive Summarization of Chinese Medical Answers (運用異質圖注意力網路於中文醫療答案擷取式摘要)
Author:	Tien, Kao-Yuan (田高源)
Contributor:	Department of Electrical Engineering
Keywords:	extractive summarization (擷取式摘要); heterogeneous graph (異質圖); graph attention networks (圖注意力網路)
Date:	2023-10-13
Upload time:	2024-03-05 17:51:43 (UTC+8)
Publisher:	National Central University
Abstract:	Information retrieval-based medical question-answering systems usually return relevant answers to a user's question as a ranked list. However, the retrieved results often contain complex and diverse information, so users looking for something specific must spend time reading and understanding them. This study therefore focuses on extractive summarization of Chinese medical answers, condensing lengthy and complex retrieved information into concise, easy-to-understand answers. We propose a model called HGATSUM (Heterogeneous Graph Attention Networks for Extractive Summarization) for retrieval-based Chinese medical question-answering systems. First, we construct a heterogeneous graph from each medical question-answer pair. Its nodes are questions, answer sentences, and medical entities, and its edges are the relationships among them: 1) dependency relationships among answer sentences based on Rhetorical Structure Theory (RST); 2) similarity relationships between questions and answer sentences; and 3) mention relationships between medical entities and question or answer sentences. Then, Graph Attention Networks are used to learn representations of the heterogeneous graph nodes. Finally, we combine the graph node representations of answer sentences with features of their relevance to the posed question to select and assemble answer sentences into the final extracted summary.
Because no public benchmark data exist for medical answer summarization, we constructed a dataset called Med-AnsSum for the extractive summarization of Chinese medical answers. The dataset contains 469 distinct medical questions and the 3,314 question-answer pairs that the retrieval system returned for them, each manually annotated with an extractive answer summary. In our experiments, the proposed HGATSUM model outperforms previous models (BERTSUMEXT, MATCHSUM, AREDSUM, and Bert-QSBUM) on the Med-AnsSum dataset, achieving the best ROUGE-1/2/L scores of 82.08/78.66/81.60. Human evaluation further confirms that the model is effective for Chinese medical answer summarization.
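The graph construction described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the names `HeteroGraph`, `build_graph`, and `char_jaccard`, the character-overlap similarity, the substring-based mention test, and the threshold value are all assumptions made for illustration, and the RST dependency links are taken as given input rather than produced by a parser.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Hypothetical container: typed nodes plus typed, directed edges."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source id, target id, edge type)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, src, dst, edge_type):
        self.edges.append((src, dst, edge_type))


def char_jaccard(a, b):
    """Placeholder similarity (an assumption): Jaccard overlap of character sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0


def build_graph(question, sentences, entities, rst_links, sim_threshold=0.1):
    """Build one graph for a question and the sentences of one answer.

    rst_links holds (head_index, dependent_index) pairs between answer
    sentences; here they are supplied by the caller.
    """
    g = HeteroGraph()
    g.add_node("q", "question")
    for i in range(len(sentences)):
        g.add_node(f"s{i}", "sentence")
    for e in entities:
        g.add_node(f"e:{e}", "entity")

    # 1) Dependency edges among answer sentences.
    for head, dep in rst_links:
        g.add_edge(f"s{head}", f"s{dep}", "rst_dependency")

    # 2) Similarity edges between the question and answer sentences.
    for i, s in enumerate(sentences):
        if char_jaccard(question, s) >= sim_threshold:
            g.add_edge("q", f"s{i}", "similarity")

    # 3) Mention edges from entities to the question or sentences naming them.
    for e in entities:
        if e.lower() in question.lower():
            g.add_edge(f"e:{e}", "q", "mention")
        for i, s in enumerate(sentences):
            if e.lower() in s.lower():
                g.add_edge(f"e:{e}", f"s{i}", "mention")
    return g


# Toy example with made-up text and entities.
g = build_graph(
    question="What causes migraine?",
    sentences=["Migraine is often triggered by stress.", "Rest in a dark room."],
    entities=["migraine", "stress"],
    rst_links=[(0, 1)],
)
for edge in g.edges:
    print(edge)
```

In a full system, a graph like this would be passed to a graph attention network to learn node representations; that step is omitted here.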
Appears in collection:	[Graduate Institute of Electrical Engineering] Master's and Doctoral Theses

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	48

All items in NCUIR are protected by original copyright.
