dc.description.abstract | Information retrieval-based medical question-answering systems usually return relevant answers to a user’s question as a ranked list. However, the retrieved answers may contain complex and diverse information, which makes it hard for users to find what addresses their specific question intent. This study therefore focuses on developing extractive summarization techniques for Chinese medical answers. We propose a model called HGATSUM (Heterogeneous Graph Attention Networks for Summarization). First, we construct a heterogeneous graph whose nodes are questions, answer sentences, and medical entities, and whose edges represent the relationships among them: 1) dependency relationships among answer sentences based on Rhetorical Structure Theory (RST); 2) similarity relationships between questions and answer sentences; and 3) mention relationships between entities and question/answer sentences. Then, Graph Attention Networks are used to learn feature representations of the heterogeneous graph nodes. Finally, we combine the graph features of the answer sentences with their relevance to the posed question to select a subset of sentences and assemble them into an extractive summary.
Due to the lack of publicly released benchmark data for medical answer summarization, we constructed a dataset called Med-AnsSum for the extractive summarization of Chinese medical answers. The dataset contains 3,314 question-answer pairs covering 469 distinct medical questions returned by the medical question-answering system, and each pair was manually annotated with an extractive answer summary. In our experiments, the proposed HGATSUM model outperforms previous models (i.e., BERTSUMEXT, MATCHSUM, AREDSUM, and Bert-QSBUM) on the Med-AnsSum dataset, achieving the best ROUGE-1/2/L scores of 82.08/78.66/81.60. A human evaluation further confirmed that our model is effective for Chinese medical answer summarization. | en_US |