NCU Institutional Repository (中大機構典藏) - theses, past exam papers, journal articles, research projects, and more: Item 987654321/77583

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/77583


    Title: 基於序列到序列之問題-答案對生成模型;Generating Question-Answer Pairs from Document with Sequence to Sequence Modeling
    Authors: 湯珮茹;Tang, Pei-Ju
    Contributors: 資訊工程學系 (Department of Computer Science and Information Engineering)
    Keywords: Reading Comprehension;Sequence to Sequence;Question-Answer Pair;Dynamic Vocabulary
    Date: 2018-07-20
    Date uploaded: 2018-08-31 14:49:02 (UTC+8)
    Publisher: National Central University
    Abstract: In recent years, many studies on question answering and question generation have been applied to reading comprehension with satisfactory performance. Question answering aims to answer questions by reading documents, while question generation attempts to generate diverse questions from given documents that can serve as reading-test items. However, some generated questions go beyond the scope of the given document, so the machine cannot produce an answer. To resolve this question-answer mismatch, this thesis proposes an approach in which the user supplies only a document and a question type, and the system outputs a quality question-answer pair; the generated question and answer must focus on the same sentence of the input document and exhibit fluency, relevance, and correctness.
    In this thesis, we adopt an attention-based sequence-to-sequence model and add hierarchical input to the encoder. To overcome the convergence problem caused by generating questions and answers from a huge vocabulary, the output side uses a dynamic vocabulary that contains not only commonly used words but also words that change with each document; this allows training to converge and substantially improves model capability. Besides generating question-answer pairs, the trained model can also perform question answering and question generation, and it outperforms a retrieval-based model.
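    To make the dynamic-vocabulary idea in the abstract concrete, the following is a minimal Python sketch, not code from the thesis, of how a per-document decoding vocabulary could be assembled from a fixed list of common words plus the most frequent words of the current document; the function name, parameters, and the word budget are illustrative assumptions.

```python
# Minimal sketch (not the author's code): build a dynamic output vocabulary
# combining a fixed set of frequent words with words taken from the current
# input document, in the spirit of the approach described in the abstract.

from collections import Counter

def build_dynamic_vocab(document_tokens, common_words, max_doc_words=100):
    """Return a decoding vocabulary for one document.

    common_words   -- hypothetical list of globally frequent words (assumed given)
    document_tokens -- tokens of the current input document
    max_doc_words  -- cap on document-specific words added to the vocabulary
    """
    # Start from standard special tokens plus the shared common-word list.
    vocab = ["<pad>", "<sos>", "<eos>", "<unk>"] + list(common_words)
    seen = set(vocab)

    # Add the most frequent in-document words not already covered, so the
    # decoder can emit document-specific terms without a huge softmax layer.
    doc_counts = Counter(t for t in document_tokens if t not in seen)
    for word, _ in doc_counts.most_common(max_doc_words):
        vocab.append(word)

    # Map words to indices for the document-specific output softmax.
    word2id = {w: i for i, w in enumerate(vocab)}
    return vocab, word2id


# Usage example with toy data:
common = ["the", "a", "is", "in", "what", "who", "when", "?"]
doc = "the eiffel tower is located in paris and was completed in 1889".split()
vocab, word2id = build_dynamic_vocab(doc, common)
print(len(vocab), word2id.get("paris"))
```

    Restricting the output softmax to this smaller, document-aware vocabulary is the design choice the abstract credits with letting training converge while still allowing the decoder to produce document-specific words.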
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

    Files in this item:

    File          Description    Size    Format    Views
    index.html                   0Kb     HTML      95        View/Open


    All items in NCUIR are protected by the original copyright.
