Graduate Thesis 109522041: Detailed Record




Name: Kai-Yen Kao (高愷言)    Department: Computer Science and Information Engineering
Thesis Title: Applying Automatic Information Extraction to Question Answering for Story Books
(How to Ask and Answer a Robot About a Story Book)
Related Theses
★ Identifying Appointment E-mails and Extracting Irregular Time Expressions
★ Design of the NCUFree Campus Wireless Network Platform and Development of Its Application Services
★ Design and Implementation of a Semi-structured Web Data Extraction System
★ Mining and Applications of Non-simple Browsing Paths
★ Improved Association Rule Mining for Incremental Data
★ Applying the Chi-square Independence Test to Associative Classification
★ Design and Study of a Chinese Information Extraction System
★ Visualizing Non-numerical Data and Clustering with Subjective and Objective Criteria
★ A Study of Correlated Word Groups for Document Summarization
★ Cleaning Web Pages: Page Segmentation and Data Region Extraction
★ Design and Study of Sentence Classification and Ranking for Question Answering Systems
★ Efficient Mining of Compact Frequent Consecutive Event Patterns in Temporal Databases
★ Axis Arrangement in Star Coordinates for Cluster Visualization
★ Automatically Generating Web Scraping Programs from Browsing Histories
★ Template and Data Analysis for Dynamic Web Pages
★ Automated Integration of Homogeneous Web Data
  1. This electronic thesis is authorized for immediate open access.
  2. The authorized full text may be searched, read, and printed by users only for personal, non-profit academic research.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the content without permission.

Abstract (Chinese) For educators, producing high-quality and fluent question-answer pairs from a story text is time-consuming and labor-intensive. The goal is not to make questions students cannot answer, but to design them carefully so that important information in the text serves as the answer and a matching question is generated for it. This thesis uses a pre-trained model to generate question-answer pairs generatively, and also extracts information from the text to generate question-answer pairs from templates. Conversational question answering likewise uses a pre-trained model, fine-tuned on the target domain, to generate appropriate responses.

The method of this thesis has two parts. The first is generative question-answer generation: with an answer-aware approach, we first parse noun phrases and verb-related clauses from the text, prepend the answer type to the input, and fine-tune a BART model; the resulting question-answer pairs are then ranked by a DistilBERT model. Conditioning on the answer type improves both the quality and the number of the generated pairs. We also analyze a human evaluation of the generated pairs and adopt ROUGE-L, an evaluation metric used in question answering, as a metric for question-answer pairs; its correlation with human judgments is higher than that of the ranking score, so it can serve as a way to filter question-answer pairs. The second part is template-based generation: with a pipeline approach, we first extract entities, form all entity pairs, and feed them, together with contextual information from the sentence, into an ALBERT-based relation extraction model; the extracted relations then serve as the building blocks of template-based generation.
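A minimal sketch of the generative part described above, using spaCy and Hugging Face Transformers, is given below. The checkpoint name, the answer-type prefix format, and the candidate-answer heuristics are illustrative assumptions, not the thesis's exact configuration (the thesis fine-tunes BART and ranks the pairs with DistilBERT, a step this sketch omits).

```python
# A minimal sketch of answer-aware question generation, assuming spaCy and
# Hugging Face Transformers are installed.
import spacy
from transformers import BartForConditionalGeneration, BartTokenizerFast

nlp = spacy.load("en_core_web_sm")  # used to parse candidate answer spans

# "facebook/bart-base" is a placeholder; the thesis fine-tunes BART for
# question generation, so a fine-tuned checkpoint would be loaded here.
tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def candidate_answers(text):
    """Heuristic answer spans: noun chunks plus verb-headed subtrees."""
    doc = nlp(text)
    spans = [(chunk.text, "noun_phrase") for chunk in doc.noun_chunks]
    spans += [(" ".join(t.text for t in tok.subtree), "verb_phrase")
              for tok in doc if tok.pos_ == "VERB"]
    return spans

def generate_question(context, answer, answer_type):
    # Prepend the answer type so generation is conditioned on it; this exact
    # input format is an assumption for illustration.
    source = f"<{answer_type}> answer: {answer} context: {context}"
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, num_beams=4, max_length=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

story = "The little fox hid the shiny stone under the old oak tree."
for answer, answer_type in candidate_answers(story):
    print(answer_type, "|", answer, "->",
          generate_question(story, answer, answer_type))
```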
Abstract (English) For educators, generating high-quality and readable question-answer pairs from a story text is a time-consuming and labor-intensive task. The purpose is not to make students unable to answer, but to use the important information in the story text as the answer and to generate a corresponding question. In this thesis, we use a pre-trained model to generate question-answer pairs generatively, and we also extract information from the text to perform template-based question-answer pair generation.
Question answering likewise generates appropriate responses with a pre-trained model fine-tuned on the target domain.

The method of this thesis is divided into two parts. The first part is generative question-answer pair generation. Using an answer-aware method, we first parse noun phrases and verb-related clauses from the text and prepend the answer type to the input when fine-tuning a BART model. The question-answer pairs are then ranked by a DistilBERT model. Conditioning on the answer type improves the quality of the question-answer pairs and increases their number. We also analyze the human evaluation of question-answer pairs and use ROUGE-L, an evaluation metric from question answering, to evaluate question-answer pairs; its correlation with human judgments is higher than that of the ranking score, so it can be used as a method for filtering question-answer pairs. The second part is template-based generation: using a pipeline method, we extract entities, form entity pairs, and feed them, with contextual information from the sentence, into an ALBERT-based relation extraction model; the extracted relations are then used as elements of template-based question generation.
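A minimal sketch of the template-based pipeline follows: extract entities, classify the relation of each entity pair with an ALBERT-based classifier, then fill a question template from the predicted relation. The checkpoint, label set, entity-marker format, and templates below are hypothetical placeholders, not the thesis's exact design (which fine-tunes on the ACE 2005 corpus).

```python
# A minimal sketch of the extract-then-template pipeline, assuming spaCy for
# entity extraction and an ALBERT sequence classifier for relation extraction.
import itertools

import spacy
import torch
from transformers import AlbertForSequenceClassification, AlbertTokenizerFast

nlp = spacy.load("en_core_web_sm")

# "albert-base-v2" with 3 labels is a placeholder; a classifier fine-tuned on
# ACE 2005 relations would be loaded here. Labels and templates are hypothetical.
tokenizer = AlbertTokenizerFast.from_pretrained("albert-base-v2")
model = AlbertForSequenceClassification.from_pretrained("albert-base-v2",
                                                        num_labels=3)
RELATIONS = ["no_relation", "located_in", "employed_by"]
TEMPLATES = {
    "located_in": "Where is {head}?",
    "employed_by": "Who does {head} work for?",
}

def extract_relations(sentence):
    """Yield (head, relation, tail) for every ordered entity pair."""
    ents = [ent.text for ent in nlp(sentence).ents]
    for head, tail in itertools.permutations(ents, 2):
        # Mark the pair inside the full sentence so the classifier sees its
        # context; a real system would add [E1]/[E2] marker tokens to the
        # tokenizer vocabulary instead of plain string substitution.
        marked = (sentence.replace(head, f"[E1] {head} [/E1]")
                          .replace(tail, f"[E2] {tail} [/E2]"))
        inputs = tokenizer(marked, return_tensors="pt", truncation=True)
        with torch.no_grad():
            label_id = int(model(**inputs).logits.argmax(dim=-1))
        yield head, RELATIONS[label_id], tail

for head, relation, tail in extract_relations(
        "Tom works for the old bookshop in Taipei."):
    if relation in TEMPLATES:
        # The template yields the question; the tail entity is the answer.
        print(TEMPLATES[relation].format(head=head), "->", tail)
```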
Keywords (Chinese) ★ Question-Answer Pair Generation
★ Conversational Question Answering
★ Information Extraction
Keywords (English) ★ Question-Answer Pair Generation
★ Question Answering
★ Information Extraction
Table of Contents
Chinese Abstract…i
English Abstract…ii
Table of Contents…iii
List of Figures…v
List of Tables…vi
1. Introduction…1
1.1 Goals and Challenges…2
1.2 Contributions…3
2. Related Work…5
2.1 Question Generation…5
2.2 Question Answering…6
2.3 Information Extraction…7
2.3.1 Entity Extraction…8
2.3.2 Relation Extraction…9
2.3.3 Event Extraction…12
3. Generative Question Generation Method…14
3.1 Problem Definition…14
3.2 Model Architecture…14
3.3 Experiments…16
3.3.1 Testing on Given Answers…17
3.3.2 Testing on Heuristic Answers…19
3.3.3 Testing on 24 Life-Education Story Books…20
3.3.4 Evaluation with Question Answering…22
3.3.5 Human Evaluation…25
3.4 Case Study…27
4. Extraction-Enhanced Question Generation…29
4.1 Template-Based Question Generation Method…29
4.1.1 Relation…29
4.1.2 Event…31
4.1.3 Evaluation…34
4.2 Entity and Relation Extraction…35
4.2.1 Dataset: ACE 2005 Multilingual Training Corpus…35
4.2.2 Experiments…36
5. Conclusion…39
References…40
Advisor: Chia-Hui Chang (張嘉惠)    Date of Approval: 2022-07-28