Abstract: | Question generation has important applications in education and in automated question-answering systems. Its main purpose is to generate questions and answers automatically from text, helping teachers assess students' understanding and supporting personalized learning that improves learner engagement and learning outcomes. Previous research has focused mainly on text filtering and answer generation, and controlling question difficulty remains a challenge. For example, traditional methods rely primarily on the answer and a text span to generate questions, which makes difficulty hard to control. In addition, existing question generation models are weak at generating multi-hop and cross-paragraph questions.
This study proposes a method that controls question generation through event and relation extraction, with the aim of improving the quality and diversity of the generated questions. The original FairytaleQA dataset contains only texts, questions, and answers, with no event or relation information. We therefore invited annotators to label the questions and answers in FairytaleQA with their related events and relations, enriching the dataset.
We then trained a model on the annotated data and generated questions of varying difficulty using two rules: (1) replacing pronouns in event arguments: by supplying the subject together with the pronoun, the model can understand what the pronoun in an extracted event refers to and generate a corresponding question; (2) linking events in relation extraction: by linking multiple events from relation extraction, the model receives more context and can generate cross-paragraph questions. The results show that when the model is given more event information, the difficulty of the generated questions increases significantly, with the proportion of difficult questions rising from 39% to 45%. Likewise, when a subject and its pronoun appear in paragraphs farther apart, or when two events are farther apart, difficulty also increases significantly, with the proportion of difficult questions rising from 33% to 45%.
The proposed method not only increases the complexity and challenge of the generated questions but also gives better control over their difficulty. This has potential significance for personalized education applications and automated question-answering systems, and can improve learners' sense of achievement and engagement. |
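
The two rules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model input format, so the `Event` class, the `antecedents` mapping, the relation triples, and the `trigger(role=text) <label> trigger(role=text)` control string are all hypothetical names and formats chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One extracted event (hypothetical schema): trigger plus role -> text."""
    trigger: str
    arguments: dict[str, str] = field(default_factory=dict)
    paragraph: int = 0  # index of the paragraph the event was extracted from


def replace_pronouns(event: Event, antecedents: dict[str, str]) -> Event:
    """Rule 1 (sketch): swap pronoun arguments for the entity they refer to.

    `antecedents` maps a lowercase pronoun to its referent; arguments that
    are not listed in it are kept unchanged.
    """
    resolved = {
        role: antecedents.get(text.lower(), text)
        for role, text in event.arguments.items()
    }
    return Event(event.trigger, resolved, event.paragraph)


def render(event: Event) -> str:
    """Serialize an event as trigger(role=text, ...)."""
    args = ", ".join(f"{role}={text}" for role, text in event.arguments.items())
    return f"{event.trigger}({args})"


def link_events(events: list[Event],
                relations: list[tuple[int, str, int]]) -> list[str]:
    """Rule 2 (sketch): join each related event pair into one control string.

    Each relation is (head index, relation label, tail index).
    """
    return [
        f"{render(events[head])} <{label}> {render(events[tail])}"
        for head, label, tail in relations
    ]


def paragraph_gap(a: Event, b: Event) -> int:
    """Distance in paragraphs between two events."""
    return abs(a.paragraph - b.paragraph)


# Toy example (invented data, not taken from FairytaleQA).
events = [
    Event("steal", {"agent": "the fox", "theme": "the cheese"}, paragraph=0),
    Event("run", {"agent": "he"}, paragraph=2),
]
resolved = [replace_pronouns(e, {"he": "the fox"}) for e in events]
controls = link_events(resolved, [(0, "causes", 1)])
print(controls[0])
print("paragraph gap:", paragraph_gap(resolved[0], resolved[1]))
```

In this sketch, a control string built from events that are farther apart (a larger `paragraph_gap`) would correspond to the setting in which the abstract reports a higher share of difficult questions; how such strings are actually fed to the model is left open here.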