dc.description.abstract | Question generation technology has significant applications in fields such as education and automated question-answering systems. Its primary purpose is to automatically generate questions and answers from text, which helps teachers assess students' understanding and supports personalized learning that enhances learner engagement and learning outcomes. Previous research has focused mainly on text filtering and answer generation, but controlling question difficulty remains a challenge. For example, traditional methods rely primarily on the answer and the span of source text to generate questions, which makes it difficult to control the difficulty of the resulting questions effectively.
This study proposes a method that controls question generation by extracting events and relations, with the aim of improving the quality and diversity of the generated questions. The original FairytaleQA dataset contains only text, questions, and answers, and lacks any event or relation information. We therefore invited annotators to label the FairytaleQA dataset with the events and relations relevant to each question and answer, thereby enriching the dataset.
Next, we trained the model on this annotated data and generated questions of varying difficulty using two rules. (1) Replacing pronouns in event arguments: by providing information on subjects and their pronouns, the model learns what the pronouns in extracted events refer to and can therefore generate corresponding questions. (2) Linking events through relation extraction: by linking multiple events via the extracted relations, the model gains more contextual information and can therefore generate cross-paragraph questions. The results show that providing the model with more events significantly increases the difficulty of the generated questions, with the proportion of difficult questions rising from 39% to 45%. Likewise, when a subject and its pronoun appear in paragraphs farther apart, or when two linked events are farther apart, the difficulty of the generated questions also increases significantly, with the proportion of difficult questions rising from 33% to 45%. | en_US |