Retelling a story by imitation is one way to develop students' narrative skills, but it can be difficult for students with weaker memory or for those who cannot describe a whole story on their own. We therefore aim to use natural language processing techniques to develop a story co-telling dialogue module that tells an English story together with a student, thereby building the student's narrative ability. However, story co-telling is a relatively new and little-explored task, and no ready-made story co-telling dialogue corpus is available. Having a chatbot learn from real interactions with students would be costly in both time and money, so we use a machine-to-machine approach combined with reinforcement learning to generate the dataset; the lack of a reward function for reinforcement learning is a further design challenge. In story co-telling, the model needs two main capabilities: (1) understanding the story content, so that it grasps the plot and key information, and (2) discussing the remaining story-related plot points based on the ongoing conversation. We use open-domain information extraction to build a knowledge graph of the story, which not only captures the important information but also provides a structured knowledge representation that helps the model understand and organize the story. We also use multi-agent reinforcement learning, in which two agents select relevant facts from the knowledge graph based on the dialogue history to generate responses and jointly complete the co-telling task. With these capabilities, the dialogue module can bring story elements into the co-telling process effectively; for example, when the user mentions a particular plot point or character, the model can develop the plot further and supply related background. Through reinforcement learning, the model makes better-informed choices among candidate responses given the current dialogue history.
Compared with responding purely in chronological order, our model's performance improved from 67.01% to 70.81% through self-training with reward evaluation, a gain of about 3.8 percentage points.
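To make the contrast between chronological replies and history-aware fact selection concrete, the following is a minimal sketch only, not the system described above. It assumes a hypothetical four-fact toy story stored as (subject, relation, object) triples, and it substitutes a simple word-overlap heuristic for the learned policy; no reinforcement learning or reward model is involved. All names (`Fact`, `STORY`, `chronological_pick`, `history_aware_pick`, `co_tell`) are illustrative assumptions.

```python
from dataclasses import dataclass

STOPWORDS = {"a", "an", "the"}


@dataclass(frozen=True)
class Fact:
    """One knowledge-graph triple plus its position in the story (hypothetical schema)."""
    subject: str
    relation: str
    obj: str
    order: int

    def sentence(self):
        return f"{self.subject} {self.relation} {self.obj}."


def tokens(text):
    """Lowercased content words of a sentence, without punctuation or stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS - {""}


def chronological_pick(facts, used, history):
    """Baseline: always tell the earliest fact that has not been told yet."""
    remaining = [f for f in facts if f not in used]
    return min(remaining, key=lambda f: f.order)


def history_aware_pick(facts, used, history):
    """Heuristic stand-in for a learned policy: prefer the unused fact whose
    entities overlap most with the previous turn; ties fall back to story order."""
    remaining = [f for f in facts if f not in used]
    last = tokens(history[-1]) if history else set()

    def score(f):
        overlap = len(tokens(f"{f.subject} {f.obj}") & last)
        return (-overlap, f.order)

    return min(remaining, key=score)


def co_tell(facts, pick):
    """Two agents, A and B, alternate turns until every fact has been told."""
    history, used, transcript = [], set(), []
    for turn in range(len(facts)):
        fact = pick(facts, used, history)
        used.add(fact)
        history.append(fact.sentence())
        transcript.append(("A" if turn % 2 == 0 else "B", fact))
    return transcript


# Hypothetical toy story with two interleaved threads (a map and a boat).
STORY = [
    Fact("Anna", "found", "a map", 0),
    Fact("Ben", "built", "a boat", 1),
    Fact("The map", "showed", "an island", 2),
    Fact("The boat", "reached", "the island", 3),
]

if __name__ == "__main__":
    for name, pick in [("chronological", chronological_pick),
                       ("history-aware", history_aware_pick)]:
        print(name)
        for speaker, fact in co_tell(STORY, pick):
            print(f"  {speaker}: {fact.sentence()}")
```

On this toy story the baseline tells the facts in story order (0, 1, 2, 3), whereas the history-aware picker follows the map thread before returning to the boat (0, 2, 3, 1). In the approach summarized above, this hand-written overlap score would be replaced by a policy trained with a reward signal.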