Abstract: Dialogue system development has become a popular research topic in recent years, and many companies have a demand for it. Dialogue systems can be divided into two categories by purpose. The first is task-oriented dialogue systems, such as customer service agents that answer users' questions in a specific domain, or the personal assistant Siri, which integrates information (contacts, weather, calendar, time, and other related data) and answers queries over it. The second is non-task-oriented dialogue systems, such as the companionship-oriented chatbot Alice, which holds simple everyday conversations. Our research focuses on the latter: the goal is to respond to whatever the user says, which may be a question, a complaint, an exclamation, a statement of fact, and so on. How a chatbot should answer such utterances so that its replies win users over is the central question of this thesis.

Short-text conversation (STC) systems fall into two broad classes: retrieval-based and generation-based. The former depends on the quality of its response database, while the latter additionally requires a grammar-checking module. This thesis targets the generation-based STC problem but builds on a retrieval-based foundation: we treat the web as the database and retrieve candidate response sentences from Google search snippets, so there is no need to collect a large text corpus in advance. The method consists of three steps: (1) query keyword generation, (2) sentence segmentation and pre-processing of candidate sentences, and (3) sentence ranking with SVMrank.

We use the post itself as the query keywords, and Google snippets highlight the text that matches the query string. For a reposted post, a long string is matched; since consecutive sentences are usually related, we take the sentences that follow the matched string as candidate responses, which keeps the responses semantically relevant. For an original post, the matched strings are fragmentary, so from the matched parts we select nouns and terms that can be found on Wikipedia; we regard these words as the key points of the post and use them as new keywords to retrieve new text. To extract sentences from the retrieved text, we segment on sentence-final punctuation and use the CKIP parser to analyze sentence structure, discarding sentences with unbalanced paired symbols; these steps ensure the completeness of each candidate sentence. Finally, our trained SVMrank model ranks the candidate sentences by their relevance to the post.

For evaluation, we use human annotation with the four response assessment criteria of NTCIR STC-2: three non-expert annotators judged the responses to 100 posts. The average score is 0.713, with a Kappa value of 0.479.
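To make the keyword-generation step for original posts concrete, here is a minimal Python sketch: from the snippet-matched fragments, keep nouns that also have a Wikipedia article. The `pos_tag` tagger is a hypothetical stand-in for the CKIP tools used in the thesis, and the Wikipedia REST summary endpoint is our assumption for the "can be found on Wikipedia" check; the abstract specifies neither interface.

```python
import requests

def has_wikipedia_page(term: str, lang: str = "zh") -> bool:
    """Assumed check for 'the word can be found on Wikipedia':
    the REST summary endpoint returns HTTP 200 for existing articles."""
    url = f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{term}"
    try:
        return requests.get(url, timeout=5).status_code == 200
    except requests.RequestException:
        return False

def new_query_keywords(matched_fragments, pos_tag):
    """From the fragments of an original post that Google's snippet matched,
    keep nouns that also have a Wikipedia entry; these become the new query.
    `pos_tag` is a hypothetical tagger returning (word, tag) pairs, with
    CKIP-style noun tags starting with 'N'."""
    keywords = []
    for fragment in matched_fragments:
        for word, tag in pos_tag(fragment):
            if tag.startswith("N") and has_wikipedia_page(word):
                keywords.append(word)
    return keywords
```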
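The sentence pre-processing step can likewise be illustrated with a small sketch, assuming segmentation on full-width sentence-final punctuation and a stack-based balance check for paired symbols. The exact symbol inventory is our assumption, and the thesis additionally runs the CKIP parser over each sentence, which is omitted here.

```python
import re

# Sentence-final punctuation used to segment snippet text (assumed inventory).
END_PUNCT = "。!?!?"
# Paired symbols that must balance for a sentence to count as complete.
PAIRS = {"「": "」", "『": "』", "(": ")", "(": ")"}
CLOSERS = {v: k for k, v in PAIRS.items()}

def split_sentences(text: str):
    """Split after any sentence-final punctuation mark, keeping the mark."""
    return [s.strip() for s in re.split(f"(?<=[{END_PUNCT}])", text) if s.strip()]

def is_complete(sentence: str) -> bool:
    """Reject sentences whose paired symbols are unbalanced,
    a common sign that a Google snippet was cut off mid-sentence."""
    stack = []
    for ch in sentence:
        if ch in PAIRS:
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                return False
    return not stack

# Usage: the truncated third fragment "(未完" is filtered out.
candidates = [s for s in split_sentences("今天天氣很好。我們出門了!(未完")
              if is_complete(s)]
```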
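The ranking step can be sketched as follows, assuming Joachims' svm_rank binaries are installed and a model has already been trained. The feature set (here an opaque numeric vector per candidate) and the file names are illustrative assumptions, not the thesis's actual features.

```python
import subprocess

def rank_candidates(feature_vectors, model_path="svmrank.model"):
    """Score candidate sentences with a trained SVMrank model and return
    their indices ordered best-first.  Each feature vector describes one
    candidate (e.g. its similarity to the post); the exact features used
    in the thesis are not reproduced here."""
    # Write candidates in SVMrank's input format: <label> qid:<q> <idx>:<val> ...
    with open("candidates.dat", "w") as f:
        for vec in feature_vectors:
            feats = " ".join(f"{i + 1}:{v}" for i, v in enumerate(vec))
            f.write(f"0 qid:1 {feats}\n")  # label is ignored at prediction time
    subprocess.run(
        ["svm_rank_classify", "candidates.dat", model_path, "predictions"],
        check=True,
    )
    with open("predictions") as f:
        scores = [float(line) for line in f]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
```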
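The reported agreement statistic can be reproduced in form (not in value) with a standard Fleiss' kappa computation. That the Kappa of 0.479 is Fleiss' kappa over the three annotators is our assumption; the abstract does not name the variant.

```python
import numpy as np

def fleiss_kappa(table):
    """Fleiss' kappa for inter-annotator agreement.
    `table[i][j]` = number of annotators who put item i into category j;
    every row must sum to the same number of annotators (here, 3)."""
    table = np.asarray(table, dtype=float)
    n_items, _ = table.shape
    n_raters = table[0].sum()
    p_j = table.sum(axis=0) / (n_items * n_raters)       # category prevalence
    P_i = ((table ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()
    return (P_bar - P_e) / (1 - P_e)

# Toy usage: 4 items, 3 annotators, 2 categories (bad/good response).
print(fleiss_kappa([[3, 0], [0, 3], [1, 2], [2, 1]]))  # ~0.333
```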