Master's/Doctoral Thesis 104522042 Detailed Record




Author 陳暐翰 (Wei-Han Chen)   Department Computer Science and Information Engineering
Thesis title Short-Text Conversation via Web Sentence Selection
(基於網路句子篩選之短文本對話)
Files Full text viewable in the system (open from 2019-8-1 onward)
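Abstract (Chinese, translated) Dialogue systems have become a popular research topic in recent years, and many companies need them. By purpose, they fall into two types. The first is goal-oriented dialogue systems, such as customer service agents that answer customers' questions in a specific domain, or the personal assistant Siri, which integrates information (phone contacts, weather, calendar, time, and so on) and answers queries. The second is non-goal-oriented dialogue systems, such as the robot Alice, whose main purpose is companionship and which carries on simple chat. Our research focuses on the latter: the goal is to respond to what the user says, which may be a question, a complaint, an exclamation, a statement of fact, or a similar utterance. How a chatbot should answer so as to win the user over is the central question of this thesis.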
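Short-text conversation (STC) systems fall into two broad classes: retrieval-based and generative-based. The former depends on the quality and quantity of its database; the latter additionally needs a grammar-checking module. This thesis aims to address the generative-based STC problem but builds on a retrieval-based approach: it treats the Web as the database and retrieves candidate sentences from Google snippets as responses, so no large text collection has to be gathered in advance to populate a database. The method has three steps: (1) generating query keywords; (2) segmenting sentences and preprocessing candidate sentences; (3) ranking sentences with SVMrank.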
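We use the post itself as the query, and Google snippets pick out the text that matches the query string. If the post is a repost, it matches a longer string; because consecutive sentences are usually related, we take the sentences that follow the matched string as candidate response text, which keeps the response semantically related to the post. If the post is original, the matched strings are more fragmentary, so from the matched parts we select the nouns and the words that can be found on Wikipedia. We regard these words as the key points of the post and use them as new keywords to retrieve new text. To extract sentences from the text, we split on sentence-final punctuation, analyze sentence structure with the CKIP parser, and remove incomplete paired symbols; all of these steps are meant to ensure that sentences are complete. Finally, the SVMrank model we trained ranks the candidate sentences to find those most relevant to the post.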
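The sentence-extraction step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the set of terminal punctuation marks, the set of paired symbols, and the function names are all assumptions made for illustration, and the CKIP parser step is omitted. The sketch also reads "remove incomplete paired symbols" as dropping any sentence whose brackets or quotes do not balance, which is only one possible reading.

```python
import re

# Sentence-final punctuation used for splitting (assumed set, not from the thesis).
TERMINATORS = "。!?!?"
# Paired symbols that must balance for a sentence to count as complete (assumed set).
PAIRS = {"(": ")", "(": ")", "「": "」", "『": "』", "《": "》", "“": "”"}

_SENTENCE_RE = re.compile(
    rf"[^{re.escape(TERMINATORS)}]+[{re.escape(TERMINATORS)}]*"
)


def split_sentences(text):
    """Split text into chunks that end after a run of terminal punctuation."""
    return [m.strip() for m in _SENTENCE_RE.findall(text) if m.strip()]


def is_balanced(sentence):
    """Return True when every paired symbol opens and closes in order."""
    closers = set(PAIRS.values())
    stack = []
    for ch in sentence:
        if ch in PAIRS:
            stack.append(PAIRS[ch])  # remember which closer we expect
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False
    return not stack


def complete_sentences(text):
    """Keep chunks that end in a terminator and have balanced paired symbols."""
    return [
        s for s in split_sentences(text)
        if s[-1] in TERMINATORS and is_balanced(s)
    ]


if __name__ == "__main__":
    snippet = "今天天氣很好。我們去(公園散步!你覺得呢?然後..."
    print(complete_sentences(snippet))
```

In this sketch a truncated tail such as "然後..." is discarded because it does not end in one of the assumed terminators, which mirrors how search snippets are often cut off mid-sentence.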
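The experiment uses manual annotation: following the four response evaluation criteria of NTCIR STC2, human judges rated the responses to 100 posts. The average score is 0.713, with a Kappa of 0.479.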
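The abstract reports only "Kappa" without naming the variant. Assuming it is Fleiss' kappa, a common choice when more than two judges label each item, the statistic can be computed as in the sketch below. The function name and the toy ratings are hypothetical and are not the thesis's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Raises ZeroDivisionError when all ratings fall into a single category,
    where the statistic is undefined.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_cats = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement: per-item pairwise agreement, averaged over items.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Agreement expected by chance from the category shares.
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    # Made-up ratings: 4 responses, 3 raters, 3 label categories.
    toy = [[3, 0, 0], [0, 2, 1], [1, 1, 1], [0, 0, 3]]
    print(round(fleiss_kappa(toy), 3))
```

Each row must sum to the number of raters, so a response judged by three annotators contributes a row such as `[0, 2, 1]`.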
Abstract (English)
Dialogue systems have become a popular research topic in recent years, and many companies need them. By purpose, dialogue systems fall into two categories. The first is task-oriented dialogue systems, such as customer service agents that answer customers' questions in a specific domain, or the personal assistant Siri, which integrates information (phone address book, weather, calendar, time, and so on) and answers queries. The second is non-task-oriented dialogue systems, such as the robot Alice, whose main purpose is companionship through simple chat. Our research focuses on the latter. The goal is to respond to what the user says, which may be a question, a complaint, a sigh, a statement of fact, and so on. How a chatbot should answer is the key question of this thesis.
Short-text conversation (STC) systems fall into two categories: retrieval-based and generative-based. The former depends on the quality of its database; the latter requires a grammar-checking module. In this thesis we aim to solve the generative-based STC problem but adopt a retrieval-based approach as the base, using the Web as the database and retrieving candidate sentences from Google snippets, so we do not need to collect a large text database in advance. The approach has three steps: first, query keyword generation; second, sentence segmentation and candidate-sentence preprocessing; third, sentence ranking with SVMrank.
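For the ranking step, SVMrank reads plain-text files in which each line has the form `<target> qid:<qid> <feature>:<value> ... # <info>`, with lines ordered by query id. The sketch below shows how candidate sentences for each post might be serialized into that format. The function name, the two features (for example a word-overlap score and an embedding similarity), and the sample values are hypothetical; the thesis's actual feature set is not given in this record.

```python
def to_svmrank_lines(groups):
    """Serialize ranking data into SVMrank's input format.

    groups: list of (qid, candidates), where candidates is a list of
            (relevance, feature_vector, note) tuples. One qid per post;
            a higher relevance means the candidate should rank higher.
    """
    lines = []
    # SVMrank expects the lines to be sorted by increasing qid.
    for qid, candidates in sorted(groups, key=lambda g: g[0]):
        for relevance, features, note in candidates:
            parts = [str(relevance), f"qid:{qid}"]
            # Feature indices start at 1 and must be increasing.
            parts += [f"{i}:{v:g}" for i, v in enumerate(features, start=1)]
            parts += ["#", note]
            lines.append(" ".join(parts))
    return lines


if __name__ == "__main__":
    # Hypothetical data: one post (qid 1) with two candidate responses,
    # each described by two made-up feature values.
    data = [(1, [(2, [0.5, 1], "candidate a"), (1, [0.25, 0], "candidate b")])]
    print("\n".join(to_svmrank_lines(data)))
```

A file written this way could then be passed to the SVMrank command-line tools, for instance `svm_rank_learn -c 3 train.dat model.dat` followed by `svm_rank_classify test.dat model.dat predictions`; the value of `-c` here is arbitrary.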
We use the NTCIR STC2 response evaluation criteria. Three non-expert annotators evaluated the responses to 100 posts. The average score is 0.713.
Keywords ★ Natural language processing
★ Dialogue systems
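Table of contents
Chinese Abstract
English Abstract
List of Figures
List of Tables
I. Introduction
II. Related Work
2.1 Chatbot Approaches
2.1.1 Template-Driven
2.1.2 Retrieval-Based
2.1.3 Generative-Based
2.2 Natural Language Processing
2.2.1 Natural Language Processing Tools
2.2.2 E-HowNet (廣義知網)
2.2.3 Word Embeddings
III. System Architecture
3.1 Post Classification
3.2 Text Processing
3.3 Feature Design
IV. Experiments and System Performance
4.1 SVMrank Evaluation and Binary Feature Selection
4.2 Evaluation of Responses
V. Conclusion and Future Work
VI. References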
Advisor 張嘉惠   Approval date 2017-8-24