Ranking by Sentence Categorization for Question Answering Systems

Name

Zhi-Yui Yang (楊智宇)

Department

Department of Computer Science and Information Engineering

Thesis title

Ranking by Sentence Categorization for Question Answering Systems
(問題答覆系統使用語句分類排序方式之設計與研究)

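Related theses

★ A Study on Recognizing Schedule Invitation Emails and Extracting Irregular Time Expressions
★ Design of the NCUFree Campus Wireless Network Platform and Development of Application Services
★ Design and Implementation of a Semi-Structured Data Extraction System for the Internet
★ Mining and Application of Non-Simple Browsing Paths
★ Improvements to Association Rule Mining on Incremental Data
★ Applying the Chi-Square Test of Independence to Associative Classification
★ Design and Study of a Chinese Information Extraction System
★ Visualization of Non-Numerical Data and Clustering That Combines Subjective and Objective Views
★ A Study of Associated Word Groups in Document Summarization
★ Cleaning Web Pages: Page Segmentation and Data Region Extraction
★ Efficient Mining of Compact Frequent Continuous Event Patterns in Temporal Databases
★ Axis Arrangement of Star Coordinates for Cluster Visualization
★ Automatically Generating Web Crawlers from Browsing History
★ A Study of Template and Data Analysis for Dynamic Web Pages
★ Automating Data Integration across Homogeneous Web Pages
★ Mining Asynchronous Periodic Patterns with Unknown Periods in Temporal Databases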

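This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.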

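Abstract (Chinese)

Today, as information grows explosively and its types become ever more numerous and complex, finding correct and relevant data is increasingly difficult. Information Retrieval (IR) and Information Extraction (IE) methods make it easier to retrieve and extract important information from large amounts of data.
A question answering (QA) system combines information retrieval and information extraction: it searches a large document collection for text relevant to a question and then extracts the answer. The search step usually relies on information retrieval, but the retrieved results are too broad and noisy, so information extraction is added to condense them. Simply combining the two, however, still does not reveal the part we are actually interested in; named entity recognition is needed to identify that part so it can be extracted. Ordinary information retrieval and information extraction cannot be applied directly to question answering, because questions and answers come in many types, natural-language forms are involved, and sentence representations vary with lexical semantics and syntax. Most QA systems therefore require further question classification and passage retrieval techniques, together with heuristics derived from human observation, to establish the relation between questions and answers. The more human effort is involved, the higher the cost. Moreover, because the human knowledge involved spans many domains, it is hard to cover every case with a single general rule.
The QA system designed in this thesis uses known information together with a classification algorithm to build the system modules, which automatically learn how to find the answer to a question. With this machine learning approach, the system can learn the important information from whatever training data becomes available in the future, without complex human intervention that would increase time and labor costs. This is the first attempt at a classification-based question answering system, and the experiments demonstrate its uniqueness and superiority.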

Abstract (English)

We live in an age of information explosion. Because information is so varied and complex, accurate data has become harder to find, and people tend to overlook important information that appears only briefly. Information Retrieval (IR) and Information Extraction (IE) techniques help people fetch accurate and important information from large amounts of data more effectively.
A Question Answering (QA) system combines both IR and IE techniques to find the answers to questions in documents. Information retrieval usually uses document retrieval to find the relevant documents, but those documents may contain too much information and a lot of noise. Hence, most QA systems use question classification and passage retrieval to improve accuracy, and then use named entity tagging to mark the proper nouns of interest. Because QA involves linguistics, most systems rely on human observation to establish the relations between questions and answers. The more human effort is involved, however, the more time and money are spent.
The QA system in this research is designed to exploit information that is already known, namely classified questions and correct answer sentences. By adding machine learning techniques, it integrates this information with classification-based methods and can answer questions automatically, without human effort. This is the first time a QA system has used a classification-based architecture, and our experiments show its uniqueness and superiority.
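
The idea of ranking candidate sentences with a classifier trained on known answer sentences can be illustrated with a minimal sketch. The abstract does not name the classifier or the features used, so everything below is an assumption made for illustration: a small multinomial Naive Bayes model, a hypothetical `features` function based on word overlap with the question, and toy training data. It is not the thesis's implementation.

```python
import math
from collections import Counter


def features(question, sentence):
    """Hypothetical features: sentence tokens plus a marker for each token shared with the question."""
    q_tokens = set(question.lower().split())
    s_tokens = sentence.lower().split()
    return s_tokens + ["shared:" + t for t in s_tokens if t in q_tokens]


class NaiveBayesRanker:
    """Illustrative stand-in classifier: labels sentences as answer / non-answer."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # feature counts per class
        self.docs = {True: 0, False: 0}                    # training sentences per class
        self.vocab = set()

    def train(self, examples):
        # Each example is (question, sentence, is_answer).
        for question, sentence, is_answer in examples:
            feats = features(question, sentence)
            self.counts[is_answer].update(feats)
            self.docs[is_answer] += 1
            self.vocab.update(feats)

    def _log_prob(self, feats, label):
        total = sum(self.counts[label].values())
        log_p = math.log(self.docs[label] / sum(self.docs.values()))
        for f in feats:
            # Add-one smoothing keeps unseen features from zeroing the product.
            log_p += math.log((self.counts[label][f] + 1) / (total + len(self.vocab)))
        return log_p

    def score(self, question, sentence):
        # Log-odds that the sentence belongs to the "answer" class.
        feats = features(question, sentence)
        return self._log_prob(feats, True) - self._log_prob(feats, False)

    def rank(self, question, candidates):
        # Highest log-odds first.
        return sorted(candidates, key=lambda s: self.score(question, s), reverse=True)


# Toy training data (invented for this sketch).
training = [
    ("who wrote hamlet", "hamlet was written by william shakespeare", True),
    ("who wrote hamlet", "hamlet is a long play", False),
    ("who invented the telephone", "the telephone was invented by alexander bell", True),
    ("who invented the telephone", "the telephone is a useful device", False),
]

ranker = NaiveBayesRanker()
ranker.train(training)

question = "who painted guernica"
candidates = [
    "guernica is a large canvas",
    "guernica was painted by pablo picasso",
]
ranked = ranker.rank(question, candidates)
print(ranked[0])
```

In this sketch the ranking comes entirely from the labeled examples rather than from hand-written rules, which is the property the abstract emphasizes; a real system would need far more training data and richer features.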

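Keywords (Chinese)

★ question answering
★ sentence categorization
★ answer extraction
★ feature extraction
★ question classification
★ document retrieval
★ passage retrieval

Keywords (English)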

★ passage retrieval
★ document retrieval
★ answer extraction
★ question classification
★ question answering
★ sentence categorization

Table of Contents

Contents
List of Figures
List of Tables
1. Introduction
1.1 Problem Definition
1.2 Research Motivation and Objectives
1.3 Thesis Organization
2. Related Work
2.1 Keyword Matching Approach
2.1.1 NTU TREC-8 QA System
2.1.2 NTU TREC-9 QA System
2.1.3 NTU TREC-10 QA System
2.2 Template Approach
2.2.1 DLT TREC-11 QA System
2.2.2 Sheffield TREC-11 QA System
2.2.3 SUNY TREC-12 QA System
2.2.4 IBM TREC-9 & TREC-12 QA System
2.2.5 PRIS TREC-11 QA System
2.3 ILP Approach
2.3.1 LCC TREC-10 QA System
2.4 Comparison of Related Systems
3. System Architecture
3.1 Question Classification System
3.1.1 Question Keyword Extraction and Expansion
3.1.2 Question Classification
3.1.2.1 Data
3.1.2.2 Classification Architecture
3.1.2.3 Feature Selection
3.2 Document Retrieval System
3.2.1 Index
3.2.2 Query
3.3 Answer Extraction System
3.3.1 Passage Retrieval
3.3.1.1 Passage Retrieval Algorithm
3.3.1.2 Feature Selection
3.3.2 Answer Extraction
3.3.2.1 Classification Method and Architecture
3.3.2.2 Sentence Feature Selection
4. Experiments and Discussion
4.1 Question Classification Experiment
4.1.1 Single-Feature Experiments
4.1.2 Combined-Feature Experiments
4.1.3 Discussion of Results
4.2 Answer Extraction Experiment
4.2.1 TREC-8 Experiment Result
4.2.2 TREC-9 Experiment Result
4.2.3 TREC-10 Experiment Result
4.2.4 Discussion of Results
5. Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

Advisor

Chia-Hui Chang (張嘉惠)

Approval date

2004-7-14
