A Method to Extract Fewer Features for Question Classification

Name

Tseng-Jen Tseng (曾增仁)

Department

Department of Information Management

Thesis Title

A Method to Extract Fewer Features for Question Classification
(發展少量特徵擷取方法之問題分類技術)

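Related Theses

★ A Decision Support System for Creating SMS Rules to Prevent Credit Card Fraud
★ A Comparison of the Effectiveness of Different Retrieval Strategies
★ An Investigation of Factors Influencing the Knowledge-Sharing Process
★ Construction and Evaluation of a Retrieval Agent System with Sharing Functions
★ A Study of Computer Attitudes and Learning Self-Efficacy among Juvenile Offenders
★ A Study of Software Metrics Issues Using the AHP Method
★ Optimizing an Intrusion Rule Base
★ A Study on Improving the Efficiency and Quality of Business Information Extraction
★ Using the Analytic Hierarchy Process to Assess Key Factors in the Banking Industry's Adoption of Enterprise Application Integration (EAI) Systems
★ Applying Genetic Algorithms to Find Near-Optimal Layouts of Forced-Convection Devices in Cluster Computer Rooms
★ The Development of a CASE Tool with Knowledge Management Functions
★ A Fast Search Index Tree Based on the PAT Tree
★ A Compound-Noun-Based Approach to Building Document Concepts
★ Using User Interest Profiles to Examine the Importance of Adjective Position in Review Classification
★ Using Semi-Structured Information and User Feedback to Help Users Filter Web Document Search Results
★ A Study on Classifying User Reviews with a Vector Space Model Built from Feature-Opinion Pairs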

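This electronic thesis is authorized for immediate open access.
Once open access takes effect, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.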

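Abstract (translated from the Chinese)

Users who turn to a question answering system for information retrieval usually expect an exact answer to their query, rather than the list of related documents that a traditional retrieval system returns. Within the architecture of a question answering system, the system must classify a question before answering it in order to understand what the question means. Question classification is also the module most prone to error in the question answering pipeline. From a machine learning perspective, question classification and document classification are similar procedures, so feature extraction is a crucial task in question classification. Traditional feature extraction methods rely on hundreds, thousands, or even more features, and researchers face many problems when handling so many features. This study therefore develops a new feature extraction method that attempts to supply machine learning classifiers with only a small number of features. In our experiments, we use statistical significance tests to determine the effect of each kind of feature on classifier performance. The experiments show that our extracted features perform as well as the commonly used bag-of-words features. On small training datasets, our extracted features also perform as well as bag-of-ngrams features.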
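To make the contrast in feature count concrete, here is a minimal sketch, not the thesis's actual method. It compares a bag-of-words vector, which has one dimension per vocabulary word, with a hypothetical per-category frequency vector, which has one dimension per question category. The toy questions, the labels HUMAN, LOCATION, and NUMBER, and all function names are assumptions made for illustration. In particular, `category_frequency` is only one plausible reading of the term "Category Frequency" used in the thesis's chapter titles, since this record does not give the formula.

```python
from collections import Counter, defaultdict

# Toy training questions with hypothetical category labels (not from the thesis).
TRAIN = [
    ("who wrote hamlet", "HUMAN"),
    ("who discovered penicillin", "HUMAN"),
    ("where is mount fuji", "LOCATION"),
    ("where was mozart born", "LOCATION"),
    ("how many planets are there", "NUMBER"),
    ("how many legs does a spider have", "NUMBER"),
]


def build_vocabulary(questions):
    """Sorted list of every distinct word: one bag-of-words dimension per word."""
    return sorted({word for text, _ in questions for word in text.split()})


def bag_of_words(text, vocabulary):
    """Count vector over the whole vocabulary (grows with the training data)."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]


def build_category_counts(questions):
    """For each word, count how often it occurs in each category's questions."""
    table = defaultdict(Counter)
    for text, label in questions:
        for word in text.split():
            table[word][label] += 1
    return table


def category_frequency(text, table, categories):
    """One feature per category: summed training counts of the question's words.

    Assumed reading of "category frequency", not the thesis's formula.
    """
    return [sum(table[word][cat] for word in text.split()) for cat in categories]


vocabulary = build_vocabulary(TRAIN)
categories = sorted({label for _, label in TRAIN})
table = build_category_counts(TRAIN)

question = "who invented the telephone"
bow_vector = bag_of_words(question, vocabulary)
cf_vector = category_frequency(question, table, categories)

print(len(bow_vector), "bag-of-words dimensions")
print(len(cf_vector), "category-frequency dimensions:", cf_vector)
```

In this sketch the bag-of-words vector grows with every new word in the training data, while the per-category vector stays at the number of categories, which is the kind of reduction the abstract describes.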

Abstract (English)

Today, users often prefer to receive direct answers to their questions from a question answering (QA) system rather than the document lists returned by an information retrieval (IR) system. In the architecture of a QA system, question classification is needed to extract the meaning of a question before answering it, and it causes most of the errors in the QA pipeline. In the machine learning approach, question classification is very similar to text classification, so one of its important issues is extracting effective features. Traditional feature extraction depends on thousands of features or more, and researchers have difficulty handling such high-dimensional feature vectors. In view of this, this study aims to define a small number of features for machine learning classifiers. In our experiments, we test the efficacy of each feature with statistical significance tests. We find that our features are as good as bag-of-words features and that, on small training datasets, they are as good as bag-of-ngrams features.
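The abstract does not say which statistical significance test was used. As an illustration only, the sketch below applies an exact McNemar (sign) test, a common choice for comparing two classifiers scored on the same test questions. The function name `exact_mcnemar` and the sample data are hypothetical and do not come from the thesis.

```python
from math import comb


def exact_mcnemar(correct_a, correct_b):
    """Two-sided exact McNemar (sign) test on paired per-question correctness.

    correct_a / correct_b: equal-length lists of booleans, one entry per test
    question, saying whether classifier A / B labeled that question correctly.
    Returns the p-value under the null hypothesis that both are equally accurate.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both classifiers must be scored on the same questions")
    # Only questions where exactly one classifier is right carry information.
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = a_only + b_only
    if n == 0:
        return 1.0
    k = min(a_only, b_only)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical outcomes: A is right on 5 questions that B misses; both are
# right on 3 others.
small_features = [True] * 8
bag_of_words = [False] * 5 + [True] * 3
p_value = exact_mcnemar(small_features, bag_of_words)
print(f"p = {p_value:.4f}")
```

A large p-value in such a test would be consistent with a claim that two feature sets perform "as well as" each other, although failing to reject the null hypothesis is weaker evidence than a dedicated equivalence test.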

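Keywords (Chinese)

★ text classification (文件分類)
★ question classification (問題分類)
★ question answering system (問答系統)
★ feature extraction (特徵擷取)
★ machine learning (機器學習)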

Keywords (English)

★ text classification
★ question classification
★ question answering system
★ machine learning
★ feature extraction

Table of Contents

Index
Figure Index
Table Index
1. Introduction
2. Question Classification
2.1 Question Taxonomy
2.2 Machine Learning Approach
2.3 Handcrafted Rules
2.4 Using Internet
3. Feature Extraction
3.1 Category Frequency
3.2 Category Frequency for Question Classification
4. Experiment
4.1 Data
4.2 Evaluation
4.3 Experimental Results
4.4 Discussion
5. Conclusion and Future Works
Reference

Advisor

Shihchieh Chou (周世傑)

Approval Date

2008-07-22
