Applying Relevance Feedback to Improving Text Classification Performance

Name

Ke-neng Wu (吳克能)

Department

Department of Information Management

Thesis title

Applying Relevance Feedback to Improving Text Classification Performance
(利用相關回饋資訊以提升文件分類之效能)

Files

Full text: never open to the public

Abstract

With the rapid development of the Internet, the resulting information explosion has made an ever-growing amount of information accessible. Information retrieval systems play an important role in obtaining that information, and text classification is an important topic for improving retrieval quality and meeting users' information needs. This study proposes an approach that extracts information from relevance feedback to build a user profile, and then uses the profile for feature selection and term re-weighting of documents. The approach rests on two ideas: (1) The user profile represents the user's positive and negative interests, and each document keeps only the dimensions that belong to the profile, which reduces noise during text classification. (2) Terms that appear in the user profile or in important positions within a document receive extra weight, which increases the difference between the features of relevant and non-relevant documents. This feature enhancement applies term sensitivity supported by semi-structured information. Experimental results show that the approach extracts relevance feedback information effectively: it improves text classification accuracy and reduces execution time by at least half.
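The two ideas in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the thesis's implementation: the function names `build_profile` and `doc_vector`, the "top-k most frequent terms" rule for building the profile, the use of the title as the "important position", and the boost factor of 2.0 are all hypothetical choices made for this example. The abstract does not specify any of them.

```python
from collections import Counter


def build_profile(relevant_docs, non_relevant_docs, top_k=2):
    """Build a user profile from relevance feedback.

    Assumption: the profile is the union of the top_k most frequent terms
    in the relevant (positive) and non-relevant (negative) feedback
    documents. Each document is a list of already-tokenized terms.
    """
    pos = Counter(t for doc in relevant_docs for t in doc)
    neg = Counter(t for doc in non_relevant_docs for t in doc)
    positive = {t for t, _ in pos.most_common(top_k)}
    negative = {t for t, _ in neg.most_common(top_k)}
    return positive | negative


def doc_vector(title, body, profile, title_boost=2.0):
    """Represent a document using only profile terms, with re-weighting.

    Feature selection: terms outside the profile are dropped.
    Re-weighting (assumed rule): a term's raw frequency is multiplied by
    title_boost when the term also occurs in the title.
    """
    counts = Counter(title + body)
    vector = {}
    for term, tf in counts.items():
        if term not in profile:
            continue  # feature selection: drop dimensions outside the profile
        weight = float(tf)
        if term in title:
            weight *= title_boost  # hypothetical boost for an important position
        vector[term] = weight
    return vector


relevant = [["svm", "text", "classification"],
            ["text", "classification", "feedback"]]
non_relevant = [["football", "score"],
                ["football", "score", "match"]]

profile = build_profile(relevant, non_relevant)
vector = doc_vector(["text", "mining"],
                    ["text", "classification", "with", "svm", "text"],
                    profile)
print(sorted(profile))
print(vector)
```

Dropping every term outside the profile shrinks each document vector, which fits the abstract's report of shorter execution time. The resulting vectors would then be passed to a classifier, which is omitted here.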

Keywords

★ Feature selection
★ Re-weighting
★ User profile
★ Relevance feedback
★ Text classification

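Table of contents

Chapter 1 Introduction
1-1 Research Motivation
1-2 Research Objectives
1-3 Research Scope and Limitations
1-4 Thesis Organization
Chapter 2 Literature Review
2-1 Vector Space Model
2-2 Relevance Feedback
2-3 Research on Applying Relevance Feedback Information to Text Classification
2-4 Support Vector Machine (SVM)
2-5 Term Sensitivity
Chapter 3 System Architecture
3-1 System Architecture
3-2 Document Analyzer
3-3 Dimension Selector
3-4 Document Feature Builder
3-5 Document Classifier
Chapter 4 Experimental Analysis
4-1 Experimental Environment
4-2 Experimental Dataset
4-3 Evaluation Metrics
4-4 Experimental Design and Procedure
4-5 Experimental Results and Analysis
Chapter 5 Conclusion
5-1 Conclusions and Contributions
5-2 Future Research Directions
References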

Advisor

Shih-chieh Chou (周世傑)

Approval date

2011-7-18
