Applying Relevance Feedback to Construct a Vector Space Model with Concepts as the Dimension Value

Name

Jun-rui Wu (吳浚瑞)

Department

Department of Information Management

Thesis Title

Applying relevance feedback to construct a vector space model with concepts as the dimension value
(Original Chinese title: 相關回饋資訊於概念化文件建立之應用)

Files

Full text in the system: never open to the public

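Abstract (translated from Chinese)

The rapid growth of the Internet has caused the volume of information to expand quickly, and information retrieval systems have developed alongside it to help users obtain the information they need. This study proposes a method that captures information from explicit relevance feedback and uses it in a concept extraction mechanism. The extracted concepts are used to represent documents, producing a vector space model whose dimensions are concepts, and this model is then applied to document classification to improve classification performance. Experimental results show that the proposed method classifies better than the traditional approach of using terms as document features. The study also ran experiments that fed random relevance feedback information into the concept extraction mechanism; the resulting classification performance was much worse than with traditional term features. The study therefore confirms that explicit relevance feedback does contain information that can be applied to building conceptualized documents in order to improve document classification performance.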

Abstract (English)

The rapid development of the Internet has caused an information explosion. To address this problem, information retrieval systems have been developed to help users find the information they need. This study proposes an approach that uses information extracted from relevance feedback in a concept extraction algorithm. First, it extracts concepts from the document set and uses these concepts as the documents' attributes. It then creates a vector space model whose dimensions are the extracted concepts. Finally, it applies the proposed model to improve the performance of document classification. The experimental results show that the proposed approach performs better than the term-based vector model. This study also applies random relevance feedback information to the concept extraction algorithm; the experimental results show that this performs much worse than the term-based vector model. This study confirms that applying explicit relevance feedback information to create a vector space model with the extracted concepts as dimensions can improve the performance of document classification.
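
The general idea of a vector space model with concepts as dimensions can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the abstract does not specify how concepts are extracted from relevance feedback, how concept weights are computed, or which classifier is used. The hand-written `CONCEPTS` table, the frequency-sum weighting, the single nearest-neighbour classifier, and all function names below are assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical concepts: each concept is a named set of terms.
# In the thesis, concepts are extracted with the help of relevance feedback;
# here they are hand-written for illustration only.
CONCEPTS = {
    "retrieval": {"query", "search", "retrieval", "ranking"},
    "learning": {"classifier", "training", "svm", "kernel"},
}


def concept_vector(text, concepts=CONCEPTS):
    """Represent a document as a vector with one dimension per concept."""
    counts = Counter(text.lower().split())
    # Assumed weighting: a concept's weight is the total frequency
    # of its member terms in the document.
    return [sum(counts[term] for term in terms) for terms in concepts.values()]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(text, labeled_docs, concepts=CONCEPTS):
    """Assign the label of the most similar training document (1-NN)."""
    query = concept_vector(text, concepts)
    best = max(
        labeled_docs,
        key=lambda doc: cosine(query, concept_vector(doc[0], concepts)),
    )
    return best[1]


# Tiny made-up training set: (document text, class label).
TRAIN = [
    ("search query ranking for web retrieval", "IR"),
    ("training an svm classifier with a kernel", "ML"),
]

if __name__ == "__main__":
    print(concept_vector("search query ranking for web retrieval"))
    print(classify("kernel training tricks", TRAIN))
```

Because every document is reduced to one weight per concept, the vectors are far shorter than term-based vectors; whether that helps classification depends entirely on the quality of the extracted concepts, which is the question the thesis's experiments address.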

Keywords (Chinese)

★ vector space model (向量空間模型)
★ relevance feedback (相關回饋)
★ concept extraction (概念萃取)

Keywords (English)

★ vector space model
★ relevance feedback
★ concept extraction

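Table of Contents

1. Introduction
1-1 Research Motivation
1-2 Research Objectives
1-3 Research Scope and Limitations
1-3-1 Research Scope
1-3-2 Research Limitations
1-4 Thesis Organization
2. Literature Review
2-1 Vector Space Model
2-2 Classification-Related Research
2-2-1 K-Nearest Neighbors (KNN)
2-2-2 Fuzzy Set Theory
2-2-3 Support Vector Machine (SVM)
2-3 Relevance Feedback and Its Applications
2-3-1 Relevance Feedback
2-3-2 Query Expansion
2-3-3 Applications of Relevance Feedback
2-3-4 Concept-Related Research
3. System Architecture
3-1 System Architecture
3-2 Document Analyzer
3-3 Document Feature Builder
3-3-1 Concept Extraction
3-3-2 Computing Concept Weights as Document Features
3-4 Document Classifier
4. Experimental Analysis
4-1 Experimental Environment
4-2 Experimental Dataset
4-3 Evaluation Metrics
4-4 Experimental Design and Procedure
4-4-1 Experiment 1
4-4-2 Experiment 2
4-4-3 Experiment 3
4-5 Discussion of Experimental Results
5. Conclusion
5-1 Research Conclusions and Contributions
5-2 Future Research Directions
References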

Advisor

Shih-Chieh Chou (周世傑)

Approval Date

2013-7-19
