Document clustering based on vector space model with concepts as the dimension value

Name

Chien-chieh Su (蘇千傑)

Department

Department of Information Management

Thesis title

以概念為維度之向量空間模型為基礎以進行文件分群之研究
(Document clustering based on vector space model with concepts as the dimension value)

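Related theses

★ A decision support system for creating SMS alert rules for credit card fraud prevention	★ A comparison of the effectiveness of different retrieval strategies
★ An investigation of factors influencing the knowledge-sharing process	★ Construction and evaluation of a retrieval agent system with sharing functions
★ A study of computer attitudes and learning self-efficacy among juvenile offenders	★ A study of software metrics issues using the AHP method
★ Optimizing the intrusion rule base	★ A study on improving the efficiency and quality of business information extraction
★ Using the analytic hierarchy process to assess key factors in the banking industry's adoption of enterprise application integration (EAI) systems	★ Applying genetic algorithms to find near-optimal layouts of forced-convection devices in cluster computer rooms
★ The Development of a CASE Tool with Knowledge Management Functions	★ A fast search index tree based on the PAT tree
★ A method for building document concepts based on compound nouns	★ Using user interest profiles to examine the importance of adjective position for review classification
★ Using semi-structured information and user feedback to help users filter web document search results	★ A study of user review classification using a vector space model built from feature-opinion pairs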

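The author has agreed to make this electronic thesis openly available immediately.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the text without authorization, to avoid breaking the law.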

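Abstract (Chinese)

In information retrieval research, document clustering is a technique that helps users find the information they need more quickly. The cluster structure lets us manage many kinds of knowledge and information effectively, which makes clustering a tool for knowledge management.
Document clustering usually requires comparing document similarity. Traditionally, the terms in a document serve as the dimensions of the vector space model. This approach has a weakness: when two documents are semantically the same but use different vocabulary, their similarity cannot be judged accurately, which makes clustering difficult. This research combines two techniques, concept extraction and the vector space model, to support document clustering. The aim is to represent each document by the concepts it covers and then build a vector space model whose dimensions are concepts for use in document similarity comparison, with the hope of improving similarity comparison and, in turn, the clustering results.
We ran experiments to observe whether a vector space model with concepts as dimensions clusters documents better than the traditional model with terms as dimensions. The results show that the concept-dimension vector space model does help cluster documents more accurately.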

Abstract (English)

In information retrieval, document clustering is a technology that makes retrieving needed information more efficient. With document clustering, one can efficiently manage all kinds of knowledge and information, so document clustering is a tool for knowledge management.
Traditionally, document clustering is based on document similarity comparison in which each document is represented by a vector space model with terms as the dimensions. In this approach, documents with the same semantic meaning might be judged dissimilar because they are described with different words. In this research, we integrate concept extraction with the vector space model for document similarity comparison. We first extract concepts from the documents, then create a vector space model with the extracted concepts as the dimensions. Document similarity comparison is then based on this concept-dimensioned vector space model. We hope that the concept-based vector space model can improve document clustering.
We ran experiments on the document clustering effect of the concept-based vector space model. The results show that the concept-based vector space model performs better than the term-based vector space model.
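
The weakness the abstract describes can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the term-to-concept table, the function names, and the two sample documents are all hypothetical, and cosine similarity is assumed as the comparison measure because it is the usual choice for vector space models. The abstract does not say how concepts are extracted, so a hand-written lookup table stands in for that step.

```python
import math
from collections import Counter

# Hypothetical term-to-concept mapping, invented for illustration only.
# The thesis extracts concepts from the documents themselves; that
# procedure is not described in the abstract and is not reproduced here.
TERM_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "buy": "purchase",
    "purchase": "purchase",
}


def term_vector(text):
    """Bag-of-words vector: one dimension per distinct term."""
    return Counter(text.lower().split())


def concept_vector(text):
    """One dimension per concept; terms with no known concept are dropped."""
    concepts = (TERM_TO_CONCEPT.get(t) for t in text.lower().split())
    return Counter(c for c in concepts if c is not None)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two documents with the same meaning but no shared vocabulary.
doc_a = "buy car"
doc_b = "purchase automobile"

term_sim = cosine(term_vector(doc_a), term_vector(doc_b))
concept_sim = cosine(concept_vector(doc_a), concept_vector(doc_b))
```

With terms as dimensions the two sample documents share no dimension, so their similarity is zero. With concepts as dimensions they map to the same two concepts, so their similarity is close to 1. A clustering step would then group documents using these pairwise similarities.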

Keywords

★ knowledge management (知識管理)
★ concept extraction (概念擷取)
★ vector space model (向量空間模型)
★ document clustering (文件分群)
★ information retrieval (資訊檢索)

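Thesis contents

Table of contents
List of figures
List of tables
Chapter 1 Introduction
1.1 Research background and motivation
1.2 Research objectives
1.3 Research scope and limitations
1.4 Thesis organization
Chapter 2 Literature review
2.1 Research on document clustering
2.2 Research on concepts
Chapter 3 System design
3.1 Concept extraction
3.2 Representing a document by concepts
3.3 Computing document similarity through concepts
3.4 Clustering
Chapter 4 Experimental analysis
4.1 Datasets
4.2 Evaluation method
4.3 Experimental design
4.4 Experiment 1: a dataset of lower complexity
4.5 Experiment 2: a dataset of higher complexity
4.6 Discussion
Chapter 5 Conclusion
References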

Advisor

Shih-chieh Chou (周世傑)

Approval date

2007-7-24
