The Research of Using Agglomerative Fuzzy K-Means Clustering in Latent Semantic Indexing

Name

何旻修 (Min Hsiu)

Department

Department of Information Management

Thesis Title

The Research of Using Agglomerative Fuzzy K-Means Clustering in Latent Semantic Indexing
(Original title: 運用凝聚模糊K-平均分群於潛在語意索引之研究)

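This electronic thesis is authorized for immediate open access.
Electronic full texts that have reached their open-access date are licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.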

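Abstract (Chinese)

Latent semantic indexing has never become widespread because its computational cost is too high: for datasets of substantial size, completing the entire computation is too expensive. This problem has recently drawn the attention of several researchers, who have proposed clustering-based methods in which a user's query keywords need to be compared only against similar clusters, reducing the computational cost. However, the pre-clustering methods used in these studies can compare the user's query keywords only against a fixed number of clusters, so their retrieval effectiveness is limited.
This study proposes first clustering a large dataset with the agglomerative fuzzy K-means clustering algorithm, then performing singular value decomposition and low-rank approximation on each cluster separately to find the coordinates of every document in a low-dimensional vector space. When a user enters keywords to search for documents, fuzzy clustering allows the keywords to be matched dynamically, through latent semantic indexing, against all related clusters, and the documents related to the keywords are identified. The experimental results show that, compared with traditional pre-clustering methods, the proposed approach does improve the query quality of information retrieval, and that the dynamic selection of keyword-related clusters almost always picks the best number of clusters. For single-keyword queries the F-measure reaches 83% and recall exceeds 85%; for two-keyword queries the F-measure is 72%. These results show that clustering with the agglomerative fuzzy K-means clustering algorithm filters out most of the web page information in irrelevant clusters and reduces the heavy computational burden of latent semantic indexing.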
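The pipeline described in the abstract (cluster first, run a separate truncated SVD per cluster, then match a query only against the clusters it belongs to) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The cluster assignment is taken as given, a plain fuzzy c-means membership formula stands in for the agglomerative fuzzy K-means algorithm, and the toy matrix, the 0.5 membership threshold, and all function names are assumptions made for illustration.

```python
import numpy as np

def fuzzy_memberships(x, centroids, m=2.0):
    """Fuzzy c-means style membership of x in each cluster (values sum to 1).
    Assumption: stands in for the agglomerative fuzzy K-means memberships."""
    d = np.linalg.norm(centroids - x, axis=1)
    if np.any(d == 0):                       # x coincides with a centroid
        return (d == 0).astype(float) / np.sum(d == 0)
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

def build_cluster_lsi(A, k):
    """Rank-k truncated SVD of a term-by-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-12)))       # skip numerically zero singular values
    return U[:, :k], s[:k], Vt[:k, :].T      # rows of V_k are document coordinates

def fold_in_query(q, U_k, s_k):
    """Project a query term vector into the k-dim space: q^T U_k S_k^{-1}."""
    return (q @ U_k) / s_k

def cosine(a, B):
    """Cosine similarity between vector a and every row of B."""
    denom = np.linalg.norm(a) * np.linalg.norm(B, axis=1)
    denom[denom == 0] = 1.0
    return (B @ a) / denom

def search(q, A, clusters, k=2, threshold=0.5):
    """Rank documents, looking only inside clusters whose membership >= threshold.
    In practice the per-cluster SVDs would be precomputed once, not per query."""
    centroids = np.array([A[:, idx].mean(axis=1) for idx in clusters])
    u = fuzzy_memberships(q, centroids)
    hits = []
    for c, idx in enumerate(clusters):
        if u[c] < threshold:
            continue                          # irrelevant cluster: no SVD work at all
        U_k, s_k, docs_k = build_cluster_lsi(A[:, idx], k)
        scores = cosine(fold_in_query(q, U_k, s_k), docs_k)
        hits += [(int(idx[j]), float(scores[j])) for j in range(len(idx))]
    return sorted(hits, key=lambda h: -h[1])

# Toy term-by-document matrix: terms 0-1 dominate docs 0-2, terms 2-3 dominate docs 3-5.
A = np.array([[2, 1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0, 0],
              [0, 0, 0, 2, 1, 1],
              [0, 0, 0, 1, 2, 1]], dtype=float)
clusters = [[0, 1, 2], [3, 4, 5]]             # hypothetical, pre-computed clustering
q = np.array([1.0, 0.0, 0.0, 0.0])            # query containing only term 0
hits = search(q, A, clusters)
print(hits)
```

With this toy data the query falls mostly in the first cluster, so only that cluster's three documents are scored and the second cluster is skipped entirely, which is the source of the cost saving the abstract describes.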

Abstract (English)

Latent semantic indexing has not been widely adopted because of its high computational cost; for large datasets, the full computation remains too expensive. This problem has recently drawn the attention of several researchers, who have proposed clustering-based strategies in which a user's query keywords are compared only against similar clusters, reducing the computational cost. However, these pre-clustering strategies can compare the query only against a fixed number of clusters, which limits the quality of the query results.
This study proposes using the Agglomerative Fuzzy K-Means Clustering algorithm to cluster a large dataset first. Singular value decomposition and low-rank approximation are then applied to each cluster separately to obtain the coordinates of each document in a low-dimensional vector space. When keywords are entered as a query, fuzzy clustering allows latent semantic indexing to be carried out dynamically against the similar clusters, and the documents relevant to the keywords are retrieved. The experimental results show that, compared with traditional pre-clustering methods, the proposed approach improves the quality of information retrieval, and that the dynamic selection of keyword-related clusters almost always chooses the best number of clusters. The F-measure reached 83% for single-keyword queries, with recall as high as 85%, and the F-measure was 72% for two-keyword queries. These results indicate that clustering with the Agglomerative Fuzzy K-Means Clustering algorithm filters out most of the web page information in irrelevant clusters and reduces the heavy computational cost of latent semantic indexing.
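For readers unfamiliar with the reported metrics, the sketch below shows how precision, recall, and F-measure are commonly computed for a single query. The abstract does not state which variant of the F-measure was used; the balanced form (beta = 1) is assumed here, and the document IDs are invented for illustration.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F_beta) for one query.
    Assumption: beta = 1 (balanced F-measure) unless stated otherwise."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)                    # relevant documents actually retrieved
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# Hypothetical query: 4 documents retrieved, 5 truly relevant, 3 in common.
p, r, f = precision_recall_f(retrieved=[1, 2, 3, 4], relevant=[2, 3, 4, 5, 6])
print(f"precision={p:.2f} recall={r:.2f} F={f:.4f}")
```

Because the F-measure is a harmonic mean, it stays low unless precision and recall are both high, which is why it is reported alongside recall.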

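Keywords (Chinese)

★ Agglomerative fuzzy K-means clustering algorithm
★ Fuzzy clustering
★ Vector space model
★ Latent semantic indexing

Keywords (English)

★ Latent Semantic Indexing
★ Agglomerative Fuzzy K-Means Clustering algorithm
★ Fuzzy Clustering
★ Vector Space Model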

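Thesis Table of Contents

Contents
Chinese Abstract
Abstract
Acknowledgments
List of Tables
List of Figures
Chapter 1 Introduction
1-1 Research Motivation
1-2 Research Objectives
1-3 Research Limitations
1-4 Thesis Organization
Chapter 2 Literature Review
2-1 Text Information Retrieval
2-2 Vector Space Model
2-3 Latent Semantics
2-3-1 Singular Value Decomposition
2-3-2 Low-Rank Approximation
2-3-3 Latent Semantic Indexing
2-4 Current Applications of Latent Semantic Indexing and the Problems It Faces
2-5 Fuzzy Clustering Methods
2-5-1 Fuzzy Theory
2-5-2 Fuzzy Clustering Methods
2-5-3 Agglomerative Fuzzy K-Means Clustering Algorithm
2-6 Combining Latent Semantics with Agglomerative Fuzzy K-Means Clustering
Chapter 3 System Architecture
3-1 System Architecture
3-2 Document Preprocessing
3-3 Key Term Extraction
3-4 Matrix Construction
3-5 Fuzzy Clustering
3-6 Latent Semantics
3-7 Display of Query Results
Chapter 4 System Implementation and Experimental Results
4-1 Development Tools and Experimental Environment
4-2 Experimental Data Sources and Agglomerative Fuzzy Clustering Results
4-3 Evaluation Method
4-4 Experimental Results and Analysis
4-4-1 Experimental Design
4-4-2 Experimental Results
4-4-3 Experimental Analysis
Chapter 5 Conclusions and Future Research Directions
5-1 Conclusions and Contributions
5-2 Future Research Directions
References
English-Language References
Chinese-Language References
Websites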

Advisor

薛義誠 (Yih-Chearng Shiue)

Approval Date

2010-7-12
