The Research of Using Agglomerative Fuzzy K-Means Clustering in Latent Semantic Indexing

DC Field	Value	Language
DC.contributor	資訊管理學系	zh_TW
DC.creator	何旻修	zh_TW
DC.creator	Min Hsiu	en_US
dc.date.accessioned	2010-07-12T07:39:07Z
dc.date.available	2010-07-12T07:39:07Z
dc.date.issued	2010
dc.identifier.uri	http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=974203051
dc.contributor.department	資訊管理學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	潛在語意索引一直未能普及的原因在於計算成本過高，對於擁有較大資料的數據集來說，要完成整個計算過程其成本過高，這些問題似乎在最近引起一些學者的關注，而提出一些以分群為基礎的方法，讓使用者查詢的關鍵字只需與相似群集進行比對來降低計算成本，但這些研究所使用的先分群方法只能根據使用者查詢的關鍵字與固定分群數目進行比較，以致於查詢效果有限。本研究提出以凝聚模糊K-平均分群演算法對較大資料先進行分群，接著針對每一個群集各自進行奇異值分解與低維近似的分析，以找出每一個文件對映在低維度向量空間裡的座標，當使用者輸入關鍵字進行文件查詢時，透過模糊分群讓關鍵字可以動態的與所有相關的群集進行潛在語意索引，分析並找出與關鍵字相關的文件。根據實驗結果顯示，相較於傳統事先分群的方法，本研究的確能有效提升資訊檢索的查詢品質，且在動態選擇與關鍵字相關的群集進行比較時幾乎都可以選擇到最佳分群數目，在單一關鍵字查詢時F-measure值到達83%，查全率更是高達85%以上，兩個關鍵字查詢時F-measure值也有72%，證明藉由凝聚模糊K-平均分群演算法進行分群可以過濾掉大部分不相關群集的網頁資訊，降低潛在語意索引龐大的計算成本負擔。	zh_TW
dc.description.abstract	Latent semantic indexing (LSI) has not been widely adopted because its computational cost is high, and running the full computation on large datasets remains too expensive. This problem has recently drawn the attention of some scholars, who have proposed clustering-based strategies in which a user's query keywords are compared only with similar clusters, reducing the computational cost. However, the pre-clustering methods used in those studies can only compare the query keywords against a fixed number of clusters, so retrieval effectiveness is limited. This study proposes using the Agglomerative Fuzzy K-Means Clustering algorithm to cluster a large dataset first. Singular value decomposition and low-rank approximation are then applied to each cluster separately to find the coordinates of each document in a low-dimensional vector space. When a user enters keywords to query documents, fuzzy clustering allows the keywords to be matched dynamically, through latent semantic indexing, against all relevant clusters, so that the documents related to the keywords can be found. The experimental results show that, compared with traditional pre-clustering methods, the proposed approach effectively improves the quality of information retrieval, and that the dynamic selection of keyword-related clusters almost always chooses the best number of clusters. The F-measure reaches 83% for single-keyword queries, with recall of 85% or more, and 72% for two-keyword queries. These results indicate that clustering with the Agglomerative Fuzzy K-Means Clustering algorithm filters out most of the web page information in irrelevant clusters and reduces the heavy computational cost of latent semantic indexing.	en_US
DC.subject	凝聚模糊K-平均分群演算法	zh_TW
DC.subject	模糊分群	zh_TW
DC.subject	向量空間模型	zh_TW
DC.subject	潛在語意索引	zh_TW
DC.subject	Latent Semantic Index	en_US
DC.subject	Agglomerative Fuzzy K-Means Clustering algorithm	en_US
DC.subject	Fuzzy Clustering	en_US
DC.subject	Vector Space Model	en_US
DC.title	運用凝聚模糊K-平均分群於潛在語意索引之研究	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	The Research of Using Agglomerative Fuzzy K-Means Clustering in Latent Semantic Indexing	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis/Dissertation 974203051 Full Metadata Record
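
The abstract describes matching a query dynamically against all relevant clusters instead of a fixed number of them. The sketch below illustrates only that selection step, under stated assumptions: this record does not give the thesis's membership formula or selection rule, so the standard fuzzy c-means membership formula is used as a stand-in, and the names `fuzzy_memberships`, `select_clusters`, the `threshold` value, and the toy centroids are all hypothetical. The per-cluster singular value decomposition step is not shown.

```python
import math


def fuzzy_memberships(query, centroids, m=2.0):
    """Standard fuzzy c-means membership of one vector in each cluster.

    Stand-in formula (an assumption, not taken from the thesis):
    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with fuzzifier m > 1.
    """
    dists = [math.dist(query, c) for c in centroids]
    # A query that coincides with a centroid belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


def select_clusters(memberships, threshold=0.3):
    """Indices of clusters whose membership meets a hypothetical threshold."""
    return [i for i, u in enumerate(memberships) if u >= threshold]


# Toy centroids for three clusters in a 2-D term space (made-up numbers).
centroids = [(1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
query = (0.5, 0.5)

u = fuzzy_memberships(query, centroids)
selected = select_clusters(u)
print(selected)
```

In this toy case the query sits equally close to the first two centroids and far from the third, so two clusters are selected. A query near a single centroid would select only one, which is the sense in which the number of compared clusters is not fixed in advance.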