DC field | Value | Language |
DC.contributor | Department of Information Management | zh_TW |
DC.creator | 何旻修 | zh_TW |
DC.creator | Min Hsiu | en_US |
dc.date.accessioned | 2010-07-12T07:39:07Z | |
dc.date.available | 2010-07-12T07:39:07Z | |
dc.date.issued | 2010 | |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=974203051 | |
dc.contributor.department | Department of Information Management | zh_TW |
DC.description | 國立中央大學 | zh_TW |
DC.description | National Central University | en_US |
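dc.description.abstract | Latent semantic indexing has not become widespread because its computational cost is too high: for large datasets, carrying out the full computation is prohibitively expensive. These problems have recently drawn the attention of some researchers, who have proposed clustering-based methods in which a user's query keywords are compared only with similar clusters, reducing the computational cost. However, the cluster-first methods used in these studies can only compare the query keywords against a fixed number of clusters, which limits retrieval effectiveness.
This study first clusters the large dataset with the Agglomerative Fuzzy K-Means clustering algorithm, then performs singular value decomposition and low-rank approximation on each cluster separately to obtain each document's coordinates in a low-dimensional vector space. When a user enters keywords to search for documents, fuzzy clustering allows the keywords to be matched dynamically, through latent semantic indexing, against all relevant clusters, and the documents related to the keywords are identified. Experimental results show that, compared with traditional methods that cluster in advance, the proposed approach effectively improves information retrieval quality, and that when dynamically selecting the clusters related to the keywords it almost always selects the optimal number of clusters. For single-keyword queries the F-measure reaches 83% and recall is 85% or higher; for two-keyword queries the F-measure is 72%. These results demonstrate that clustering with the Agglomerative Fuzzy K-Means algorithm filters out the web pages in most irrelevant clusters and reduces the heavy computational cost of latent semantic indexing.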
| zh_TW |
dc.description.abstract | Latent semantic indexing has not become popular because of its high computational cost, and performing the full computation on large datasets is still too expensive. Some researchers have recently addressed this problem by proposing clustering-based strategies in which query keywords are compared only with similar clusters, reducing the computational cost. However, these clustering-based strategies can only compare queries against a fixed number of clusters, which limits the quality of the query results.
This study applies the Agglomerative Fuzzy K-Means Clustering algorithm to partition large datasets into clusters. Singular value decomposition and low-rank approximation are then performed on each cluster separately to obtain each document's coordinates in a low-dimensional vector space. When a user submits query keywords, fuzzy clustering allows latent semantic indexing to be carried out dynamically against all relevant clusters, and the documents relevant to the keywords are retrieved. Experimental results show that, compared with traditional methods that cluster in advance, the proposed approach effectively improves information retrieval quality, and that the dynamic selection of keyword-related clusters almost always chooses the optimal number of clusters. For single-keyword queries the F-measure reaches 83% and recall is as high as 85%; for two-keyword queries the F-measure is 72%. These results show that clustering with the Agglomerative Fuzzy K-Means algorithm filters out the web pages in most irrelevant clusters and reduces the heavy computational cost of latent semantic indexing.
| en_US |
DC.subject | Agglomerative Fuzzy K-Means clustering algorithm | zh_TW |
DC.subject | Fuzzy clustering | zh_TW |
DC.subject | Vector space model | zh_TW |
DC.subject | Latent semantic indexing | zh_TW |
DC.subject | Latent Semantic Indexing | en_US |
DC.subject | Agglomerative Fuzzy K-Means Clustering algorithm | en_US |
DC.subject | Fuzzy Clustering | en_US |
DC.subject | Vector Space Model | en_US |
DC.title | 運用凝聚模糊K-平均分群於潛在語意索引之研究 | zh_TW |
dc.language.iso | zh-TW | zh-TW |
DC.title | The Research of Using Agglomerative Fuzzy K-Means Clustering in Latent Semantic Indexing | en_US |
DC.type | Master's or doctoral thesis | zh_TW |
DC.type | thesis | en_US |
DC.publisher | National Central University | en_US |
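The clustering step described in the abstract can be illustrated with a minimal Python sketch of an entropy-penalized fuzzy k-means that starts from more centers than needed and merges centers that converge to the same point, which is the general idea behind agglomerative fuzzy k-means. The penalty weight `lam`, the merge tolerance `merge_tol`, the farthest-point initialization, and all function names are illustrative assumptions, not the settings or code used in the thesis.

```python
import numpy as np

# Hypothetical sketch: entropy-penalized fuzzy k-means with center merging.
# lam, merge_tol and the farthest-point initialization are illustrative assumptions.

def fuzzy_memberships(X, centers, lam):
    """u[i, j] is proportional to exp(-||x_j - c_i||^2 / lam); each column sums to 1."""
    d = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # shape (k, n)
    # Shift by each point's minimum distance so exp() cannot underflow to 0 for every center
    w = np.exp(-(d - d.min(axis=0, keepdims=True)) / lam)
    return w / w.sum(axis=0, keepdims=True)

def farthest_point_init(X, k):
    """Pick k starting centers spread out over the data."""
    idx = [0]
    dist = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(dist.argmax())
        idx.append(nxt)
        dist = np.minimum(dist, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].astype(float)

def agglomerative_fuzzy_kmeans(X, k_init, lam, merge_tol=1e-3, n_iter=100):
    centers = farthest_point_init(X, k_init)
    for _ in range(n_iter):
        u = fuzzy_memberships(X, centers, lam)
        mass = u.sum(axis=1)
        keep = mass > 1e-12                      # drop centers with no support
        u, mass = u[keep], mass[keep]
        new_centers = (u @ X) / mass[:, None]    # membership-weighted means
        # Agglomerative step: merge centers that have collapsed onto one another
        kept = []
        for c in new_centers:
            if all(np.linalg.norm(c - other) > merge_tol for other in kept):
                kept.append(c)
        kept = np.array(kept)
        converged = kept.shape == centers.shape and np.allclose(kept, centers)
        centers = kept
        if converged:
            break
    return centers, fuzzy_memberships(X, centers, lam)
```

With a small `lam` relative to the distance between groups, centers that start in the same group are pulled to the same weighted mean and merged, so the number of clusters falls out of the iteration rather than being fixed in advance; a larger `lam` merges more aggressively.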
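The per-cluster retrieval step can be sketched as follows, assuming each cluster keeps a term-by-document matrix, a centroid in term space, and its own rank-k SVD. A query vector is folded into each selected cluster's latent space with the standard LSI folding-in formula `q_hat = q^T U_k S_k^{-1}` and ranked by cosine similarity against the rows of `V_k`. Searching only the clusters whose fuzzy membership for the query is at least `min_membership` is an assumed stand-in for the dynamic cluster selection mentioned in the abstract; the function names, `lam`, and the threshold are hypothetical.

```python
import numpy as np

# Hypothetical sketch of per-cluster LSI with fuzzy cluster selection.
# Function names, lam and min_membership are illustrative assumptions.

def build_cluster_lsi(A, doc_ids, k):
    """Rank-k SVD of one cluster's term-by-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, int((s > 1e-10).sum()))  # ignore zero singular values
    return {"U": U[:, :k], "s": s[:k], "V": Vt[:k].T, "doc_ids": list(doc_ids)}

def fold_in_query(q, model):
    """Standard LSI folding-in: q_hat = q^T U_k S_k^{-1}."""
    return (q @ model["U"]) / model["s"]

def query_memberships(q, centers, lam):
    """Fuzzy membership of the query in each cluster (softmax of -squared distance / lam)."""
    d = ((centers - q) ** 2).sum(axis=1)
    w = np.exp(-(d - d.min()) / lam)
    return w / w.sum()

def search(q, centers, models, lam=1.0, min_membership=0.1, top_n=5):
    u = query_memberships(q, centers, lam)
    results = []
    # Only clusters the query belongs to strongly enough are searched
    for ci in np.flatnonzero(u >= min_membership):
        m = models[ci]
        q_hat = fold_in_query(q, m)
        # Cosine similarity between the folded query and each document row of V_k
        denom = np.linalg.norm(m["V"], axis=1) * np.linalg.norm(q_hat) + 1e-12
        sims = (m["V"] @ q_hat) / denom
        results.extend(zip(m["doc_ids"], sims.tolist()))
    results.sort(key=lambda t: t[1], reverse=True)
    return results[:top_n]

# Toy example: two clusters over a 6-term vocabulary
A0 = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]], float)
A1 = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 1]], float)
models = [build_cluster_lsi(A0, ["a", "b", "c"], k=2),
          build_cluster_lsi(A1, ["d", "e", "f"], k=2)]
centers = np.stack([A0.mean(axis=1), A1.mean(axis=1)])
q = np.array([1, 0, 0, 0, 0, 0], float)  # query containing only term 0
print(search(q, centers, models, min_membership=0.5))
```

Because only the selected clusters are decomposed and scored, the cost scales with the size of those clusters rather than the whole collection. Scores from different clusters come from different latent spaces, so merging them into one ranking, as done here for simplicity, may call for some normalization in practice.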
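The evaluation figures in the abstract are consistent with the usual set-based definitions, assuming the balanced F-measure: precision P is the fraction of retrieved documents that are relevant, recall R is the fraction of relevant documents that are retrieved, and F = 2PR / (P + R). The helper below only makes that arithmetic concrete for a single query; how the scores were averaged over queries is not stated in this record.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and balanced F-measure for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Example: 4 documents retrieved, 3 relevant, 2 of them in common
p, r, f = precision_recall_f1([1, 2, 3, 4], [2, 3, 5])
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

Because F is a harmonic mean, a high recall (such as the 85% reported for single-keyword queries) still yields a lower F-measure when precision lags behind.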