Thesis/Dissertation 974203051 Full Metadata Record

| DC Field | Value | Language |
| DC.contributor | 資訊管理學系 (Department of Information Management) | zh_TW |
| DC.creator | 何旻修 | zh_TW |
| DC.creator | Min Hsiu | en_US |
| dc.date.accessioned | 2010-7-12T07:39:07Z | |
| dc.date.available | 2010-7-12T07:39:07Z | |
| dc.date.issued | 2010 | |
| dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=974203051 | |
| dc.contributor.department | 資訊管理學系 (Department of Information Management) | zh_TW |
| DC.description | 國立中央大學 | zh_TW |
| DC.description | National Central University | en_US |
| dc.description.abstract | 潛在語意索引一直未能普及的原因在於計算成本過高,對於擁有較大資料的數據集來說,要完成整個計算過程其成本過高,這些問題似乎在最近引起一些學者的關注,而提出一些以分群為基礎的方法,讓使用者查詢的關鍵字只需與相似群集進行比對來降低計算成本,但這些研究所使用的先分群方法只能根據使用者查詢的關鍵字與固定分群數目進行比較,以致於查詢效果有限。 本研究提出以凝聚模糊K-平均分群演算法對較大資料先進行分群,接著針對每一個群集各自進行奇異值分解與低維近似的分析,以找出每一個文件對映在低維度向量空間裡的座標,當使用者輸入關鍵字進行文件查詢時,透過模糊分群讓關鍵字可以動態的與所有相關的群集進行潛在語意索引,分析並找出與關鍵字相關的文件。根據實驗結果顯示,相較於傳統事先分群的方法,本研究的確能有效提升資訊檢索的查詢品質,且在動態選擇與關鍵字相關的群集進行比較時幾乎都可以選擇到最佳分群數目,在單一關鍵字查詢時F-measure值到達83%,查全率更是高達85%以上,兩個關鍵字查詢時F-measure值也有72%,證明藉由凝聚模糊K-平均分群演算法進行分群可以過濾掉大部分不相關群集的網頁資訊,降低潛在語意索引龐大的計算成本負擔。 | zh_TW |
| dc.description.abstract | Latent semantic indexing (LSI) has not been widely adopted because of its high computational cost: for large datasets, completing the full computation remains too expensive. This problem has recently drawn the attention of some researchers, who have proposed clustering-based strategies in which a user's query keywords are compared only against similar clusters, reducing the computational cost. However, the pre-clustering methods used in those studies can compare query keywords only against a fixed number of clusters, which limits retrieval effectiveness. This study proposes using the Agglomerative Fuzzy K-Means Clustering algorithm to cluster a large dataset first. Singular value decomposition and low-rank approximation are then performed on each cluster separately to find the coordinates of each document in a low-dimensional vector space. When a user enters keywords to query documents, fuzzy clustering allows the keywords to be matched dynamically against all related clusters via latent semantic indexing, and the documents relevant to the keywords are retrieved. The experimental results show that, compared with traditional pre-clustering methods, the proposed approach effectively improves retrieval quality, and that the dynamic selection of keyword-related clusters almost always arrives at the best number of clusters. The F-measure reaches 83% for single-keyword queries, with recall of 85% or more, and the F-measure is 72% for two-keyword queries. These results indicate that clustering with the Agglomerative Fuzzy K-Means Clustering algorithm can filter out most web pages in irrelevant clusters and reduce the heavy computational cost of latent semantic indexing. | en_US |
| DC.subject | 凝聚模糊K-平均分群演算法 | zh_TW |
| DC.subject | 模糊分群 | zh_TW |
| DC.subject | 向量空間模型 | zh_TW |
| DC.subject | 潛在語意索引 | zh_TW |
| DC.subject | Latent Semantic Index | en_US |
| DC.subject | Agglomerative Fuzzy K-Means Clustering algorithm | en_US |
| DC.subject | Fuzzy Clustering | en_US |
| DC.subject | Vector Space Model | en_US |
| DC.title | 運用凝聚模糊K-平均分群於潛在語意索引之研究 | zh_TW |
| dc.language.iso | zh-TW | zh-TW |
| DC.title | The Research of Using Agglomerative Fuzzy K-Means Clustering in Latent Semantic Indexing | en_US |
| DC.type | 博碩士論文 | zh_TW |
| DC.type | thesis | en_US |
| DC.publisher | National Central University | en_US |
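
The abstract in this record describes matching a query dynamically against every related cluster rather than against a fixed number of clusters. A minimal Python sketch of that selection step follows. It is not the thesis's Agglomerative Fuzzy K-Means implementation: it swaps in the standard fuzzy c-means membership formula, and the names `fuzzy_memberships` and `select_clusters`, the fuzzifier `m`, the `threshold` value, and the toy centers are all assumptions made for illustration.

```python
import math


def fuzzy_memberships(query, centers, m=2.0):
    """Standard fuzzy c-means membership of `query` in each cluster center.

    Assumption: this is the textbook FCM formula, used here only as a
    stand-in for the membership function of the thesis's algorithm.
    """
    dists = [math.dist(query, c) for c in centers]
    # A query that sits exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]


def select_clusters(query, centers, threshold=0.3, m=2.0):
    """Return indices of clusters whose membership reaches `threshold`.

    The number of clusters returned depends on the query, which is the
    "dynamic" behaviour the abstract contrasts with a fixed cluster count.
    """
    memberships = fuzzy_memberships(query, centers, m)
    return [i for i, u in enumerate(memberships) if u >= threshold]


# Hypothetical 2-D cluster centers; real ones would be term-space vectors.
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 10.0)]
print(select_clusters((2.0, 0.0), centers))  # query between centers 0 and 1
print(select_clusters((0.0, 9.0), centers))  # query close to center 2
```

With these hypothetical centers, the first query selects two clusters and the second selects one, so the number of clusters searched varies per query. In the approach the abstract describes, singular value decomposition and low-rank approximation would then be applied only within the selected clusters.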
