促進個人化網頁摘要搜尋的階層式分群系統; A hierarchical clustering system to enhance personalized web-snippet search

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/13276

Title:	促進個人化網頁摘要搜尋的階層式分群系統; A hierarchical clustering system to enhance personalized web-snippet search
Author:	孫建成; Chjen-Cheng Sun
Contributor:	Graduate Institute of Information Management (資訊管理研究所)
Keywords:	information retrieval; web-snippet clustering system; personalization
Date:	2006-06-20
Upload time:	2009-09-22 15:27:52 (UTC+8)
Publisher:	National Central University Library (國立中央大學圖書館)
Abstract:	This study proposes a personalized hierarchical web-snippet clustering system built on top of a search engine. Given the user's query string, the system collects the relevant snippets and builds a browsing hierarchy that describes their contents. In the browsing hierarchy, every snippet is assigned to the appropriate clusters, and the concept of each cluster is described by its label. In the final phase, the system sorts all snippets and labels according to the preferences recorded in the user profile and outputs a personalized browsing hierarchy.
The system applies lexical affinity to extract labels. Using statistical measures, it can extract words that are related but not adjacent in the text as labels. This allows more flexible forms of labels and, as the experiments show, lets the system count label frequency more precisely. To build the browsing hierarchy, the study extends the DisCover algorithm into a document clustering algorithm based on label dispersion that produces a browsing hierarchy without redundancy. The algorithm selects child labels greedily and evaluates every candidate child label against four factors: document coverage, sibling-node distinctiveness, redundancy, and compactness. As the experiments show, the algorithm avoids producing redundancy.
To construct the user profile, the system uses a reference ontology and documents that describe the user's interests to build a directory-like user profile. In the online phase, the system sorts snippets and labels by importance. The importance of a snippet is computed from the weights of the related directories in the user profile and the snippet's similarity to them. The importance of a label is weighted by the label's frequency, the depth of the label's node, and the importance of the related snippets on that node. As the experiments show, sorting snippets in this way helps the user search for information.
The system's benefits are twofold: it reveals thematic relationships among snippets, and it helps users find the information they need. The contributions of this study are:
1. A hierarchical document clustering algorithm that builds a browsing hierarchy without redundancy.
2. The application of a profile-based classifier to a web-snippet clustering system, so that the system can produce a personalized browsing hierarchy.
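The snippet-ranking step described in the abstract can be illustrated with a minimal sketch. The abstract states only that a snippet's importance is computed from the weights and similarities of related profile directories; it does not give the formula. The weighted sum, the Jaccard similarity, the whitespace tokenizer, and all names below (ProfileDirectory, snippet_importance, rank_snippets) are assumptions made for illustration, not the thesis's actual method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileDirectory:
    """One directory of a hypothetical directory-like user profile."""
    name: str
    weight: float          # assumed: strength of the user's interest in this directory
    terms: frozenset[str]  # assumed: representative terms of the directory


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two term sets (a stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def snippet_importance(snippet: str, profile: list[ProfileDirectory]) -> float:
    """Assumed scoring rule: sum of (directory weight * similarity to directory)."""
    tokens = frozenset(snippet.lower().split())
    return sum(d.weight * jaccard(tokens, d.terms) for d in profile)


def rank_snippets(snippets: list[str], profile: list[ProfileDirectory]) -> list[str]:
    """Return snippets sorted from most to least important for this profile."""
    return sorted(snippets, key=lambda s: snippet_importance(s, profile), reverse=True)


# Illustrative data only; not taken from the thesis.
profile = [
    ProfileDirectory("programming", 0.8, frozenset({"python", "code", "compiler"})),
    ProfileDirectory("cooking", 0.2, frozenset({"recipe", "oven"})),
]
snippets = ["weather report today", "oven recipe ideas", "python code tutorial"]
ranked = rank_snippets(snippets, profile)
print(ranked)
```

Under these assumptions, a snippet that overlaps a heavily weighted directory rises to the top, while a snippet unrelated to every directory scores zero and sinks to the bottom. A label score combining frequency, node depth, and related snippet importance could be layered on top in the same way, but the abstract does not specify how those three factors are combined.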
Appears in collections:	[Graduate Institute of Information Management] Theses and Dissertations

All items in NCUIR are protected by original copyright.
