Thesis Title A Hierarchical Clustering System to Enhance Personalized Web-Snippet Search
(促進個人化網頁摘要搜尋的階層式分群系統)
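Abstract (Chinese) This study proposes a personalized hierarchical web-snippet clustering system. Built on top of a search engine, the system gathers the web snippets relevant to the user's query string and forms a browsing hierarchy that describes their contents. Each snippet is assigned to the appropriate clusters, and the concept of each cluster is described by its label. Finally, the system ranks all snippets and labels by the user preferences recorded in the user profile and outputs a personalized browsing hierarchy.
The contributions of this study are twofold:
1. It provides a hierarchical document clustering algorithm that builds a browsing hierarchy without redundancy.
2. It applies a classifier based on the user profile to the web-snippet clustering system, so that the system can build a personalized browsing hierarchy.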
Abstract (English) This paper presents a personalized hierarchical web-snippet clustering system built on top of search engines. Given the user's query string, the system collects snippets and builds a browsing hierarchy that describes their contents. In the browsing hierarchy, every snippet is assigned to the appropriate clusters, and the concept of every cluster is described by its label. In the final phase, the system sorts all snippets and labels according to the preferences recorded in the user profile and outputs the personalized browsing hierarchy.
The system applies lexical affinity to extract labels. Using statistical measures, it can extract words that are related but not adjacent as labels. Labels can therefore take more flexible forms and, as the experiment shows, the system counts label frequency more precisely.
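As an illustration of label extraction by lexical affinity, the following is a minimal sketch, not the thesis's implementation. It assumes whitespace tokenization, a co-occurrence window of five tokens, and a plain document-frequency threshold standing in for the statistical measures; the function name `lexical_affinities` and its parameters are hypothetical.

```python
from collections import Counter


def lexical_affinities(snippets, window=5, min_df=2):
    """Count unordered word pairs that co-occur within `window` tokens.

    Returns a dict mapping each pair to the number of snippets that
    contain it, keeping only pairs found in at least `min_df` snippets.
    The window size and threshold are illustrative assumptions.
    """
    pair_df = Counter()
    for text in snippets:
        tokens = text.lower().split()
        seen = set()  # count each pair at most once per snippet
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    seen.add(tuple(sorted((left, right))))
        pair_df.update(seen)
    return {pair: df for pair, df in pair_df.items() if df >= min_df}


snippets = [
    "jaguar sports car",
    "jaguar luxury car dealer",
    "jaguar big cat",
]
affinities = lexical_affinities(snippets)
```

In this toy input, "jaguar" and "car" are separated by a different word in each snippet, yet the pair is still counted in both, which is the kind of interrupted label a contiguous-phrase extractor would miss.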
To build the browsing hierarchy, our research provides an algorithm that extends the DisCover algorithm and produces a browsing hierarchy without redundancy according to label dispersion. The algorithm selects child labels greedily and evaluates every candidate child label against four factors: document coverage, sibling node distinctiveness, redundancy, and compactness. As the experiment shows, the algorithm avoids producing redundancy.
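The greedy selection of child labels can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the thesis's scoring function: it models only document coverage and overlap with already chosen siblings, and leaves compactness out. The function name `select_child_labels` and the `overlap_penalty` parameter are hypothetical.

```python
def select_child_labels(candidates, max_children=3, overlap_penalty=1.0):
    """Greedily pick child labels for one node of the hierarchy.

    `candidates` maps each label to the set of document ids containing it.
    At every step the label with the highest gain is chosen, where
    gain = newly covered documents - overlap_penalty * documents that the
    already chosen siblings cover. Selection stops when no label has a
    positive gain or `max_children` labels have been chosen.
    """
    chosen = []
    covered = set()
    remaining = dict(candidates)

    def gain(label):
        docs = remaining[label]
        return len(docs - covered) - overlap_penalty * len(docs & covered)

    while remaining and len(chosen) < max_children:
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


candidates = {
    "car": {1, 2, 3, 4},
    "sports car": {1, 2},
    "cat": {5, 6},
    "dealer": {3, 4, 7},
}
children, covered_docs = select_child_labels(candidates)
```

With these toy sets the sketch picks "car" and then "cat"; "sports car" and "dealer" are rejected because most of their documents already sit under "car", which is the redundancy the penalty is meant to suppress.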
To construct the user profile, the system uses a reference ontology and documents that describe the user's preferences to build a directory-like user profile. In the online phase, the system sorts snippets and labels according to their importance. The importance of a snippet is computed from the weights and similarities of the related directories in the user profile. The importance of a label is weighted according to the label frequency, the label depth, and the related snippets on the node. As the experiments show, the system can help the user search for information by sorting snippets.
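One way to read the snippet-importance computation is as a weighted sum of similarities between the snippet and each profile directory. The sketch below assumes exactly that, with cosine similarity over raw term counts and each directory represented by a weight and a short text; the abstract does not give the actual formula, so these choices and the names `snippet_importance` and `rank_snippets` are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def snippet_importance(snippet, profile):
    """Sum of directory weight * similarity over all profile directories.

    `profile` is a list of (weight, directory_text) pairs; this flat
    representation of the user profile is an illustrative assumption.
    """
    vec = Counter(snippet.lower().split())
    return sum(
        weight * cosine(vec, Counter(text.lower().split()))
        for weight, text in profile
    )


def rank_snippets(snippets, profile):
    """Sort snippets from most to least important for this profile."""
    return sorted(
        snippets, key=lambda s: snippet_importance(s, profile), reverse=True
    )


profile = [(0.8, "car engine racing"), (0.2, "animal wildlife cat")]
ranked = rank_snippets(["jaguar big cat", "jaguar racing car"], profile)
```

For a user whose profile weights the car-related directory more heavily, the car snippet is ranked ahead of the animal snippet for the same ambiguous query.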
The system is effective at discovering thematic relationships among snippets and at helping users find the information they want.
The contribution of our research is twofold:
1. To provide a hierarchical document clustering algorithm that builds a browsing hierarchy without redundancy.
2. To apply a profile-based classifier to the web-snippet clustering system to produce a personalized browsing hierarchy.
Keywords (Chinese) ★ information retrieval (資訊檢索)
★ web-snippet clustering system (網頁摘要文件分群系統)
★ personalization (個人化)
Keywords (English) ★ web-snippet clustering system
★ information retrieval
★ personalization
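Table of Contents Chapter 1 Introduction 1
1.1 Research background and motivation 1
1.2 Research objectives and scope 2
1.3 Research limitations 3
1.4 Research process 3
1.5 Thesis organization 4
Chapter 2 Literature Review 6
2.1 Search engines and web-snippet clustering systems 6
2.2 Web-snippet clustering systems 9
2.2.1 Word-based flat clustering 10
2.2.2 Sentence-based flat clustering 11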
2.2.3 Word-based hierarchical clustering 12
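2.2.4 Sentence-based hierarchical clustering 17
2.2.5 Design overview of web-snippet clustering systems 21
2.3 Collocations 22
2.4 User-profile-based personalization systems 27
Chapter 3 System Design 29
3.1 System concept 29
3.2 System architecture 30
3.2.1 Document preprocessing phase 33
3.2.2 Browsing hierarchy construction phase 37
3.2.3 Personalized browsing hierarchy construction phase 41
Chapter 4 Experimental Results 47
4.1 Experimental design 47
4.2 Experimental results 49
Chapter 5 Conclusion 58
5.1 Research conclusions and contributions 58
5.2 Future research directions 60
References 61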
Advisor Shih-Chieh Chou (周世傑) Approval Date 2006-7-7
