A Hierarchical Clustering System to Enhance Personalized Web-Snippet Search

Name

孫建成 (Chjen-Cheng Sun)

Department

Department of Information Management

Thesis title

促進個人化網頁摘要搜尋的階層式分群系統
(A hierarchical clustering system to enhance personalized web-snippet search)

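Related theses

★ A decision support system for building SMS alert rules to prevent credit card fraud
★ A comparison of the effectiveness of different retrieval strategies
★ An investigation of factors influencing the knowledge-sharing process
★ Construction and evaluation of a retrieval agent system with sharing functions
★ A study of computer attitudes and learning self-efficacy among juvenile offenders
★ A study of software measurement issues using the AHP method
★ Optimizing an intrusion rule base
★ A study on improving the efficiency and quality of business information extraction
★ Using the analytic hierarchy process to assess key factors in the banking industry's adoption of enterprise application integration (EAI)
★ Applying genetic algorithms to find near-optimal layouts of forced-convection devices in cluster computer rooms
★ The Development of a CASE Tool with Knowledge Management Functions
★ A fast-search index tree based on the PAT tree
★ A compound-noun-based method for building document concepts
★ Using user interest profiles to examine the importance of adjective position in review classification
★ Using semi-structured information and user feedback to help users filter web document search results
★ Building a vector space model with feature-opinion pairs for user review classification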

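This electronic thesis is authorized for immediate open access.
The openly accessible full text is licensed to users only for personal, non-commercial searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.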

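Abstract (Chinese)

This study proposes a personalized hierarchical web-snippet clustering system. Built on top of a search engine, the system takes the user's query string, gathers the relevant web snippets, and forms a browsing hierarchy that describes their contents: each snippet is clustered into the appropriate clusters, and the concept of each cluster is described by its label. Finally, all snippets and labels are ranked according to the user preferences in the user profile to produce a personalized browsing hierarchy.
The system extracts labels using lexical affinities. The advantage is that words which are related but not adjacent can be taken together as one label, so more flexible labels can be formed and, as the experimental results show, label occurrences can be counted more accurately.
For building the browsing hierarchy, this study extends the DisCover algorithm into a document clustering algorithm based on label distribution, which constructs a browsing hierarchy without redundancy. Each candidate child label is evaluated on four factors: document coverage, sibling-node distinctiveness, redundancy, and compactness. As the experimental results show, the algorithm avoids producing redundancy.
For building the user profile, a reference ontology and data on the user's interests are used to construct a directory-style user profile. In the online phase, a personalized browsing hierarchy is produced by ranking each snippet and label by importance. A snippet's importance is computed from the similarities to, and the weights of, its related directories in the profile; a label's importance takes into account the importance of the related snippets at the label's node, the label's frequency, and the depth of the node. As the experimental results show, the snippet ranking mechanism does help users search for information.
The benefits of the system are twofold: it describes how the contents of snippets are related, and it helps users find the information they need.
The contributions of this study are as follows:
1. A hierarchical document clustering algorithm that constructs a browsing hierarchy without redundancy.
2. The application of a profile-based classifier in a web-snippet clustering system, enabling the system to build a personalized browsing hierarchy.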

Abstract (English)

This thesis presents a hierarchical web-snippet clustering system with personalization capabilities, built on top of search engines. Given the user’s query string, the system collects the returned snippets and builds a browsing hierarchy that describes their contents. In this hierarchy, each snippet is assigned to suitable clusters, and the concept of each cluster is described by its label. In the final phase, the system ranks all snippets and labels according to the user’s preferences recorded in the user profile and outputs the personalized browsing hierarchy.
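The final phase leaves the shape of the hierarchy alone and only changes the order in which snippets and child labels appear. The following is a minimal Python sketch of that re-ordering step, assuming importance scores have already been computed and are supplied as dictionaries; `Node` and `personalize` are hypothetical names, not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node of the browsing hierarchy: a label, its snippets, its children."""
    label: str
    snippets: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def personalize(node: Node,
                snippet_score: Dict[str, float],
                label_score: Dict[str, float]) -> Node:
    """Return a re-ordered copy of `node` (hypothetical helper).

    Snippets and child labels are sorted by descending importance; unscored
    items get 0.0 and keep their relative order because Python's sort is
    stable. The tree shape itself is not changed.
    """
    snippets = sorted(node.snippets,
                      key=lambda s: snippet_score.get(s, 0.0), reverse=True)
    children = sorted((personalize(c, snippet_score, label_score)
                       for c in node.children),
                      key=lambda c: label_score.get(c.label, 0.0), reverse=True)
    return Node(node.label, snippets, children)
```

For a query such as *jaguar* with a *car* and an *animal* cluster, a profile that scores *animal* higher moves that cluster to the front, and the snippets inside each cluster are re-ordered the same way; the input tree is left untouched.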
The system uses lexical affinities to extract labels. Using statistical measures, it can extract words that are related but not adjacent in the text as a single label. Labels can therefore take more flexible forms and, as the experiments show, label frequencies can be counted more precisely.
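One common way to realize lexical affinities is to pair up content words that occur within a small window of each other, ignoring word order. The sketch below assumes a 5-token window measured after stop-word removal and a toy stop-word list; the abstract specifies neither, and the statistical measures the thesis applies to filter pairs are not reproduced. It also counts, per pair, how many snippets contain it, which is where the gain over contiguous phrases shows up: in the three toy snippets used below, *search* and *engine* co-occur in all three, while the exact phrase "search engine" appears in only one.

```python
import re
from collections import Counter
from typing import Iterable

# Assumptions for illustration only: a toy stop-word list and a 5-token window.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}
WINDOW = 5


def lexical_affinities(snippet: str, window: int = WINDOW) -> Counter:
    """Count unordered pairs of distinct words at most `window` tokens apart,
    with distance measured after stop words are removed."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", snippet.lower())
              if t not in STOP_WORDS]
    pairs: Counter = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + 1 + window]:
            if left != right:
                # Sort the pair so "engine ... search" and "search ... engine"
                # count as the same affinity.
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def affinity_document_frequency(snippets: Iterable[str],
                                window: int = WINDOW) -> Counter:
    """Number of snippets in which each affinity occurs at least once."""
    df: Counter = Counter()
    for snippet in snippets:
        df.update(set(lexical_affinities(snippet, window)))
    return df
```

Usage: `affinity_document_frequency(["Clustering search results from a web search engine", "This engine ranks pages for search", "Search with the Vivisimo engine"])` gives the pair `("engine", "search")` a count of 3.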
To build the browsing hierarchy, we extend the DisCover algorithm into a document clustering algorithm, based on label dispersion, that produces a browsing hierarchy without redundancy. The algorithm selects child labels greedily, evaluating each candidate against four factors: document coverage, sibling-node distinctiveness, redundancy, and compactness. As the experiments show, the algorithm avoids producing redundancy.
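The greedy loop for one node can be sketched as follows. The four factors come from the abstract, but the formulas and weights are illustrative assumptions, not the thesis's definitions: coverage is the share of the parent's documents newly covered, distinctiveness is one minus the largest Jaccard overlap with already-chosen siblings, redundancy excludes any candidate whose documents equal the parent's, and compactness favors smaller clusters. `select_child_labels` and `jaccard` are hypothetical names.

```python
from typing import Dict, List, Set


def jaccard(x: Set[int], y: Set[int]) -> float:
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def select_child_labels(parent_docs: Set[int],
                        candidates: Dict[str, Set[int]],
                        max_children: int = 10) -> List[str]:
    """Greedily choose child labels for one node of the browsing hierarchy.

    The four factors follow the abstract; the formulas and the weight of 2.0
    on coverage are illustrative assumptions.
    """
    n = len(parent_docs)
    # Redundancy: a candidate whose documents equal the parent's would only
    # repeat the parent node, so it is dropped up front (as are empty ones).
    pool: Dict[str, Set[int]] = {}
    for label, docs in candidates.items():
        inside = docs & parent_docs
        if inside and inside != parent_docs:
            pool[label] = inside
    chosen: Dict[str, Set[int]] = {}
    covered: Set[int] = set()
    while pool and len(chosen) < max_children:
        best_label, best_score = None, float("-inf")
        for label, docs in pool.items():
            coverage = len(docs - covered) / n        # newly covered share
            if coverage == 0:
                continue                              # adds no new documents
            distinct = 1.0 - max((jaccard(docs, d) for d in chosen.values()),
                                 default=0.0)         # sibling distinctiveness
            compact = 1.0 - len(docs) / n             # favor smaller clusters
            score = 2.0 * coverage + distinct + compact
            if score > best_score:
                best_label, best_score = label, score
        if best_label is None:
            break
        docs = pool.pop(best_label)
        chosen[best_label] = docs
        covered |= docs
    return list(chosen)
```

For a parent node over documents 0 to 5 with candidates covering {0, 1, 2}, {3, 4}, {0, 1}, and all six documents, this picks the first two: {0, 1} adds no new documents once {0, 1, 2} is chosen, and the last candidate would merely repeat the parent.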
To construct the user profile, the system uses a reference ontology and documents describing the user’s preferences to build a directory-like user profile. In the online phase, the system ranks snippets and labels by importance. The importance of a snippet is computed from the weights of, and similarities to, its related directories in the user profile. The importance of a label is computed from the label’s frequency, its depth in the hierarchy, and the snippets related to it at its node. As the experiments show, ranking snippets in this way helps the user search for information.
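The abstract names the inputs to both importance scores but not how they are combined. Under that caveat, here is a minimal sketch: a snippet's importance sums similarity times directory weight over its `top_k` most similar profile directories (cosine similarity on raw term counts), and a label's importance is the mean importance of the snippets at its node, scaled by log(1 + frequency) and divided by node depth. The profile layout, `top_k = 3`, and every formula here are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical profile layout: directory -> (term weights, interest weight).
Profile = Dict[str, Tuple[Dict[str, float], float]]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def snippet_importance(snippet: str, profile: Profile, top_k: int = 3) -> float:
    """Sum of similarity x directory weight over the top_k most similar
    directories (assumed formula)."""
    terms = dict(Counter(snippet.lower().split()))
    sims = sorted(((cosine(terms, vec), weight) for vec, weight in profile.values()),
                  reverse=True)
    return sum(sim * weight for sim, weight in sims[:top_k])


def label_importance(related: List[float], frequency: int, depth: int) -> float:
    """Mean importance of the snippets at the label's node, scaled by
    log(1 + frequency) and divided by depth (>= 1), so deeper labels count
    less (assumed formula)."""
    if not related:
        return 0.0
    return sum(related) / len(related) * math.log1p(frequency) / depth
```

With a profile weighting a *Computers/Programming* directory at 0.8 and a *Recreation/Pets* directory at 0.2, "python code tutorial" scores about 0.65 and "python snake pet care" about 0.42, so the programming snippet is listed first.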
Our system both reveals thematic relationships among snippets and helps users find the information they want.
The contribution of our research is twofold:
1. To provide a hierarchical document clustering algorithm which can be used to build the browsing hierarchy without redundancy.
2. To apply a profile-based classifier to the web-snippet clustering system to produce the personalized browsing hierarchy.

Keywords (Chinese)

★ Information retrieval
★ Web-snippet clustering system
★ Personalization

Keywords (English)

★ web-snippet clustering system
★ Information retrieval
★ personalization

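Table of Contents

Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives and Scope
1.3 Research Limitations
1.4 Research Process
1.5 Thesis Organization
Chapter 2 Literature Review
2.1 Search Engines and Web-Snippet Clustering Systems
2.2 Web-Snippet Clustering Systems
2.2.1 Word-Based Flat Clustering
2.2.2 Sentence-Based Flat Clustering
2.2.3 Word-Based Hierarchical Clustering
2.2.4 Sentence-Based Hierarchical Clustering
2.2.5 Overview of Web-Snippet Clustering System Designs
2.3 Collocations
2.4 User-Profile-Based Personalization Systems
Chapter 3 System Design
3.1 System Concept
3.2 System Architecture
3.2.1 Document Preprocessing Phase
3.2.2 Browsing Hierarchy Construction Phase
3.2.3 Personalized Browsing Hierarchy Construction Phase
Chapter 4 Experimental Results
4.1 Experimental Design
4.2 Experimental Results
Chapter 5 Conclusions
5.1 Conclusions and Contributions
5.2 Future Research Directions
References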

References

[1] P. Anick and S. Tipirneni, “The paraphrase search assistant: Terminological feedback for iterative information seeking,” In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 153-159, 1999.
[2] R. Armstrong, D. Freitag, T. Joachims, and T. Mitchell, “WebWatcher: A Learning Apprentice for the World Wide Web,” In Proceedings of the AAAI Spring Symposium on Information Gathering, pp. 6-12, 1995.
[3] P. Baldi, P. Frasconi, and P. Smyth, Modeling the Internet and the Web - Probabilistic Methods and Algorithms, Wiley & Sons, 2003.
[4] M. Benson, “The structure of the collocational dictionary,” In International Journal of Lexicography, 2(1), pp. 1-14, 1989.
[5] E. Brill, “A simple rule-based part of speech tagger,” In Proceedings of the 3rd Conference on Applied Natural Language Processing, pp. 152-155, 1992.
[6] P. Chan, “Constructing Web User Profiles: A Non-Invasive Learning Approach,” In Web Usage Analysis and User Profiling, LNAI 1836, Springer-Verlag, pp. 39-55, 2003.
[7] L. Chen and K. Sycara, “A Personal Agent for Browsing and Searching,” In Proceedings of the 2nd International Conference on Autonomous Agents, pp. 132-139, 1998.
[8] J. Cho and H. Garcia-Molina, “The Evolution of the Web and Implications for an Incremental Crawler,” In Proceedings of the 26th International Conference on Very Large Data Bases, pp. 200-209, 2000.
[9] Y. Choueka, “Looking for needles in a haystack or locating interesting collocational expressions in large textual databases,” In Computational Linguistics, 20(4), pp. 635-648, 1988.
[10] P. Ferragina and A. Gulli, “A personalized search engine based on web-snippet hierarchical clustering,” In Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, pp. 801-810, 2005.
[11] K. T. Frantzi and S. Ananiadou, “Extracting nested collocations,” In Proceedings of the 16th Conference on Computational linguistics, pp. 41-46, 1996.
[12] K. Frantzi, S. Ananiadou, and H. Mima, “Automatic recognition of multiword terms,” In International Journal of Digital Libraries, 3(2), pp. 117-132, 2000.
[13] B. Fung, K. Wang, and M. Ester, “Large hierarchical document clustering using frequent itemsets,” In Proceedings of the 3rd SIAM International Conference on Data Mining, pp. 59-70, 2003.
[14] S. Gauch, J. Chaffee, and A. Pretschner, “Ontology-based personalized search and browsing,” In Web Intelligence and Agent Systems, 1(3-4), pp. 219-234, 2003.
[15] F. Giannotti, M. Nanni, and D. Pedreschi, “Webcat: Automatic categorization of web search results,” In Proceedings of the 11th Italian Symposium on Advanced Database Systems, pp. 507-518, 2003.
[16] M. A. Hearst and J. O. Pedersen, “Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results,” In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 76-84, 1996.
[17] S. Ikehara, S. Shirai, and T. Kawaoka, “Automatic Extraction of Collocations from Very Large Japanese Corpora using N-gram Statistics,” In Transactions of Information Processing Society of Japan, 1995(11), pp. 2584-2596, 1995.
[18] Z. Jiang, A. Joshi, R. Krishnapuram, and L. Yi, “Retriever: Improving web search engine results using clustering,” In Managing Business with Electronic Commerce 02, 2002.
[19] R. Jin, C. Faloutsos, and A. G. Hauptmann, “Meta-scoring: Automatically evaluating term weighting schemes in IR without precision-recall,” In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 83-89, 2001.
[20] K. Kita, Y. Kato, T. Omoto, and Y. Yano, “A comparative study of automatic extraction of collocations from corpora: Mutual information vs. cost criteria,” In Journal of Natural Language Processing, 1(1), pp. 21-33, 1994.
[21] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl, “GroupLens: Applying Collaborative Filtering To Usenet News,” In Communications of the ACM, 40(3), pp. 77-87, 1997.
[22] K. Kummamuru, R. Lotlikar, S. Roy, K. Singal, and R. Krishnapuram, “A hierarchical monothetic document clustering algorithm for summarization and browsing search results,” In Proceedings of the 13th International Conference on World Wide Web, pp. 658-665, 2004.
[23] T. Kurki, S. Jokela, R. Sulonen, and M. Turpeinen, “Agents in Delivering Personalized Content Based on Semantic Metadata,” In Proceedings of the 1999 AAAI Spring Symposium Workshop on Intelligent Agents in Cyberspace, pp. 84-93, 1999.
[24] D. J. Lawrie and W. B. Croft, “Generating hierarchical summaries for web searches,” In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 457-458, 2003.
[25] D. Lawrie, W. B. Croft, and A. Rosenberg, “Finding topic words for hierarchical summarization,” In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 349-357, 2001.
[26] H. Lieberman, “Autonomous Interface Agents,” In Proceedings of the ACM Conference on Computers and Human Interaction, pp. 67-74, 1997.
[27] Y. S. Maarek, R. Fagin, I. Z. Ben-Shaul, and D. Pelleg, “Ephemeral document clustering for web applications,” Technical Report RJ 10186, IBM Research, 2002.
[28] T. Malone, K. Grant, F. Turbak, S. Brobst, and M. Cohen, “Intelligent Information Sharing Systems,” In Communications of the ACM, 30(5), pp. 390-402, 1987.
[29] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 2001.
[30] D. Mladenić, “Personal WebWatcher: Design and Implementation,” Technical Report IJS-DP-7472, J. Stefan Institute, Department for Intelligent Systems, Ljubljana, Slovenia, 1998.
[31] M. Montebello, W. Gray, and S. Hurley, “A Personable Evolvable Advisor for WWW Knowledge-Based Systems,” In Proceedings of the 1998 International Database Engineering and Application Symposium, pp. 224-233, 1998.
[32] M. Nagao and S. Mori, “A new Method of N gram Statistics for Large Number of n and Automatic Extraction of Words and Phrases from Large Text Data of Japanese,” In Proceedings of 15th International Conference on Computational Linguistics, pp. 611-615, 1994.
[33] S. Osinski and D. Weiss, “Conceptual clustering using lingo algorithm: Evaluation on open directory project data,” In Proceedings of 5th Conference on Intelligent Information Processing and Web Mining, pp. 369-377, 2004.
[34] M. Pazzani, J. Muramatsu, and D. Billsus, “Syskill & Webert: Identifying Interesting Web Sites,” In Proceedings of the 13th National Conference on Artificial Intelligence, pp. 54-61, 1996.
[35] A. Pretschner, “Ontology Based Personalized Search,” Master’s thesis, University of Kansas, 1999.
[36] J. Rucker and M. J. Polanco, “Siteseer: Personalized Navigation for the Web,” In Communications of the ACM, 40(3), pp. 73-75, 1997.
[37] M. Sanderson and W. B. Croft, “Deriving concept hierarchies from text,” In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 206-213, 1999.
[38] J. Shavlik and T. Eliassi-Rad, “Intelligent Agents for Web-Based Tasks: An Advice-Taking Approach,” In Working Notes of the AAAI/ICML-98 Workshop on Learning for text categorization, pp. 63-70, 1998.
[39] B. Sheth, “A Learning Approach to Personalized Information Filtering,” Master’s thesis, Massachusetts Institute of Technology, 1994.
[40] J. Sinclair, Corpus, Concordance, Collocation, Oxford University Press, 1991.
[41] F. Smadja, “Retrieving Collocations from Text: Xtract,” In Computational Linguistics, 19(1), pp. 143-177, 1993.
[42] H. Sorensen and M. McElligott, “PSUN: A Profiling System for Usenet News,” In Proceedings of CIKM’95 Workshop on Intelligent Information Agents, 1995.
[43] A. Stefani and C. Strapparava, “Personalizing Access to Web Sites: The SiteIF Project,” In Proceedings of the 2nd Workshop on Adaptive Hypertext and Hypermedia, pp. 69-74, 1998.
[44] E. Terra and C. L. A. Clarke, “Frequency Estimates for statistical word similarity measures,” In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp. 165-172, 2003.
[45] A. Ntoulas, J. Cho, and C. Olston, “What’s New on the Web? The Evolution of the Web from a Search Engine Perspective,” In Proceedings of the 13th International World Wide Web Conference, pp. 1-12, 2004.
[46] J. Xu and W. Croft, “Query expansion using local and global document analysis,” In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 4-11, 1996.
[47] M. Yamamoto and K. W. Church, “Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus,” In Computational Linguistics, 27(1), pp. 1-30, 2001.
[48] O. Zamir and O. Etzioni, “Grouper: a dynamic clustering interface to Web search results,” In Proceedings of the 8th International World Wide Web Conference, pp. 1-12, 1999.
[49] H. Zeng, Q. He, Z. Chen, W. Ma, and J. Ma, “Learning to cluster web search results,” In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 210-217, 2004.
[50] D. Zhang and Y. Dong, “Semantic, hierarchical, online clustering of web search results,” In Proceedings of the 6th Asia Pacific Web Conference. pp. 69-78, 2004.
[51] http://a9.com/
[52] http://dmoz.org/
[53] http://www.about.com/
[54] http://www.altavista.com/
[55] http://www.ask.com/
[56] http://www.google.com/
[57] http://www.infind.com/
[58] http://www.lycos.com/
[59] http://www.mamma.com/
[60] http://www.metacrawler.com/
[61] http://www.metafind.com/
[62] http://www.miner.uol.com.br/
[63] http://www.vivisimo.com/
[64] http://www.yahoo.com/

Advisor

周世傑 (Shih-Chieh Chou)

Approval date

2006-7-7
