Using a Genetic Algorithm and Relevance Feedback in Assisting Web Search

Author

張永霖 (Yeong-Lin Chang)

Department

Department of Information Management

Thesis Title

使用基因演算法與相關回饋於協助網頁搜尋
(Using a Genetic Algorithm and Relevance Feedback in Assisting Web Search)

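Related Theses

★ A Decision Support System for Building SMS Rules to Prevent Credit Card Fraud
★ A Comparison of the Effectiveness of Different Retrieval Strategies
★ An Investigation of Factors Affecting the Knowledge-Sharing Process
★ Construction and Evaluation of a Retrieval Agent System with Sharing Functions
★ A Study of Computer Attitudes and Learning Self-Efficacy among Juvenile Offenders
★ A Study of Software Metrics Issues Using the AHP Method
★ Optimizing an Intrusion Rule Base
★ A Study on Improving the Efficiency and Quality of Business Information Extraction
★ Using the Analytic Hierarchy Process to Assess Key Factors in Banks' Adoption of Enterprise Application Integration (EAI) Systems
★ Applying a Genetic Algorithm to Find Near-Optimal Layouts of Forced-Convection Devices in a Cluster Computer Room
★ The Development of a CASE Tool with Knowledge Management Functions
★ A Fast Search Index Tree Based on the PAT Tree
★ A Compound-Noun-Based Method for Building Document Concepts
★ Using User Profiles to Examine the Importance of Adjective Position in Review Classification
★ Using Semi-Structured Information and User Feedback to Help Users Filter Web Document Search Results
★ Building a Vector Space Model from Feature-Opinion Pairs for User Review Classification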

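Access rights: the author has consented to immediate open access to this electronic thesis.
Open-access full texts are licensed to users only for personal, non-commercial searching, reading, and printing for the purpose of academic research.
Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan), and do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.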

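Abstract (Chinese)

Since its inception, the volume of data on the World Wide Web has grown at an accelerating rate, and the Web has long been one of the main channels through which people obtain information. Because of this enormous volume, Web users face information overload and turn to directory services and search engines for help. Directory services, however, lack the staff needed to maintain them, so the pages they index are often insufficient. Search engine results, on the other hand, often have low relevance to the user's information needs or are far too numerous, so users must still filter out the information they need themselves.
This study builds an intelligent agent on top of existing search engines. The agent builds a user profile from relevant example documents supplied by the user. It uses a genetic algorithm to search for candidate query strings, collects related Web pages through a search engine, represents the content of each page with the vector space model, and evaluates the similarity between each page and the user profile; this similarity guides the genetic algorithm toward more suitable query strings. Based on the user's ratings of the retrieved results, a relevance feedback mechanism adjusts the user profile, progressively improving query effectiveness. The implemented system places no restriction on search topic, has low software and hardware requirements, and imposes little extra burden on the user, and it performs satisfactorily in both Web search and user-interest learning.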
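The agent described above rests on three standard building blocks: a term-frequency user profile, vector-space (cosine) similarity between a page and that profile, and a relevance-feedback update of the profile. The Python sketch below illustrates them under stated assumptions. The profile is taken to be the mean term-frequency vector of the example documents, `STOP_WORDS` is a tiny placeholder list, and the feedback step uses the classic Rocchio formula with hypothetical weights `alpha`, `beta`, and `gamma`. The thesis defines its own term-frequency and sensitivity adjustment strategies, which this sketch does not reproduce.

```python
import math
import re
from collections import Counter, defaultdict

Vector = dict[str, float]

# Tiny illustrative stop list (assumption: the thesis appendix defines its own list).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on"}


def term_frequencies(text: str) -> Counter:
    """Lower-case the text, split on non-letters, drop stop words, count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def build_profile(example_docs: list[str]) -> Vector:
    """Assumed profile: mean term-frequency vector of the user's example documents."""
    total: Counter = Counter()
    for doc in example_docs:
        total.update(term_frequencies(doc))
    return {t: c / len(example_docs) for t, c in total.items()}


def cosine(u: Vector, v: Vector) -> float:
    """Vector space model similarity: cosine of the angle between two term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of sparse vectors; an empty list gives {}."""
    summed: defaultdict = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            summed[t] += w
    return {t: w / len(vectors) for t, w in summed.items()}


def rocchio_update(profile: Vector, relevant: list[Vector], nonrelevant: list[Vector],
                   alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Rocchio-style feedback (swapped in for illustration): move the profile toward
    pages the user rated relevant and away from pages rated non-relevant.
    Terms whose weight falls to zero or below are dropped."""
    rel, non = centroid(relevant), centroid(nonrelevant)
    updated: Vector = {}
    for t in set(profile) | set(rel) | set(non):
        w = alpha * profile.get(t, 0.0) + beta * rel.get(t, 0.0) - gamma * non.get(t, 0.0)
        if w > 0:
            updated[t] = w
    return updated


if __name__ == "__main__":
    profile = build_profile(["genetic algorithm for web search",
                             "relevance feedback in web search"])
    page = term_frequencies("web search with a genetic algorithm")
    print(round(cosine(profile, page), 3))
```

Dropping terms whose weight becomes non-positive keeps the profile sparse, so negative feedback can remove a term entirely rather than leave it with a meaningless negative frequency.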

Abstract (English)

The World Wide Web (WWW) has grown ever faster since its emergence and has become one of our major sources of information in daily life. Because the amount of data on the WWW is enormous, information overload constantly troubles users, and retrieving information with the right scope and content has become an important issue. At present, directory services and keyword search are the two main ways of helping users, but both have shortcomings. Directory services require labor-intensive work to create and maintain the directory, and web pages are usually not fully indexed. Keyword search usually returns so much unrelated information that users must spend considerable effort filtering it.
This research constructs an intelligent agent to assist information retrieval. The agent builds a user profile by parsing and analyzing example documents provided by the user, and uses a genetic algorithm to search the space of query strings that can be composed from keywords in the user profile. It collects relevant web pages through a search engine, represents each page in the vector space model, and evaluates the similarity between each page and the user profile to guide the search toward more suitable query strings. Based on the user's evaluations of the results, a relevance feedback mechanism refines the user profile to improve subsequent queries. The proposed system performs satisfactorily in web search and in learning users' interests. It works for any search subject, has low software and hardware requirements, and requires little extra effort from the user.
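The query-string search can be sketched as a simple genetic algorithm. This is a hypothetical illustration rather than the thesis's implementation. Each chromosome is assumed to be a bit string over the profile keywords (1 means the keyword is included in the query), fitness is assumed to be the mean cosine similarity between the profile and the pages a query returns, and the operators are common textbook choices (binary tournament selection, single-point crossover, bit-flip mutation). `toy_search` and `TOY_PAGES` are hypothetical in-memory stand-ins for a real search engine, and all parameter defaults are illustrative.

```python
import math
import random
from typing import Callable

Vector = dict[str, float]
SearchFn = Callable[[str], list[Vector]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def decode(bits: list[int], keywords: list[str]) -> str:
    """Assumed encoding: bit i = 1 puts keywords[i] into the query string."""
    return " ".join(k for k, b in zip(keywords, bits) if b)


def fitness(bits: list[int], keywords: list[str], profile: Vector, search: SearchFn) -> float:
    """Assumed fitness: mean profile similarity of the returned pages (0 if none)."""
    query = decode(bits, keywords)
    pages = search(query) if query else []
    if not pages:
        return 0.0
    return sum(cosine(profile, p) for p in pages) / len(pages)


def evolve_queries(keywords: list[str], profile: Vector, search: SearchFn,
                   pop_size: int = 8, generations: int = 15, crossover_rate: float = 0.8,
                   mutation_rate: float = 0.1, seed: int = 0) -> tuple[str, float]:
    """Return the best query string found and its fitness."""
    rng = random.Random(seed)
    n = len(keywords)
    cache: dict[tuple[int, ...], float] = {}

    def score(c: list[int]) -> float:
        key = tuple(c)
        if key not in cache:  # each evaluation would otherwise cost a search request
            cache[key] = fitness(c, keywords, profile, search)
        return cache[key]

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def pick() -> list[int]:
        a, b = rng.sample(population, 2)  # binary tournament selection
        return a if score(a) >= score(b) else b

    best = max(population, key=score)[:]
    for _ in range(generations):
        offspring: list[list[int]] = []
        while len(offspring) < pop_size:
            p1, p2 = pick(), pick()
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n - 1)  # single-point crossover
                children = [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
            else:
                children = [p1[:], p2[:]]
            for child in children:
                for i in range(n):
                    if rng.random() < mutation_rate:  # bit-flip mutation
                        child[i] = 1 - child[i]
                offspring.append(child)
        population = offspring[:pop_size]
        gen_best = max(population, key=score)
        if score(gen_best) > score(best):  # keep the best query seen so far
            best = gen_best[:]
    return decode(best, keywords), score(best)


# Hypothetical stand-in for a real search engine: AND-match over a toy corpus.
TOY_PAGES: list[Vector] = [
    {"genetic": 2, "algorithm": 2, "web": 1, "search": 1},
    {"relevance": 1, "feedback": 2, "web": 1, "search": 1},
    {"web": 1, "search": 1, "engine": 2, "ranking": 1},
    {"cooking": 2, "pasta": 1, "web": 1},
]


def toy_search(query: str) -> list[Vector]:
    terms = query.split()
    return [p for p in TOY_PAGES if all(t in p for t in terms)]


if __name__ == "__main__":
    profile = {"genetic": 1.0, "algorithm": 1.0, "web": 0.5, "search": 0.5, "feedback": 0.5}
    print(evolve_queries(list(profile), profile, toy_search))
```

Caching fitness per chromosome matters in this setting. Against a real search engine, every fitness evaluation is a network request, so re-scoring a chromosome that has already been seen would waste both time and query quota.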

Keywords (Chinese)

★ genetic algorithm
★ relevance feedback
★ information retrieval
★ term frequency
★ World Wide Web

Keywords (English)

★ relevance feedback
★ genetic algorithm
★ World Wide Web
★ information retrieval
★ term frequency

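Table of Contents

Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Scope and Limitations
1.4 Research Procedure
1.5 Thesis Organization
Chapter 2 Literature Review
2.1 Information Retrieval
2.2 The World Wide Web and Search Services
2.3 Genetic Algorithms
2.4 Relevance Feedback
Chapter 3 System Design
3.1 Research Concept
3.2 System Architecture
Chapter 4 Experimental Results
4.1 Experimental Design and Procedure
4.2 Experimental Results and Analysis
Chapter 5 Conclusions
5.1 Conclusions and Contributions
5.2 Future Research Directions
References
Appendix 1 Stop list of words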
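List of Figures
Figure 1-1: Research flowchart
Figure 2-1: Information retrieval model
Figure 2-2: Information filtering model
Figure 2-3: Word occurrence frequency versus word rank
Figure 2-4: Internet growth trend
Figure 2-5: Search engine index sizes
Figure 2-6: Genetic algorithm framework
Figure 2-7: Relevance feedback parameter settings in Webnaut
Figure 3-1: Schematic of the information retrieval approach
Figure 3-2: System external environment
Figure 3-3: System modules
Figure 3-4: Chromosome encoding example
Figure 4-1: Results of Experiment 1 (line chart)
Figure 4-2: Results of Experiment 2 (line chart)
Figure 4-3: Results of Experiment 3 (line chart)
Figure 4-4: Results of Experiment 4 (line chart)
Figure 4-5: Results of Experiment 5 (line chart)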
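List of Tables
Table 3-1: Similarities and differences between this study and Webnaut
Table 3-2: System hardware and software environment
Table 3-3: Example of user profile construction
Table 3-4: Term frequency adjustment strategy
Table 3-5: Sensitivity adjustment strategy
Table 3-6: Term frequency adjustment parameters
Table 3-7: Term frequency adjustment example
Table 3-8: Sensitivity adjustment example
Table 3-9: Example user profile
Table 3-10: Example document term frequency data
Table 3-11: User profile specification
Table 4-1: Manipulated variable settings for the experiments
Table 4-2: Variable settings for Experiment 1
Table 4-3: Results of Experiment 1 (data summary)
Table 4-4: Variable settings for Experiment 2
Table 4-5: Results of Experiment 2 (data summary)
Table 4-6: Variable settings for Experiment 3
Table 4-7: Results of Experiment 3 (data summary)
Table 4-8: Variable settings for Experiment 4
Table 4-9: Results of Experiment 4 (data summary)
Table 4-10: Variable settings for Experiment 5
Table 4-11: Results of Experiment 5 (data summary)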

Advisor

周世傑 (Shih-Chieh Chou)

Approval Date

2002-6-25
