Applying Semi-Structure Information and User Feedback Information in Filtering Web Page Search Result

Name

Po-sen Huang (黃柏森)

Department

Department of Information Management

Thesis Title

Applying Semi-Structure Information and User Feedback Information in Filtering Web Page Search Result
(Original title: 透過半結構資訊及使用者回饋資訊以協助使用者過濾網頁文件搜尋結果)

Files

Browse the thesis in the system (never open to the public)

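Abstract (translated from Chinese)

Internet search engines are an important tool for obtaining information quickly, but the pages they return are not only overwhelming in number but also frequently fail to match the user's needs. This has given rise to a demand for personalized search. This study proposes an algorithm for extracting features from Web documents. By comparing the similarity between document features, it aims to distinguish the search results that are relevant to the user's needs from those that are not, and to filter out the irrelevant documents. The feature weight calculation includes four factors: HTML emphasis tag weight, paragraph weight, analytic combination weight, and term sensitivity weight. We analyze the effect of each factor on the document features and compare the algorithm with related algorithms to understand its strengths and weaknesses. The experimental results confirm that the algorithm effectively extracts the important features of Web documents. A user profile built from these features increases similarity with relevant documents and decreases similarity with irrelevant documents, thereby filtering out search results that do not match the user's needs.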

Abstract (English)

Web search engines have become an important tool for obtaining information quickly. However, they return too many results, and these results often do not match the user's needs. Personalized search is needed to reduce the effort users spend on searching. In this research, we present a method for extracting features from Web documents; by comparing the similarity of document features, we can better recognize which documents match the user's request. The feature extraction method weights terms using four factors: HTML emphasis tags, term position, analytic combination of criteria, and term sensitivity. Our experimental results show that the method extracts the important features of Web documents effectively. A user profile built from these document features increases similarity with relevant documents and decreases similarity with irrelevant documents.
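The filtering idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: this record does not give the actual weighting formulas, so the tag weights in `EMPHASIS_WEIGHTS`, the threshold of 0.2, and all function names are hypothetical. Only the HTML emphasis tag factor is modeled; term position, analytic combination, and term sensitivity are left out. Cosine similarity is assumed as the similarity measure.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical boosts for emphasis tags (assumed values, not from the thesis).
EMPHASIS_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0,
                    "b": 1.5, "strong": 1.5, "em": 1.5}


class WeightedTermParser(HTMLParser):
    """Accumulate term weights, boosting terms that appear inside emphasis tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []       # emphasis tags currently open
        self.weights = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close the innermost open tag with this name.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        # Plain text counts 1.0; emphasized text gets the largest open boost.
        boost = max((EMPHASIS_WEIGHTS[t] for t in self.open_tags), default=1.0)
        for term in re.findall(r"[a-z]+", data.lower()):
            self.weights[term] += boost


def extract_features(html):
    """Return a term -> weight mapping for one HTML page."""
    parser = WeightedTermParser()
    parser.feed(html)
    parser.close()
    return dict(parser.weights)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def build_profile(relevant_pages):
    """Sum the features of pages the user marked as relevant (user feedback)."""
    profile = Counter()
    for html in relevant_pages:
        profile.update(extract_features(html))
    return dict(profile)


def filter_results(profile, pages, threshold=0.2):
    """Keep pages whose similarity to the profile reaches the assumed threshold."""
    return [p for p in pages if cosine(profile, extract_features(p)) >= threshold]
```

In use, `build_profile` would be fed the pages a user marked as relevant, and `filter_results` would then be applied to each new result list. A real system would also need stop-word removal, stemming, and handling of non-English text, all of which this sketch omits.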

Keywords (translated from Chinese)

★ User profile
★ Personalized search
★ Web feature extraction
★ Search result filtering

Keywords (English)

★ Search result filtering
★ Personalized search
★ User profile
★ Web feature extraction

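Table of Contents

Chapter 1 Introduction
1-1 Research Motivation
1-2 Research Objectives
1-3 Research Scope and Limitations
1-4 Thesis Organization
Chapter 2 Literature Review
2-1 Research on Semi-Structured Information
2-2 Semi-Structured Information Extraction
2-3 Relevance Feedback
2-4 Term Sensitivity
Chapter 3 System Architecture
3-1 Web Page Analyzer
3-2 Content Analyzer
3-3 Document Feature Builder
3-4 Similarity Calculator
Chapter 4 Experimental Analysis
4-1 Experimental Design and Procedure
4-2 Experimental Results and Analysis
4-2-1 Factor Analysis of the Algorithm
4-2-2 Comparison of Algorithm Validity
4-2-3 Overall Comparison
Chapter 5 Conclusion
5-1 Research Conclusions and Contributions
5-2 Future Research Directions
References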

Advisor

Shih-chieh Chou (周世傑)

Approval Date

2009-7-3
