以知識本體為基礎的中文查詢擴展 (A Knowledge-based Chinese Query Expansion System)

Name

江舜絃 (Shun-hsien Chiang)

Department

Department of Information Management

Thesis title

以知識本體為基礎的中文查詢擴展
(A Knowledge-based Chinese Query Expansion System)

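The author has agreed to make this electronic thesis openly available immediately.
Electronic full texts that have reached their open-access date are licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.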

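Abstract (Chinese)

In today's era of information explosion, search engines have become an indispensable tool in everyday life. How to help users filter an overload of information while accounting for individual search intent, and thereby achieve personalized search ranking, has long been an important issue.
Based on this idea, this study takes a framework in which a knowledge ontology depicts user preferences as its blueprint and proposes a keyword recommendation system for the Chinese environment, realizing query expansion for Chinese. A web crawler analyzes the site maps of all websites the user has previously browsed, and Formal Concept Analysis is used to automatically construct personalized domain knowledge with broad coverage. With HowNet (知網) as a supplement, query expansion is combined with automatic learning of the personalized ontology to retrieve more complete information. When the user enters a keyword, the system compares it with the concepts in the personalized ontology stored in the user profile and recommends a set of extended keywords that are conceptually similar to it, so as to retrieve more documents describing the same concept. Experimental results show that the system improves retrieval precision by more than 70%, with the best case showing a twofold improvement, which demonstrates the benefit of filtering out most web pages unrelated to the user's interests in order to obtain the information the user really wants. Compared with traditional ontology-based query expansion methods, the proposed algorithm draws on automatic generation of the user knowledge base, training data from a broad range of sources, semi-automatic recommendation of Chinese expansion terms, and a HowNet sememe base adapted to Traditional Chinese characters, and it does effectively improve the precision of information retrieval in the Chinese environment.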
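
The Formal Concept Analysis step named in the abstract can be illustrated with a minimal sketch. It assumes a formal context in which the objects are browsed pages and the attributes are keywords extracted from them; the page names, the keywords, and the function name `formal_concepts` are hypothetical and are not taken from the thesis. The sketch enumerates every formal concept (extent, intent) by closing the page keyword sets under intersection, which is a simple textbook construction and not necessarily the algorithm the author implemented.

```python
def formal_concepts(context):
    """Enumerate all formal concepts of a small formal context.

    context maps each object (here: a page) to its set of attributes
    (here: keywords). Returns a list of (extent, intent) frozenset pairs.
    """
    all_attrs = frozenset().union(*context.values())
    # Every concept intent is an intersection of object attribute sets;
    # the full attribute set is the intent of the bottom concept.
    intents = {all_attrs}
    for attrs in context.values():
        attrs = frozenset(attrs)
        intents |= {attrs & existing for existing in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(o for o, a in context.items() if intent <= a)
        concepts.append((extent, intent))
    return concepts


# Hypothetical toy context: three browsed pages and their keywords.
context = {
    "page1": {"ontology", "semantic web"},
    "page2": {"ontology", "query expansion"},
    "page3": {"query expansion", "HowNet"},
}
concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by set inclusion of their extents gives the concept lattice, which is the structure an ontology-building step could then work from.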

Abstract (English)

Search engines have become an essential tool in the era of information explosion. Helping users filter excess information while taking their implicit search intentions into account, so as to achieve personalized search ranking, has therefore long been an important topic.
This study uses a knowledge ontology to depict user preferences and proposes a Chinese keyword recommendation system that accomplishes Chinese query expansion. A web crawler analyzes the site maps of all websites the user has browsed; Formal Concept Analysis automatically constructs personalized domain knowledge with broad coverage; and, with HowNet as an aid, query expansion is combined with automatic learning of the personal ontology so that more complete information can be retrieved. When the user submits keywords, the system compares them with the concepts of the personalized ontology in the user's profile and produces a set of extended keywords conceptually similar to the input, which it recommends to the user in order to retrieve more documents describing the same concepts. The experimental results show that the system improves retrieval precision by more than 70%, and that in the best case precision almost doubles.
These results indicate that filtering out most web documents unrelated to the user's interests helps retrieve the information the user actually needs. Compared with conventional ontology-based query expansion methods, the proposed algorithm, which provides an automatically generated user knowledge base, a broader range of training data sources, a semi-automatic mechanism for recommending Chinese expansion terms, and a HowNet sememe database in Traditional Chinese, is shown to achieve better retrieval precision in the Chinese environment.
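
The keyword recommendation and the precision measure described in the abstract can be sketched as follows. This is an illustrative simplification under stated assumptions: the user profile is reduced to a list of concept intents (sets of terms), candidate terms are ranked by how many concepts they share with the query keyword, and the HowNet-based similarity used in the thesis is not modeled. The names `expand_query`, `precision`, and `profile`, and all data, are hypothetical.

```python
from collections import Counter


def expand_query(keyword, concept_intents, top_k=3):
    """Suggest expansion terms that share at least one concept with keyword.

    Terms are ranked by the number of shared concepts (an assumed stand-in
    for a semantic similarity score), with ties broken alphabetically.
    """
    scores = Counter()
    for intent in concept_intents:
        if keyword in intent:
            for term in intent:
                if term != keyword:
                    scores[term] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0
    relevant = set(relevant)
    return sum(1 for doc in retrieved if doc in relevant) / len(retrieved)


# Hypothetical user profile: concept intents from a personalized ontology.
profile = [
    frozenset({"ontology", "semantic web"}),
    frozenset({"ontology", "query expansion"}),
    frozenset({"ontology", "semantic web", "OWL"}),
]
suggestions = expand_query("ontology", profile, top_k=1)
print(suggestions)
print(precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

Comparing `precision` for the result list of the original query against that of the expanded query is one way to quantify the kind of improvement the abstract reports.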

Keywords (Chinese)

★ 正規概念分析法 (Formal Concept Analysis)
★ 查詢擴展 (Query Expansion)
★ 本體論 (Ontology)
★ 知網 (HowNet)

Keywords (English)

★ Formal Concept Analysis
★ HowNet
★ Ontology
★ Query Expansion

Table of Contents

Chinese Abstract
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Research Limitations
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Semantic Web
2.2 Ontology
2.2.1 Definition of Ontology
2.2.2 Components of a Knowledge Ontology
2.2.3 Ontology Construction Methods
2.3 Formal Concept Analysis
2.3.1 Formal Context
2.3.2 Concept Lattice
2.4 Chinese Word Processing Techniques
2.4.1 Chinese Knowledge and Information Processing (CKIP) Group
2.4.2 HowNet
2.4.3 HowNet-Based Word Similarity Computation
2.5 Query Expansion
2.6 Keyword Weight Calculation
Chapter 3 System Analysis and Design
3.1 System Architecture
3.2 Document Preprocessing
3.3 Keyword Extraction
3.4 Knowledge Construction
3.5 Expansion Keyword Recommendation
Chapter 4 System Implementation and Validation
4.1 Development Tools and Experimental Environment
4.2 Data Sources for User Profile Construction
4.3 Evaluation Method
4.4 Experimental Results and Analysis
4.4.1 Experimental Design
4.4.2 Experimental Results
4.4.3 Experimental Analysis
4.5 System Performance Evaluation
Chapter 5 Conclusions and Suggestions
5.1 Conclusions and Contributions
5.2 Future Research Directions
References
English-Language References
Chinese-Language References
Websites

Advisors

薛義誠 (Y. C. Shiue), 謝浩明 (How-ming Shieh)

Approval date

2009-7-16