Topic Concept-Hierarchy Model: Concept-Based Search

Name

范瓊文 (Tiffany Fan)

Department

Graduate Institute of Network Learning Technology

Thesis Title

Topic Concept-Hierarchy Model: Concept-Based Search
(主題概念階層模型：概念式搜尋)

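This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.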

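Abstract (Chinese)

From an academic perspective, information retrieval (IR) technology is mainly used to help users find document content in a library through a search mechanism. In a library environment, digital and physical documents must be classified and organized, and each document must be processed manually so that its basic information, such as the document (book) title, authors, publication date, publisher, abstract, category, keywords, and the digital content itself, is stored in a database as structured and unstructured data. Information retrieval therefore takes the searcher's keywords, finds all possibly relevant digital content in the structured and unstructured database, and presents it in priority order.
The usual way to search digital content is keyword matching or one of various similarity formulas. With keyword queries or similarity formulas, however, it is hard to retrieve "all relevant material" from a digital library: people express themselves in natural language and use different words for the concept they want to convey, so the recall of keyword matching cannot be raised. The Topic/Domain Concept-Hierarchy Model proposed in this thesis classifies domain knowledge concepts hierarchically to form a domain concept hierarchy. Each category is a concept node that corresponds to a set of related documents, and the keywords of a concept node come from that document set. The user's keywords are searched for among the concept nodes of the domain hierarchy. A match indicates that the searcher wants the content of that concept node. The matched node is called a relevant conceptual node (RCN), and the nodes below it are called relevant conceptual subnodes (RCS). Five factors are used to adjust the weights and similarity values of relevant conceptual nodes and subnodes: the level of the node in the hierarchy, the number of user keywords that fall on the node, the cosine similarity between the user's keywords and the relevant conceptual node, the distance between the concept node and the subnode, and the branching degree of the node.
Experimental data demonstrate that the Topic Concept-Hierarchy Model can be applied effectively to information retrieval. It brings out the searcher's intended targets and the related digital content, ranks them by suitability and relevance to the user, and lets the user retrieve the most wanted digital content in the shortest time.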
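The matching step described in this abstract can be sketched in Python as follows. This is a minimal illustration, not the thesis's implementation: the `ConceptNode` class, the `search` function, the toy hierarchy, and in particular the `1 / (1 + distance)` decay applied to subnodes are assumptions made for the example. The record does not give the actual weighting formula, and of the five factors only cosine similarity and node-to-subnode distance are modeled here.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """One category in the domain hierarchy (hypothetical structure)."""
    name: str
    keywords: list[str]
    children: list["ConceptNode"] = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def walk(node):
    """Yield the node and every node below it."""
    yield node
    for child in node.children:
        yield from walk(child)


def descendants(node, distance=1):
    """Yield (subnode, distance from node) for every node below `node`."""
    for child in node.children:
        yield child, distance
        yield from descendants(child, distance + 1)


def search(root, query):
    """Rank relevant conceptual nodes (RCN) and their subnodes (RCS)."""
    scores = {}
    for node in walk(root):
        sim = cosine(query, node.keywords)
        if sim == 0.0:
            continue  # no query keyword matches this node
        scores[node.name] = max(scores.get(node.name, 0.0), sim)
        for sub, dist in descendants(node):
            # Assumed decay: subnodes inherit a share of the RCN's score.
            inherited = sim / (1 + dist)
            scores[sub.name] = max(scores.get(sub.name, 0.0), inherited)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy hierarchy, invented for illustration.
root = ConceptNode("information retrieval", ["retrieval", "search"], [
    ConceptNode("vector space model", ["vector", "cosine"], [
        ConceptNode("term weighting", ["tfidf", "weight"]),
    ]),
    ConceptNode("boolean model", ["boolean", "and", "or"]),
])

ranked = search(root, ["cosine", "vector"])
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this sketch the query matches only the "vector space model" node, so that node is returned first and its child "term weighting" follows with a reduced score, even though the child shares no keyword with the query. That is the recall benefit the abstract attributes to searching concepts rather than keywords alone.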

Abstract (English)

From an academic viewpoint, information retrieval (IR) methods are used to facilitate content search in a library environment. In a library, a librarian needs to create descriptive information for digital or physical content before it is stored. This descriptive information, including the title, authors, publication date, publisher, abstract, category, terms, and the content itself, is stored in a repository. The retrieval process is therefore implemented as a comparison between the user's query and the repository.
In general, keyword matching is a common approach in information retrieval research. However, this approach cannot always return all relevant information. The main reason is that people may use different words to refer to the same information, so the recall of keyword matching is poor. In this study, we propose the Topic/Domain Concept-Hierarchy Model, which organizes domain knowledge into hierarchical categories in a domain hierarchy. Each category is a concept node with a corresponding content set. The representative keywords of a node are extracted from its content set. Matching is executed in the domain hierarchy by computing the similarity between the user's query and the keywords in the hierarchy. A match means the user intends to browse the corresponding content set. The matched node is called a relevant conceptual node (RCN), and the nodes below it are relevant conceptual sub-nodes (RCS).
Experimental results show that the proposed Topic/Domain Concept-Hierarchy Model can be applied to information retrieval effectively. Recall and precision are significantly improved compared with the traditional method. The returned results are ranked according to their correlation in the domain hierarchy. In this way, users can retrieve suitable material in a short time.
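The abstract reports improved recall and precision, and the thesis evaluates the model with an F-measure, but this record does not reproduce the formulas. The sketch below uses the standard set-based definitions; the function name and the sample document IDs are hypothetical and are not data from the thesis.

```python
def precision_recall_f1(retrieved, relevant):
    """Standard set-based precision, recall, and balanced F-measure."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: 4 documents returned, 3 documents truly relevant.
precision, recall, f1 = precision_recall_f1(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Keyword matching tends to lose recall when relevant documents use different vocabulary from the query, which is the case the concept hierarchy is meant to address.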

Keywords (Chinese)

★ Information retrieval model
★ Domain concept hierarchy
★ Concept-based search
★ Information retrieval

Keywords (English)

★ Concept-Based Information Retrieval
★ Domain Conceptual Hierarchy
★ Information Extraction
★ Information Retrieval
★ Retrieval Model

Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
Contents
List of Tables and Figures
List of Algorithms, Topic Concept-Hierarchy Model Formulas, and Citations
Chapter 1. Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Research Method
Chapter 2. Related Work
2.1 Classic Information Retrieval Models
2.1.1 Boolean Model
2.1.2 Vector Space Model
2.2 Recall (Completeness) and Precision (Soundness) of Search
2.3 Concept Hierarchy
2.4 Summary of Related Work
Chapter 3. Topic Concept-Hierarchy Model: Concept-Based Search
3.1 Definition of the Topic Conceptual Hierarchy
3.1.1 Analysis of Seeker's Behavior
3.1.2 Query Preprocessing
3.2 Definitions of Terms in the Topic Concept-Hierarchy Model
3.3 Weighting Strategy: Relevant Conceptual Nodes and Relevant Conceptual Subnodes
3.4 Calculating the Similarity Measure and Ranking
3.5 System Architecture and Workflow
Chapter 4. System Implementation
4.1 Experimental Environment
4.2 Experimental Objectives
4.3 Experimental Results
Chapter 5. Performance Analysis and Comparison
5.1 F-Measure (Topic Concept-Hierarchy Model, TCHM)
5.2 Average Satisfaction Score of the Top Ten Results
5.3 Average Response Time
Chapter 6. Conclusions and Future Work
6.1 Conclusions and Contributions
6.2 Future Work
Chapter 7. References
Chapter 8. Appendix

Advisor

楊鎮華 (Stephen J.H. Yang)

Approval Date

2005-7-20
