Thesis/Dissertation 93423039 Detailed Record


Name 施亮如 (Liang-Lu Shih)   Department Information Management
Title Applying Ontology to Relevant Document Discovery
(引用本體論至相關文件檢索之研究)
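File access
  1. The access right for this electronic thesis is: open immediately.
  2. Open-access electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.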

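Abstract (Chinese) Relevant document discovery has been widely discussed, and a variety of
methods and techniques have been proposed or deployed in production document retrieval
systems. Most approaches have the user enter a query, preprocess the query string, and then
run full-text matching to find relevant documents. Alternatively, they offer queries over
specific fields such as title, abstract, keywords, and references, and convert those fields
into a particular model for similarity calculation, for example a vector model combined with
TF/IDF to compute document similarity. On the whole, these methods originate from the field
of Information Retrieval.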
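The vector model with TF/IDF mentioned above can be illustrated with a minimal sketch. It is not taken from the thesis: the tokenization (whitespace split), the exact TF and IDF formulas, and the function names `tfidf_vectors` and `cosine` are all assumptions chosen for brevity.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document.

    Assumed weighting: relative term frequency times log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus (hypothetical): two overlapping titles and one unrelated title.
docs = [
    "ontology mapping for the semantic web".split(),
    "ontology extraction for the semantic web".split(),
    "genetic algorithms in search and optimization".split(),
]
vecs = tfidf_vectors(docs)
sim_close = cosine(vecs[0], vecs[1])  # documents sharing five of six terms
sim_far = cosine(vecs[0], vecs[2])    # documents sharing no terms
```

In this toy corpus the two overlapping titles score higher than the unrelated pair, whose similarity is zero because the documents share no terms.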
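The Semantic Web is an emerging research field that has been combined with other fields to
produce a range of applications, including knowledge management, agent communication, and
web services. The core concept of the Semantic Web is ontology: a markup language expresses
the semantics of specific content in a form that is human-readable and can also be processed
further by computer systems. Most relevant document discovery methods proposed so far,
however, handle the semantic properties of document content only to a limited extent, and
few publications discuss applying ontology concepts to relevant document discovery. These
gaps motivate this study.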
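This study designs an architecture that applies ontology to relevant document discovery and
implements a prototype system. The input of the system is a document, and the output is the
set of documents related to it. Processing is divided into the following steps: (1) convert
the input document into ontology format; (2) if the input document already exists in the
system, output its related documents directly; (3) if the input document does not exist in
the system, calculate the similarity between the input document and the documents already
stored in the system. The study designs two similarity calculation methods and uses a
genetic algorithm to compute the weight assigned to the result of each method, which yields
the final similarity value.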
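The idea of weighting two similarity scores with a genetic algorithm can be sketched as follows. This is only an illustration under stated assumptions, not the method of the thesis: it assumes the two weights sum to 1 (so a single weight `w` is evolved), uses a squared-error fitness against hypothetical relevance labels, and all names (`combined`, `fitness`, `evolve_weight`) and sample values are invented for the example.

```python
import random

def combined(w, concept_sim, instance_sim):
    """Final similarity as a weighted sum (assumption: weights sum to 1)."""
    return w * concept_sim + (1.0 - w) * instance_sim

def fitness(w, samples):
    """Negative squared error against relevance labels (assumed fitness)."""
    return -sum((combined(w, c, i) - label) ** 2 for c, i, label in samples)

def evolve_weight(samples, pop_size=20, generations=40, seed=0):
    """Tiny genetic algorithm over a single weight in [0, 1]."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        population.sort(key=lambda w: fitness(w, samples), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2.0              # arithmetic crossover
            child += rng.gauss(0.0, 0.05)      # Gaussian mutation
            children.append(min(1.0, max(0.0, child)))
        population = parents + children
    return max(population, key=lambda w: fitness(w, samples))

# Hypothetical training samples: (concept_sim, instance_sim, relevance label).
samples = [(0.9, 0.2, 1.0), (0.8, 0.3, 1.0), (0.2, 0.9, 0.0), (0.1, 0.7, 0.0)]
best_w = evolve_weight(samples)
final_score = combined(best_w, 0.85, 0.25)
```

Because the sample labels here track the concept score, the evolved weight leans toward concept similarity. With real data the balance would depend on the labeled document pairs used for fitness.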
Abstract (English) Relevant document discovery is a practical research topic that has
attracted many researchers, and different solutions have been proposed. Some solutions have
been adopted in real-world environments, such as electronic article publishers. These
publishers offer search options such as keywords, full text, phrases, and Boolean
expressions for users to retrieve documents. Most relevant document discovery techniques
originate from the domain of information retrieval. The core concept of the Semantic Web is
ontology, which has been applied in various domains, such as web services, agent
communication, and knowledge management. However, few papers have applied ontology to
relevant document discovery. Therefore, this study applies ontology to relevant document
discovery and constructs a prototype system that implements the proposed method. Given a
user-selected input document, the prototype system generates a number of closely related
documents from those stored in the repository. The process can be divided into the
following steps: (1) transform the input text document into OWL format; (2) determine
whether the input document already exists in the ontology repository of the system; (3) if
the input document does not exist in the ontology repository, calculate the similarity
between the input ontology and the documents stored in the ontology repository, and
retrieve the related documents with higher similarity values. Ontology extraction and
similarity calculation are the core components that apply the concept of ontology in the
prototype system. The objective of ontology extraction is to transform TXT-format documents
into OWL format according to the characteristics of ontology. Similarity calculation
consists of two methods, concept similarity and instance similarity, both of which are
proposed and implemented in the prototype system.
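The three processing steps described in the abstract can be sketched as a small control flow. This is a hedged illustration rather than the prototype's implementation: a set of lower-cased words stands in for the OWL ontology, Jaccard overlap stands in for the concept and instance similarity measures, and the names `extract_ontology`, `find_related`, `repository`, and `related_cache` are hypothetical.

```python
def extract_ontology(text):
    """Step 1 stand-in: a frozenset of lower-cased words replaces the OWL ontology."""
    return frozenset(text.lower().split())

def jaccard(a, b):
    """Placeholder similarity; the thesis uses concept and instance similarity."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def find_related(text, repository, related_cache, top_k=2):
    """Return IDs of documents related to the input text."""
    onto = extract_ontology(text)                      # step 1: transform input
    for doc_id, stored in repository.items():          # step 2: already stored?
        if stored == onto:
            return related_cache.get(doc_id, [])
    # Step 3: rank stored documents by similarity to the new input.
    ranked = sorted(repository, key=lambda d: jaccard(onto, repository[d]), reverse=True)
    return ranked[:top_k]

# Hypothetical repository and precomputed related-document lists.
repository = {
    "a": extract_ontology("ontology mapping semantic web"),
    "b": extract_ontology("ontology extraction semantic web text"),
    "c": extract_ontology("genetic algorithm optimization"),
}
related_cache = {"a": ["b"], "b": ["a"], "c": []}

known = find_related("ontology mapping semantic web", repository, related_cache)
unseen = find_related("semantic web ontology learning", repository, related_cache)
```

The first call matches a stored document and returns its cached related list; the second call falls through to the similarity ranking.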
Keywords (Chinese) ★ Relevant Document Discovery
★ Ontology Extraction
★ Ontology Mapping
★ Ontology
Keywords (English) ★ Relevant Document Discovery
★ Ontology Extraction
★ Ontology
Table of Contents 1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Purpose
2. Literature Review
2.1 OWL Ontology
2.2 Ontology Extraction
2.3 Similarity Calculation
3. Method of Relevant Document Discovery
3.1 System Architecture
3.2 Ontology Extraction
3.2.1 Preprocess
3.2.2 Find the Associated Content of Schema
3.2.3 Extract Instances from Content
3.2.4 Constructing Ontology
3.3 Similarity Calculation
3.3.1 Definition of Similarity Calculation
3.3.2 Similarity Method 1: Concept Similarity
3.3.3 Similarity Method 2: Instance Similarity
3.3.4 Operational Definition of Instance Similarity
3.3.5 Weights of Similarity Measures
4. Implementation and Evaluation
4.1 Implementation Tools and Environment
4.2 Evaluation of Ontology Extraction
4.2.1 Implement Sentences as Instances
4.2.2 Implement Multi-words as Instances
4.3 Evaluation of the Prototype System
4.3.1 Evaluation Method
4.3.2 Experiment 1: Only Concept Similarity
4.3.3 Experiment 2: Only Instance Similarity
4.3.4 Experiment 3: Concept and Instance Similarity
5. Conclusion and Future Direction
5.1 Conclusion
5.2 Contribution
5.3 Limitation
5.4 Future Direction
References
Advisor 鄭裕勤 (Eric Y. Cheng)   Approval date 2006-7-12

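For questions about this thesis, please contact the Extension Services Section of the National Central University Library, TEL: (03) 422-7151 ext. 57407, or by e-mail.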