NCU Institutional Repository: Item 987654321/13552


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/13552


    Title: The construction of document concept based on compound nouns (以複合名詞為基礎之文件概念建立方式)
    Author: Ju-yuan Shih (施儒淵)
    Contributors: Graduate Institute of Information Management
    Keywords: Information Retrieval; Concept Extraction; TF-IDF; Compound Nouns; Vector Space Model
    Date: 2009-06-15
    Issue Date: 2009-09-22 15:34:45 (UTC+8)
    Publisher: National Central University Library
    Abstract: Advances in information technology have multiplied the volume of digitized data and documents. Without information technology to help them search, users would find locating documents a heavy burden. One way to reduce that burden is for computer systems to identify documents automatically, and such systems usually discriminate between documents by the similarity between them.

    In Information Retrieval (IR), many studies use TF-IDF to weight terms and then build a Vector Space Model from those terms to compute document similarity. This term-based approach has two weaknesses. First, everyday language makes heavy use of compound nouns, which single-term units cannot represent. Second, several different words often express the same concept, so two documents that describe the same concept in different words may be judged unrelated.

    This thesis proposes using compound nouns for concept extraction and computing document similarity with a Vector Space Model whose dimensions are concepts. The method first identifies the compound nouns in each document, then extracts concepts using both terms and compound nouns as units, builds a Vector Space Model with the extracted concepts as dimensions, and finally compares document similarities. Experimental results show that the concept-based Vector Space Model measures document similarity more accurately than a Vector Space Model whose dimensions are TF-IDF-weighted terms.
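    The abstract contrasts a term-based TF-IDF Vector Space Model with one whose dimensions are concepts extracted from terms and compound nouns. The Python sketch below illustrates that comparison under loudly labeled assumptions. The compound-noun lexicon `COMPOUND_NOUNS` and the concept dictionary `CONCEPTS` are hypothetical, hand-written stand-ins; the abstract does not say how the thesis detects compound nouns or extracts concepts. Both models use the same smoothed IDF so the comparison is like-for-like. Similarity is cosine similarity, the usual choice for vector space models, though the abstract does not confirm it was the measure used.

```python
import math
from collections import Counter

# HYPOTHETICAL compound-noun lexicon (token tuples). The thesis identifies
# compound nouns in documents; its actual detection method is not reproduced.
COMPOUND_NOUNS = {
    ("information", "retrieval"),
    ("document", "search"),
    ("vector", "space", "model"),
}

# HYPOTHETICAL concept dictionary: terms or compound nouns expressing the same
# idea map to one shared concept label; unmapped units are kept unchanged.
CONCEPTS = {
    "car": "VEHICLE",
    "automobile": "VEHICLE",
    "information retrieval": "INFO_RETRIEVAL",
    "document search": "INFO_RETRIEVAL",
}


def merge_compounds(tokens, compounds=COMPOUND_NOUNS):
    """Join the longest compound noun starting at each position into one unit."""
    max_len = max((len(c) for c in compounds), default=1)
    units, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):  # try longer compounds first
            if tuple(tokens[i:i + n]) in compounds:
                units.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:  # no compound starts here: keep the single term
            units.append(tokens[i])
            i += 1
    return units


def to_concepts(units, concepts=CONCEPTS):
    """Replace each term or compound noun with its concept label, if any."""
    return [concepts.get(u, u) for u in units]


def tfidf_vectors(docs_units):
    """Sparse TF-IDF vectors (dicts) using a smoothed IDF: ln((1+N)/(1+df)) + 1."""
    n_docs = len(docs_units)
    df = Counter(u for units in docs_units for u in set(units))
    vectors = []
    for units in docs_units:
        tf = Counter(units)
        vectors.append({
            u: (count / len(units)) * (math.log((1 + n_docs) / (1 + df[u])) + 1)
            for u, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrices(texts):
    """Pairwise similarities for a term-based and a concept-based VSM."""
    tokens = [t.lower().split() for t in texts]
    # Baseline: single terms as dimensions.
    term_vecs = tfidf_vectors(tokens)
    # Proposed idea: merge compound nouns, then map units to concepts.
    concept_vecs = tfidf_vectors([to_concepts(merge_compounds(t)) for t in tokens])
    idx = range(len(texts))
    term_sim = [[cosine(term_vecs[i], term_vecs[j]) for j in idx] for i in idx]
    concept_sim = [[cosine(concept_vecs[i], concept_vecs[j]) for j in idx] for i in idx]
    return term_sim, concept_sim


if __name__ == "__main__":
    docs = [
        "the car needs engine repair",
        "the automobile needs engine repair",
        "information retrieval uses the vector space model",
    ]
    term_sim, concept_sim = similarity_matrices(docs)
    print(f"term VSM    sim(doc0, doc1) = {term_sim[0][1]:.3f}")
    print(f"concept VSM sim(doc0, doc1) = {concept_sim[0][1]:.3f}")
```

    Because `car` and `automobile` map to the same hypothetical `VEHICLE` concept, the first two documents become identical in the concept space (similarity 1.0), while the term-based model scores them at roughly 0.68. The IDF smoothing keeps a term that occurs in every document of this tiny corpus from dropping to zero weight.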
    Appears in Collections:[Graduate Institute of Information Management] Electronic Thesis & Dissertation

    All items in NCUIR are protected by copyright, with all rights reserved.
