Abstract: | Information overload is a serious problem on easily accessible public publishing channels. An important research issue is how to extract the information contained in the initially retrieved documents and apply it to improve the documents' relevance ranking. Past studies have mainly addressed this relevance ranking problem through query expansion. Some recent studies also apply other techniques, such as vector space (document vector) construction, document classification, and information filtering.
An examination of how past studies have used the information contained in retrieved documents reveals that some of this information could be exploited further to improve relevance ranking. For the information contained in the set of retrieved documents, we are concerned with applying the hierarchical relationships between terms. Most previous techniques focus on flat relationships between terms, such as synonymy, co-occurrence, or association. In this study, we propose that hierarchical relationships between terms should also be considered when constructing document vectors. For the information contained in relevance feedback documents, we are concerned with applying the deviation in term appearance across these documents. Our interest lies in a finer-grained differentiation of term appearance deviation. Based on theoretical deduction and observation of real-world data, we propose that an individual term's appearance deviation in the relevance feedback documents can be identified and applied to the vectors of the retrieved documents to improve the accuracy of their relevance ranking. Based on these propositions, this study aims to construct an information retrieval (IR) system that ranks retrieved documents by relevance more accurately. To attain this goal, the study is divided into three sub-studies. The first-year sub-study will develop and evaluate a method/algorithm that extracts both hierarchical and flat relationships between terms from the retrieved documents and applies them to the construction of document and query vectors. The second-year sub-study, building on the first year's vector construction method, will develop and evaluate a method/algorithm that applies term appearance deviation to improve the relevance ranking of the retrieved documents.
The third-year sub-study will first develop an IR system to demonstrate the practicality of the methods/algorithms developed in the previous sub-studies, and then conduct formal experiments to verify their effectiveness in real-life use. The importance of this study is twofold. First, it will demonstrate the applicability of information that is contained in retrieved documents but has not previously been used, revealing the potential of this information for further study and development. Second, it will develop new methods/algorithms that improve the relevance ranking of retrieved documents by processing the retrieved documents themselves. These methods/algorithms can be integrated with conventional query expansion methods, which mainly operate on the query, to further improve information retrieval. Research period: 10008 ~ 10107 |
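
The abstract does not specify how hierarchical term relationships would enter the document and query vectors, so the following is only a minimal sketch of one possible reading, not the study's method. It assumes a hand-written child-to-parent term map (`BROADER`), an arbitrary damping factor (`decay`), and plain cosine similarity for ranking; all of these names and values are hypothetical.

```python
import math
from collections import Counter

# Hypothetical child -> parent (broader term) links, for illustration only.
BROADER = {
    "poodle": "dog",
    "dog": "animal",
    "cat": "animal",
}


def expand_with_ancestors(tokens, broader=BROADER, decay=0.5):
    """Build a term-weight vector in which each term also credits its broader terms.

    The weight passed upward is multiplied by `decay` at every level.
    `decay` is an assumed damping factor, not a value from the study.
    """
    vector = Counter()
    for token in tokens:
        weight = 1.0
        term = token
        seen = set()  # guards against accidental cycles in the hierarchy
        while term is not None and term not in seen:
            seen.add(term)
            vector[term] += weight
            term = broader.get(term)
            weight *= decay
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, docs):
    """Return document ids ordered by descending similarity to the query."""
    q = expand_with_ancestors(query_tokens)
    scored = [
        (cosine(q, expand_with_ancestors(tokens)), doc_id)
        for doc_id, tokens in docs.items()
    ]
    # Sort by score (highest first), breaking ties by document id.
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

With purely flat term matching, a query for "dog" would share no terms with a document that mentions only "poodle". In this sketch the shared broader terms give that document a non-zero score, which is the kind of effect the first-year sub-study proposes to exploit.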