Abstract: | Research period: 10108~10207; In the application of relevance feedback, query expansion has been an important approach to enhancing information retrieval. Two models are mainly employed in this line of study: the vector model and the probability model. In both models, the application of relevance feedback has fundamentally been based on classifying term appearance situations as appearing in the relevant documents or appearing in the irrelevant documents. Our earlier study (Chou & Chang, 2009) proposed a more detailed differentiation of term appearance situations in relevance feedback: (1) a term can appear in relevant documents only (termed tas1); (2) a term can appear in irrelevant documents only (termed tas2); (3) a term can appear in both relevant and irrelevant documents (termed tas3). The aim of this classification was first to present the appearance deviation of a term, and then to study its applicability in the formulation of a query. That study disclosed the application potential of the information on term appearance deviations. In this research project, our interest is directed to an advanced study of the extraction and application of the information on term appearance deviations. We deduced that terms of tas1 could appear more frequently in the relevant documents and less frequently in the irrelevant documents and, conversely, that terms of tas2 could appear more frequently in the irrelevant documents and less frequently in the relevant documents. Given this deduction, it is reasonable to expect that the relevant documents contain more terms of tas1 and the irrelevant documents contain more terms of tas2. We examined some samples of retrieved documents for this term containing tendency, and the observations confirmed the deduction.
Concluding the above pre-studies, a series of research questions has been generated: “How can we extract, from relevance feedback, the terms that will present applicable appearance deviations in the further retrieved documents? How should term appearance deviations in the retrieved documents be handled to enhance information retrieval? Are the solutions developed for the above two questions applicable to real-world situations?” To address these questions, this research proposes a three-year project. The first-year project aims to develop and test a method/algorithm to extract, from relevance feedback, the terms that will appear in deviation in the further retrieved documents. The second-year project aims to develop and test a method/algorithm that applies the information on term appearance deviation to enhance information retrieval. The third-year project aims to develop an information retrieval system that demonstrates the methods/algorithms developed in the previous sub-studies in a real-world environment, and to conduct experiments with that system to verify the capability of the proposed methods/algorithms in real-life usage. The importance of the study lies in disclosing a piece of information with great application potential in the research field of relevance feedback, and in initiating a new approach to applying the information contained in relevance feedback. |
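
The three term appearance situations defined in the abstract (tas1, tas2, tas3) can be illustrated with a minimal Python sketch. It is not the project's extraction method, which the abstract does not specify. It assumes each judged document is already tokenized into a list of terms, and the function names `classify_terms` and `count_situations` are hypothetical, introduced only for this illustration.

```python
from collections import Counter


def classify_terms(relevant_docs, irrelevant_docs):
    """Label every term seen in the feedback as tas1, tas2, or tas3.

    Assumption: each document is a list of already-tokenized terms.
    """
    rel_terms = {t for doc in relevant_docs for t in doc}
    irr_terms = {t for doc in irrelevant_docs for t in doc}
    labels = {}
    for term in rel_terms | irr_terms:
        if term not in irr_terms:
            labels[term] = "tas1"  # appears in relevant documents only
        elif term not in rel_terms:
            labels[term] = "tas2"  # appears in irrelevant documents only
        else:
            labels[term] = "tas3"  # appears in both kinds of documents
    return labels


def count_situations(doc, labels):
    """Count the distinct terms of each situation that a document contains.

    Terms that never occurred in the feedback have no label and are skipped.
    """
    return Counter(labels[t] for t in set(doc) if t in labels)
```

Under these assumptions, calling `count_situations` on a newly retrieved document gives a rough view of the tendency the abstract describes: a document dominated by tas1 terms resembles the relevant feedback, and one dominated by tas2 terms resembles the irrelevant feedback.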