The Application of Semantic Analysis in Relevance Feedback for Query Expansion

Name

孫智梁 (Zhi-Liang Sun)

Department

Department of Information Management

Thesis Title

The Application of Semantic Analysis in Relevance Feedback for Query Expansion
(利用語意分析於相關回饋以進行查詢擴展之方法)

Files

Full text in the thesis system: never open to the public

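Abstract (translated from Chinese)

We live in an era of information explosion. When facing enormous volumes of data, obtaining the needed information efficiently is a very important problem, and information retrieval (IR) systems have become one of the tools people use most often to filter data. In the field of relevance feedback, the Rocchio algorithm is the most widely known: it generates new query terms by analyzing how frequently terms occur in relevant and non-relevant documents, and adds them to the query expansion set. However, Rocchio judges terms by frequency alone and does not consider other usable information between terms. In recent years, research on semantic search, which aims to uncover the implicit semantic relationships between terms, has also been proposed. This study therefore takes the user's original query and its results as a basis and mainly uses the neural network model Word2Vec to analyze the semantic information among terms in the original query and the relevance feedback. Combined with co-occurrence analysis, it extracts suitable related terms to expand the original set of query terms so that the query keywords better match the user's needs. Finally, experiments show that the proposed method achieves better retrieval effectiveness than the other methods it was compared with.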
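The Rocchio update described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `rocchio`, the term-to-weight dict representation, and the weights `alpha`, `beta`, `gamma` (set here to commonly cited textbook values) are assumptions. Dropping terms with non-positive weight is a common convention, not something the abstract states.

```python
from collections import Counter


def rocchio(query, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Return a re-weighted query as a dict mapping term -> weight.

    query            : dict term -> weight (the original query vector)
    relevant_docs    : list of token lists judged relevant
    nonrelevant_docs : list of token lists judged non-relevant
    alpha/beta/gamma : assumed weights, not values from the thesis
    """
    def centroid(docs):
        # Average raw term frequency over the given documents.
        total = Counter()
        for doc in docs:
            total.update(doc)
        n = len(docs)
        return {t: c / n for t, c in total.items()} if n else {}

    rel = centroid(relevant_docs)
    non = centroid(nonrelevant_docs)

    new_query = {}
    for term in set(query) | set(rel) | set(non):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel.get(term, 0.0)
                  - gamma * non.get(term, 0.0))
        if weight > 0:  # assumption: discard non-positive weights
            new_query[term] = weight
    return new_query
```

Terms that appear mostly in relevant documents gain weight and can enter the expanded query, while terms that appear only in non-relevant documents are pushed out. The score depends on frequency alone, which is the limitation the abstract points to.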

Abstract (English)

In an era of information explosion, obtaining needed information efficiently from huge volumes of data is an important problem, and information retrieval systems have become one of the most commonly used tools. In the field of relevance feedback, Rocchio's query expansion is a well-known method. The algorithm generates new query terms by analyzing the frequency of terms in relevant and non-relevant documents. However, Rocchio's method considers only term frequency and ignores other information between terms. In recent years, semantic search has become increasingly popular. Therefore, based on the user's original query and search results, our research uses Word2Vec, a neural network model, to analyze the semantic information between the original query and the relevance feedback, and combines it with co-occurrence analysis to extract appropriate query expansion terms. The experimental results verify that the proposed method is effective in document retrieval.
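The idea of combining embedding similarity with co-occurrence to pick expansion terms can be sketched as below. This is a hedged illustration only: the abstract does not give the thesis's scoring formula, so the linear blend, the `weight` and `k` parameters, and the function names are hypothetical. Small hand-written vectors stand in for a trained Word2Vec model.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cooccurrence(docs):
    """Count, per unordered term pair, how many documents contain both."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


def expand(query_terms, feedback_docs, vectors, k=2, weight=0.5):
    """Append the k best-scoring feedback terms to the query.

    vectors : dict term -> embedding (toy stand-in for Word2Vec output)
    weight  : assumed blend between semantic and co-occurrence scores
    """
    if not query_terms or not feedback_docs:
        return list(query_terms)
    co = cooccurrence(feedback_docs)
    n_docs = len(feedback_docs)
    candidates = {t for doc in feedback_docs for t in doc} - set(query_terms)

    scores = {}
    for cand in candidates:
        if cand not in vectors:
            continue
        # Mean embedding similarity to the query terms that have vectors.
        sims = [cosine(vectors[cand], vectors[q])
                for q in query_terms if q in vectors]
        semantic = sum(sims) / len(sims) if sims else 0.0
        # Mean fraction of feedback documents shared with each query term.
        shared = [co[tuple(sorted((cand, q)))] / n_docs for q in query_terms]
        cooc = sum(shared) / len(shared)
        scores[cand] = weight * semantic + (1 - weight) * cooc

    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:k]
```

In practice the `vectors` dict would be filled from a trained embedding model. The blend weight would need tuning against retrieval metrics such as those listed in the thesis's evaluation chapter.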

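Keywords (Chinese)

★ 資訊檢索 (information retrieval)
★ 相關回饋 (relevance feedback)
★ 查詢擴展 (query expansion)
★ 語意分析 (semantic analysis)
★ Word2Vec

Keywords (English)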

★ Information Retrieval
★ Relevance Feedback
★ Query Expansion
★ Semantic Analysis
★ Word2Vec

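Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background and Motivation
1-2 Research Objectives
1-3 Research Scope and Limitations
1-4 Thesis Organization
2. Literature Review
2-1 Relevance Feedback
2-1-1 Background and Applications of Relevance Feedback
2-1-2 The Rocchio Algorithm
2-2 Query Expansion
2-2-1 Local Query Expansion
2-2-2 Global Query Expansion
2-3 Normalized Google Distance
2-4 Methods Applying Term Information
2-5 Word2Vec
3. Research Method
3-1 System Architecture
3-2 Method Design
3-2-1 Processing of the Original Query Results
3-2-2 Processing of Semantic Information among Relevant Terms
3-2-3 Processing of Semantic Information of the Original Query Terms
3-2-4 Co-occurrence Analysis among Relevant Terms
4. Experimental Design
4-1 Experimental Data
4-2 Evaluation Metrics
4-3 Experimental Procedure
4-3-1 Experiment 1
4-3-2 Experiment 2
4-4 Experimental Results
4-4-1 Results of Experiment 1
4-4-2 Results of Experiment 2
4-5 Discussion of Experimental Results
5. Conclusion
5-1 Conclusions and Contributions
5-2 Future Research Directions
References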

Advisor

周世傑 (Shih-Chieh Chou)

Date Approved

2018-07-30
