The Application of the Semantic Information of Terms Residing in Query Expansion and Original Query for Document Re-ranking


Name

蔡丞祐 (Cheng-You Tsai)

Department

Department of Information Management

Thesis Title

The application of the semantic information of terms residing in query expansion and original query for document re-ranking
(Original title: 應用查詢擴展字詞及原始查詢字詞之語意資訊於文件重排序之方法)

Related Theses

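★ A Decision Support System for Constructing SMS Rules to Prevent Credit Card Fraud
★ A Comparison of the Effectiveness of Different Retrieval Strategies
★ An Investigation of the Factors Influencing the Knowledge-Sharing Process
★ Construction and Evaluation of a Retrieval Agent System with Sharing Functionality
★ A Study of Computer Attitudes and Learning Self-Efficacy among Juvenile Offenders
★ A Study of Software Measurement Issues Using the AHP Method
★ Optimizing an Intrusion Rule Base
★ A Study on Improving the Efficiency and Quality of Business Information Extraction
★ Using the Analytic Hierarchy Process to Assess the Key Factors in the Banking Industry's Adoption of Enterprise Application Integration (EAI)
★ Applying a Genetic Algorithm to Find Near-Optimal Layouts of Forced-Convection Devices in Cluster Computer Rooms
★ The Development of a CASE Tool with Knowledge Management Functions
★ A Fast Search Index Tree Based on the PAT Tree
★ A Compound-Noun-Based Approach to Constructing Document Concepts
★ Using User Interest Profiles to Examine the Importance of Adjective Position for Review Classification
★ Using Semi-Structured Information and User Feedback to Help Users Filter Web Document Search Results
★ Building a Vector Space Model from Feature-Opinion Pairs for User Review Classification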

Full text: not available for online browsing (permanently closed to public access)

Abstract (translated from Chinese)

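In recent years, with the development of the Internet, users can quickly obtain information through information retrieval. Although information has become easy to obtain, enabling users to find the information they need more accurately and efficiently remains an important issue. In the field of relevance feedback, the Rocchio algorithm is the most widely applied: it analyzes the frequency of terms in relevant and non-relevant documents to generate new query terms. However, Rocchio relies only on term frequency and does not consider other semantic information between terms. Many semantics-based studies have recently been proposed, whose idea is to mine deeper semantic relationships between terms. This study therefore takes the user's original query and relevance feedback as its basis. It uses Word2Vec to compute the semantic similarity between query expansion terms and extract their concept information, then computes the importance of each concept from the semantic relationships between the original query terms and the query expansion concepts, and finally computes how well each document matches the query expansion concepts in order to re-rank the retrieval results of the expanded query. Experiments show that, compared with the Rocchio algorithm, the proposed method improves precision at the top five and top ten documents (P@5 and P@10) by 30% and 32%, respectively, and compared with the method proposed by Cai, it improves them by a further 9% and 4%.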
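The Rocchio update that the abstract uses as its baseline is commonly written as q_m = α·q_0 + β·(1/|D_r|)·Σ_{d∈D_r} d − γ·(1/|D_nr|)·Σ_{d∈D_nr} d, where D_r and D_nr are the relevant and non-relevant feedback documents. The following is a minimal sketch, not the thesis's implementation. It assumes raw term-frequency vectors, the commonly cited defaults α = 1, β = 0.75, γ = 0.15 (the thesis sets its own values in Section 4-3-1), and a hypothetical helper `expansion_terms` that keeps the k highest-weighted terms absent from the original query.

```python
from collections import Counter


def rocchio(query_terms, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query q_m as a dict {term: weight}.

    The query and each document are token lists turned into raw
    term-frequency vectors (an assumption; TF-IDF weights are also common).
    """
    q_m = Counter()
    for term, tf in Counter(query_terms).items():
        q_m[term] += alpha * tf
    # Add beta * (relevant centroid) and subtract gamma * (non-relevant centroid).
    for docs, coef in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        for doc in docs:
            for term, tf in Counter(doc).items():
                q_m[term] += coef * tf / len(docs)
    # Terms whose weight ends up non-positive are dropped, a common convention.
    return {t: w for t, w in q_m.items() if w > 0}


def expansion_terms(query_terms, q_m, k=2):
    """Hypothetical helper: the k highest-weighted terms not in the original query."""
    original = set(query_terms)
    candidates = sorted((-w, t) for t, w in q_m.items() if t not in original)
    return [t for _, t in candidates[:k]]


query = ["jaguar", "speed"]
relevant = [["jaguar", "cat", "speed", "cat"], ["jaguar", "cat", "habitat"]]
nonrelevant = [["jaguar", "car", "engine"]]

q_m = rocchio(query, relevant, nonrelevant)
print(expansion_terms(query, q_m))  # ['cat', 'habitat']
```

The toy example shows the limitation the abstract points out: `cat` and `habitat` are chosen purely by how often they occur in the feedback documents, with no notion of how semantically close they are to the original query terms.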

Abstract (English)

In recent years, with the development of the Internet, users can quickly obtain information through information retrieval, but obtaining the required information more accurately and efficiently remains an important issue. In the field of relevance feedback, Rocchio's query expansion is the most widely used method. The algorithm generates new query terms by analyzing the frequency of terms residing in relevant and non-relevant documents. However, Rocchio's method utilizes only term frequency and does not consider the semantic information between terms. Recently, semantics-based approaches have been proposed that explore deeper semantic relationships between terms. Therefore, based on the user's original query and relevance feedback, this study uses Word2Vec to analyze the semantic information of the query expansion terms, extracts concepts from them with a clustering algorithm, and then calculates the importance of each concept from the semantic relationships between the original query terms and the query expansion terms. Finally, documents are re-ranked by a concept score that measures how well each document matches the query expansion concepts. The experimental results verify that the proposed method is effective for document retrieval.
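The re-ranking pipeline the abstract describes (group expansion terms into concepts by Word2Vec similarity, weight each concept by its semantic relation to the original query, then score documents by concept match) can be sketched as follows. This is a hedged illustration under stated assumptions, not the author's method: term vectors are given as a plain dict (in practice they would come from a trained Word2Vec model, for example gensim's `Word2Vec(...).wv`); the clustering is a simple greedy leader clustering standing in for the K-Means or Affinity Propagation algorithms reviewed in Chapter 2; and `concept_importance` and the document score are hypothetical formulas chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_concepts(terms, vectors, threshold=0.9):
    """Greedy leader clustering (a stand-in for K-Means / Affinity Propagation):
    a term joins the first concept whose seed term is at least `threshold`
    cosine-similar to it; otherwise it seeds a new concept."""
    concepts = []
    for term in terms:
        for concept in concepts:
            if cosine(vectors[term], vectors[concept[0]]) >= threshold:
                concept.append(term)
                break
        else:  # no existing concept was similar enough
            concepts.append([term])
    return concepts


def concept_importance(concept, query_terms, vectors):
    """Hypothetical weight: mean over the concept's terms of the best cosine
    similarity to any original query term."""
    best = [max(cosine(vectors[t], vectors[q]) for q in query_terms)
            for t in concept]
    return sum(best) / len(best)


def rerank(docs, concepts, query_terms, vectors):
    """Return document indices ordered by a hypothetical concept-match score:
    score(d) = sum over concepts c of importance(c) * |c & d| / |c|."""
    weights = [concept_importance(c, query_terms, vectors) for c in concepts]

    def score(doc):
        words = set(doc)
        return sum(w * sum(t in words for t in c) / len(c)
                   for w, c in zip(weights, concepts))

    # sorted() is stable, so ties keep the original retrieval order.
    return sorted(range(len(docs)), key=lambda i: -score(docs[i]))


# Toy 2-D "embeddings"; real ones would come from a trained Word2Vec model.
vectors = {
    "jaguar": [1.0, 0.0], "cat": [0.9, 0.1], "feline": [0.85, 0.15],
    "habitat": [0.6, 0.8], "forest": [0.5, 0.85],
}
concepts = group_concepts(["cat", "feline", "habitat", "forest"], vectors)
print(concepts)  # [['cat', 'feline'], ['habitat', 'forest']]

docs = [["jaguar", "car", "engine"],              # first-pass rank 1
        ["jaguar", "forest", "habitat"],          # first-pass rank 2
        ["jaguar", "cat", "feline", "forest"]]    # first-pass rank 3
print(rerank(docs, concepts, ["jaguar"], vectors))  # [2, 1, 0]
```

Leader clustering was picked here only because it is deterministic and short; the concept closer to the query (`cat`, `feline`) receives the larger weight, so the document covering it moves to the top even though it was ranked last in the first pass.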

Keywords (translated from Chinese)

★ Information Retrieval
★ Relevance Feedback
★ Query Expansion
★ Word2Vec
★ Document Re-ranking

Keywords (English)

★ Information Retrieval
★ Relevance Feedback
★ Query Expansion
★ Word2Vec
★ Document Re-ranking

Table of Contents

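Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background and Motivation
1-2 Research Objectives
1-3 Research Scope and Limitations
1-4 Thesis Structure
2. Literature Review
2-1 Relevance Feedback
2-1-1 Background and Applications of Relevance Feedback
2-1-2 The Rocchio Algorithm
2-2 Query Expansion
2-2-1 Local Query Expansion
2-2-2 Global Query Expansion
2-3 Research on Clustering
2-3-1 K-Means
2-3-2 Affinity Propagation
2-3-3 Semantic Clustering of Terms
2-4 Methods for Applying Query Expansion Term Information
2-5 Word Embedding
2-5-1 Word2Vec
2-5-2 CBOW
2-5-3 Skip-Gram
3. Research Methodology
3-1 System Architecture
3-2 Method Design
3-2-1 Processing of Original Query Results
3-2-2 Semantic-Similarity Clustering of Query Expansion Terms
3-2-3 Concept Importance Computation
3-2-4 Document Re-ranking
4. Experimental Design
4-1 Experimental Data
4-2 Evaluation Metrics
4-3 Experimental Parameter Settings
4-3-1 Parameter Settings for the Rocchio Algorithm
4-3-2 Setting the Number of Concepts
4-3-3 Parameter Settings for Re-ranking
4-4 Experimental Procedure
4-4-1 Procedure of Experiment 1
4-4-2 Procedure of Experiment 2
4-5 Experimental Results
4-5-1 Results of Experiment 1
4-5-2 Results of Experiment 2
4-6 Discussion of Experimental Results
5. Conclusion
5-1 Conclusions and Contributions
5-2 Future Research Directions
References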

References

[1] G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais, "The vocabulary problem in human-system communication," Communications of the ACM, vol. 30, no. 11, pp. 964-971, 1987.
[2] G. Salton and M. J. McGill, Introduction to modern information retrieval. McGraw-Hill, 1983.
[3] J. J. Rocchio, "Relevance feedback in information retrieval," in The SMART Retrieval System: Experiments in Automatic Document Processing, G. Salton, Ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1971, pp. 313-323.
[4] C.-S. Cai, "The application of the semantic information of terms residing in query expansion for document re-ranking," M.B.A. thesis, National Central University, 2017.
[5] X. Hu, X. Zhang, C. Lu, E. K. Park, and X. Zhou, "Exploiting Wikipedia as external knowledge for document clustering," presented at the Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, Paris, France, 2009.
[6] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[7] J. Bhogal, A. MacFarlane, and P. Smith, "A review of ontology based query expansion," Information processing & management, vol. 43, no. 4, pp. 866-886, 2007.
[8] C. Buckley, "Optimization of relevance feedback weights," in Proc. SIGIR'95, 1995, pp. 351-357.
[9] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, "Relevance feedback: a power tool for interactive content-based image retrieval," IEEE Transactions on circuits and systems for video technology, vol. 8, no. 5, pp. 644-655, 1998.
[10] A. Grigorova, F. G. B. De Natale, C. Dagli, and T. S. Huang, "Content-based image retrieval by feature adaptation and relevance feedback," IEEE transactions on multimedia, vol. 9, no. 6, pp. 1183-1192, 2007.
[11] R. Yan, A. Hauptmann, and R. Jin, "Multimedia search with pseudo-relevance feedback," in International Conference on Image and Video Retrieval, 2003, pp. 238-247: Springer.
[12] Y. Yang, F. Nie, D. Xu, J. Luo, Y. Zhuang, and Y. Pan, "A multimedia retrieval framework based on semi-supervised ranking and relevance feedback," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 723-742, 2011.
[13] G. Gay, S. Haiduc, A. Marcus, and T. Menzies, "On the use of relevance feedback in IR-based concept location," in 2009 IEEE International Conference on Software Maintenance, 2009, pp. 351-360: IEEE.
[14] D. Kelly and N. J. Belkin, "Reading time, scrolling and interaction: exploring implicit sources of user preferences for relevance feedback," in Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, 2001, pp. 408-409: ACM.
[15] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge, England: Cambridge University Press, 2008, pp. 178-189.
[16] O. Vechtomova and Y. Wang, "A study of the effect of term proximity on query expansion," Journal of Information Science, vol. 32, no. 4, pp. 324-333, 2006.
[17] F. J. Pinto and C. F. Pérez-Sanjulián, "Automatic query expansion and word sense disambiguation with long and short queries using WordNet under vector model," Actas de los Talleres de las Jornadas de Ingeniería del Software y Bases de Datos, vol. 2, no. 2, pp. 17-23, 2008.
[18] Z. Shi, B. Gu, F. Popowich, and A. Sarkar, "Synonym-based query expansion and boosting-based re-ranking: A two-phase approach for genomic information retrieval," in the Fourteenth Text REtrieval Conference (TREC 2005), NIST, Gaithersburg, Maryland, USA, 2005.
[19] L. Araujo and J. R. Pérez-Agüera, "Improving query expansion with stemming terms: a new genetic algorithm approach," in EvoCOP'08 Proceedings of the 8th European conference on Evolutionary computation in combinatorial optimization, Naples, Italy, 2008, pp. 182-193.
[20] M. F. Porter, "An algorithm for suffix stripping," Program, vol. 14, no. 3, pp. 130-137, 1980.
[21] Q. Chen, M. Li, and M. Zhou, "Improving Query Spelling Correction Using Web Search Results," in Empirical Methods in Natural Language Processing Conference on Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, 2007, vol. 7, pp. 181-189: Citeseer.
[22] J. J. Rocchio, "Relevance feedback in information retrieval," in The SMART Retrieval System: Experiments in Automatic Document Processing, G. Salton, Ed. New Jersey, USA: Prentice-Hall, 1971, pp. 313-323.
[23] D. Harman, "Relevance feedback revisited," in Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, Copenhagen, Denmark, 1992, pp. 1-10: ACM.
[24] A. Sihvonen and P. Vakkari, "Subject knowledge improves interactive query expansion assisted by a thesaurus," Journal of Documentation, vol. 60, no. 6, pp. 673-690, 2004.
[25] J. Xu and W. B. Croft, "Query expansion using local and global document analysis," in Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, Zurich, Switzerland, 1996, pp. 4-11: ACM.
[26] J. Xu and W. B. Croft, "Query expansion using local and global document analysis," in Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 1996.
[27] C. J. Crouch, "An approach to the automatic construction of global thesauri," Information Processing & Management, vol. 26, no. 5, pp. 629-640, 1990.
[28] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An efficient k-means clustering algorithm: Analysis and implementation," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 7, pp. 881-892, 2002.
[29] B. J. Frey and D. Dueck, "Clustering by Passing Messages Between Data Points," Science, vol. 315, no. 5814, p. 972, 2007.
[30] Y. He, Q. Chen, X. Wang, R. Xu, X. Bai, and X. Meng, "An adaptive affinity propagation document clustering," in 2010 The 7th International Conference on Informatics and Systems (INFOS), 2010, pp. 1-7: IEEE.
[31] H.-C. Chang and C.-C. Hsu, "Using topic keyword clusters for automatic document clustering," IEICE TRANSACTIONS on Information and Systems, vol. 88, no. 8, pp. 1852-1860, 2005.
[32] T. Wei, Y. Lu, H. Chang, Q. Zhou, and X. Bao, "A semantic approach for text clustering using WordNet and lexical chains," Expert Systems with Applications, vol. 42, no. 4, pp. 2264-2275, 2015.
[33] L. Ma and Y. Zhang, "Using Word2Vec to process big text data," in 2015 IEEE International Conference on Big Data (Big Data), 2015, pp. 2895-2897.
[34] K. Skianis, F. Rousseau, and M. Vazirgiannis, "Regularizing text categorization with clusters of words," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 1827-1837.
[35] G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the eighth annual conference of the cognitive science society, 1986, vol. 1, p. 12: Amherst, MA.
[36] A. Mnih and K. Kavukcuoglu, "Learning word embeddings efficiently with noise-contrastive estimation," in Advances in neural information processing systems, 2013, pp. 2265-2273.
[37] G. Salton and M. E. Lesk, "Computer evaluation of indexing and text processing," Journal of the ACM (JACM), vol. 15, no. 1, pp. 8-36, 1968.
[38] Wikipedia. (2018). Wikipedia:Database download. Available: http://en.wikipedia.org/wiki/Wikipedia:Database_download
[39] E. D. Liddy, "Enhanced text retrieval using natural language processing," Bulletin of the American Society for Information Science and Technology, vol. 24, no. 4, pp. 14-16, 1998.
[40] K. Potts, Web design and marketing solutions for business websites. Apress, 2007.
[41] G. Salton and C. Buckley, "Improving retrieval performance by relevance feedback," Journal of the American society for information science, vol. 41, no. 4, pp. 288-297, 1990.
[42] Y.-Y. Lee, H. Ke, H.-H. Huang, and H.-H. Chen, "Combining word embedding and lexical database for semantic relatedness measurement," in Proceedings of the 25th International Conference Companion on World Wide Web, 2016, pp. 73-74: International World Wide Web Conferences Steering Committee.

Advisor

周世傑 (Shih-Chieh Chou)

Approval Date

2019-7-23
