Question Retrieval of Community Forum

Name

翁梓勝 (Tzu-Sheng Weng)

Department

In-service Master's Program, Department of Computer Science and Information Engineering

Thesis Title

社群論壇之問題檢索
(Question Retrieval of Community Forum)

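This electronic thesis is authorized for immediate open access.
Once the open-access date is reached, the electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.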

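Abstract (Chinese)

In recent years, with the growth of the Internet (World Wide Web), community question answering sites have multiplied, and the large volume of information they hold forms a valuable online knowledge base. One problem common to all of these sites, however, is duplicate questions. The main task of question retrieval is therefore to find related, previously answered questions in the archive. The diversity of synonymous wording is a major challenge for question retrieval. Some approaches handle it by computing the probability of a correlation between a new question and the archived questions, while many other studies focus on the similarity between strings.
In this thesis, we propose a method that first trains a CBoW model on data from the ASUS ROG forum and then uses the trained model to compute the similarity between a new input question and the archived questions. Unlike other studies, we treat a question's title and its full description separately, as two distinct features in the computation. We also use users' honor points as a factor in our evaluation. Our experiments show that the results on the ROG forum data are better than those of the other methods.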

Abstract (English)

In recent years, community-based question answering (cQA) sites have developed rapidly. The number of large-scale Q&A sites has increased significantly over time, and the information on these sites represents a valuable online knowledge pool. However, one issue with such sites is the problem of duplicate questions. The task of question retrieval aims to find previously answered, semantically similar questions in cQA archives. Nevertheless, synonymous lexical variations pose a big challenge for question retrieval. Some approaches address this issue by calculating the probability of correlation between new questions and archived questions. Much recent research has also focused on surface string similarity among questions.
In this paper, we propose a method that first builds a continuous bag-of-words (CBoW) model with data from Asus's Republic of Gamers (ROG) forum and then determines the similarity between a given new question and the Q&As in our database. Unlike most other studies, we compute the similarity of the given question to the archived question titles and to their descriptions separately, treating them as two distinct features. In addition, we factor user reputation into our ranking model. Our experimental results on the ROG forum dataset show that our CBoW model with reputation features outperforms other top methods.
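The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual ranking function, which this record does not give. The sketch assumes that a question is represented by the average of its word vectors, that similarity is cosine similarity, and that the title score, the description score, and a reputation value normalized to [0, 1] are combined as a weighted sum. The function names, the weights `w_title`, `w_desc`, `w_rep`, and the toy embeddings are all hypothetical; in practice the vectors would come from a trained CBoW model.

```python
import math

def average_vector(tokens, embeddings):
    """Average the vectors of the tokens that appear in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known words, so no representation
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is missing or zero."""
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score(query_tokens, post, embeddings, w_title=0.5, w_desc=0.3, w_rep=0.2):
    """Assumed scoring: weighted sum of title similarity, description
    similarity, and the poster's reputation (hypothetical weights)."""
    q = average_vector(query_tokens, embeddings)
    title_sim = cosine(q, average_vector(post["title"], embeddings))
    desc_sim = cosine(q, average_vector(post["description"], embeddings))
    return w_title * title_sim + w_desc * desc_sim + w_rep * post["reputation"]

def rank(query_tokens, posts, embeddings):
    """Return archived posts ordered from most to least relevant."""
    return sorted(posts, key=lambda p: score(query_tokens, p, embeddings),
                  reverse=True)

# Toy 3-dimensional embeddings standing in for a trained CBoW model.
embeddings = {
    "laptop": [1.0, 0.0, 0.0], "notebook": [0.9, 0.1, 0.0],
    "fan": [0.0, 1.0, 0.0], "noise": [0.0, 0.9, 0.1],
    "keyboard": [0.0, 0.0, 1.0], "backlight": [0.0, 0.1, 0.9],
}
posts = [
    {"id": "B", "title": ["keyboard", "backlight"],
     "description": ["keyboard"], "reputation": 0.5},
    {"id": "A", "title": ["laptop", "fan"],
     "description": ["noise"], "reputation": 0.5},
]
ranked = rank(["notebook", "fan"], posts, embeddings)
print([p["id"] for p in ranked])
```

With these toy vectors, the query "notebook fan" ranks post A first even though it shares only one word with A's title, because "notebook" and "laptop" lie close together in the embedding space. This is the kind of synonym mismatch that surface string matching would miss.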

Keywords (Chinese)

★ Community
★ Forum
★ Question
★ Retrieval

Keywords (English)

★ Question
★ Retrieval
★ Community
★ Forum

Table of Contents

Chinese Abstract
Abstract
Acknowledgements
Contents
Figures
Tables
1. Introduction
1.1 Background
1.2 Problem Definition
1.3 Structure of Thesis
2. Related Works
2.1 Language Model for Information Retrieval
2.2 Language Model with Category Smoothing
2.3 Translation Model
2.4 Translation-Based Language Model
2.5 Word2Vec
2.6 Apache Lucene
2.7 BM25 Similarity
3. Approach
3.1 Word Embedding Learning
3.2 Ranking function for question title and description
3.3 Reputation system in the forum
4. Experiment & Evaluation
4.1 Data Sets
4.2 Test sets
4.3 Word2vec Training
4.4 Baseline
4.4.1 Language Model for Information Retrieval
4.4.2 Language Model with Category Smoothing
4.4.3 Distributed Representation of Data
4.5 Evaluation Metrics
4.6 Main Result
5. Discussion
6. Conclusion
References

Advisor

蔡宗翰 (Tzong-Han Tsai)

Approval Date

2016-7-20