Finding Experts in Community Question Answering Websites Using History Post Embedding and Topic Expertise Model Features


Name

Pei-Yu Chen (陳沛伃)

Department

Department of Computer Science and Information Engineering

Thesis title

Finding experts in Community Question Answering websites using History Post Embedding and Topic Expertise Model features
(基於歷史資訊向量與主題專精程度向量應用於尋找社群問答網站中專家)

Related theses

★ A Real-time Embedding Increasing for Session-based Recommendation with Graph Neural Networks
★ 基於主診斷的訓練目標修改用於出院病摘之十代國際疾病分類任務
★ 混合式心臟疾病危險因子與其病程辨識於電子病歷之研究
★ 基於 PowerDesigner 規範需求分析產出之快速導入方法
★ 社群論壇之問題檢索
★ 非監督式歷史文本事件類型識別──以《明實錄》中之衛所事件為例
★ 應用自然語言處理技術分析文學小說角色之關係：以互動視覺化呈現
★ 基於生醫文本擷取功能性層級之生物學表徵語言敘述：由主成分分析發想之K近鄰算法
★ 基於分類系統建立文章表示向量應用於跨語言線上百科連結
★ Code-Mixing Language Model for Sentiment Analysis in Code-Mixing Data
★ 藉由加入多重語音辨識結果來改善對話狀態追蹤
★ 對話系統應用於中文線上客服助理:以電信領域為例
★ 應用遞歸神經網路於適當的時機回答問題
★ 使用多任務學習改善使用者意圖分類
★ 使用轉移學習來改進針對命名實體音譯的樞軸語言方法
★ 使用YMCL模型改善使用者意圖分類成效


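This electronic thesis is authorized for immediate open access.
Electronic full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.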

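Abstract (Chinese)

As technology advances rapidly, we must continually improve ourselves and acquire new knowledge to avoid being left behind. This has driven the rise of Community Question Answering (CQA) websites such as Stack Overflow, Yahoo Answers, Quora, and Zhihu (知乎), where users ask and answer questions and which serve as platforms for exchange and mutual learning.

Although CQA websites bring users great convenience, the sheer number of questions means that most questions go unanswered, and getting a timely, correct reply undeniably takes luck and waiting. We believe that if experts can be identified correctly on a CQA website, routing the right questions to experts who are able to answer them can increase user interaction and the efficiency of problem solving.

This study first uses an unsupervised method, the TEM (Topic Expertise Model) built by Yang, Liu, et al. (2013), to extract a feature vector of each user's degree of expertise under each topic. It then uses History Post Embedding, which draws on the properties of word embedding, to extract semantic feature vectors, and uses the similarity between a question and an answerer as the basis for recommending experts. We target Stack Overflow (one of the world's largest question answering websites for programming) and obtain good accuracy, and we expect the results to be applicable to other CQA websites.

The contribution of this thesis is to combine the TEM model with word-embedding-based historical information so that, when the social network structure is incomplete, the right questions can still be matched effectively to experts able to answer them, thereby addressing the problem of low participation in the community.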

Abstract (English)

As technology changes constantly, people must keep learning to avoid being left behind. This need has led to the rise of community question answering (CQA) websites such as Stack Overflow, Yahoo Answers, Quora, and Zhihu (知乎). On these platforms, users can ask and answer questions and exchange and discuss ideas with each other.

Although CQA websites bring great convenience to users, there is still room for improvement. Because of the large number of questions, most questions receive no response or receive inappropriate answers, and getting a correct answer in time depends on luck and waiting. We therefore believe that if experts can be found precisely on CQA websites, routing the right questions to those experts can raise user participation and the efficiency of problem solving.

In this study, we first use TEM (Topic Expertise Model), an unsupervised model published by Yang, Liu, et al. (2013), to capture the degree of expertise of questions and answerers under different topics. We then use History Post Embedding, which is proposed in this thesis and built on word embedding techniques, to extract semantic meaning and improve the understanding of question sets. Finally, we combine the topical expertise vector with History Post Embedding and apply a recommendation formula to rank experts. We target Stack Overflow, one of the largest CQA websites for computer programming in the world, and obtain good results. We also expect the approach to be applicable to other CQA websites.

The main contribution of this thesis is the combination of the TEM model with a distributed representation of users' historical information, which can address the problem of low participation in CQA websites when the social network structure is incomplete.
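
The pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the toy `WORD_VECTORS` table stands in for a trained word embedding model, the per-user topic vectors stand in for TEM output, and the blend controlled by `alpha` is a hypothetical stand-in for the thesis's recommendation formula, which this record does not state.

```python
import numpy as np

# Toy word vectors (assumption); a real system would load a trained model.
WORD_VECTORS = {
    "python": np.array([1.0, 0.0, 0.0]),
    "list":   np.array([0.8, 0.2, 0.0]),
    "java":   np.array([0.0, 1.0, 0.0]),
    "thread": np.array([0.0, 0.8, 0.2]),
}

def embed(text):
    """Average the vectors of known words; zero vector if none are known."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank_experts(question, question_topics, users, alpha=0.5):
    """Rank users by a hypothetical blend of semantic and topical similarity."""
    q_vec = embed(question)
    scores = {}
    for name, (history, expertise) in users.items():
        # Semantic score: question vs. the user's averaged history posts.
        semantic = cosine(q_vec, embed(" ".join(history)))
        # Topical score: question topics vs. the user's topic expertise vector.
        topical = cosine(question_topics, expertise)
        scores[name] = alpha * semantic + (1 - alpha) * topical
    return sorted(scores, key=scores.get, reverse=True)

# Each user: (list of past posts, hypothetical topic expertise vector).
users = {
    "alice": (["python list", "python"], np.array([0.9, 0.1])),
    "bob":   (["java thread", "java"],   np.array([0.1, 0.9])),
}
ranking = rank_experts("python list", np.array([1.0, 0.0]), users)
```

In this sketch a user's history embedding is simply the mean of the word vectors in their past posts; other pooling choices are possible, and the record does not say which one the thesis uses.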

Keywords (Chinese)

★ Word embedding
★ Community question answering websites
★ TEM
★ PageRank
★ Topic model
★ Experts

Keywords (English)

★ Word2Vec
★ CQA
★ TEM
★ PageRank
★ Topic Model
★ Experts

Table of contents

Contents
Chinese Abstract
Abstract
Acknowledgment
Contents
List of Figures
List of Tables
1. Introduction
1.1 Motivation
1.2 Problem description
1.3 Thesis organization
2. Related Work
2.1 Opinion leader finding
2.2 Traditional expert finding tasks
3. Methodology
3.1 Formal problem definition
3.2 System flow
3.2.1 Module 1 – Solr
3.2.2 Module 2 – Preprocessing
3.2.3 Module 3 – Topic Expertise Model
3.2.4 Module 4 – History Post Embedding
3.2.5 Module 5 – Recommendation formula
4. Experiment
4.1 Datasets
4.2 Experimental settings
4.3 Evaluation
4.3.1 Evaluation methodology
4.3.2 Ground Truth
4.4 Experimental results
5. Discussion
6. Conclusion
Reference

References

1. Riahi, F., et al. Finding expert users in community question answering. in Proceedings of the 21st International Conference on World Wide Web. 2012. ACM.
2. Guo, J., et al. Tapping on the potential of q&a community by recommending answer providers. in Proceedings of the 17th ACM conference on Information and knowledge management. 2008. ACM.
3. "Stackoverflow.com Site Info". Alexa Internet. Retrieved 2017-08-14.
4. Spolsky, J., "Stack Overflow Launches". Joel on Software. 2008-09-15.
5. Duan, J., J. Zeng, and B. Luo. Identification of opinion leaders based on user clustering and sentiment analysis. in Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT)-Volume 01. 2014. IEEE Computer Society.
6. Weng, J., et al. TwitterRank: finding topic-sensitive influential twitterers. in Proceedings of the third ACM international conference on Web search and data mining. 2010. ACM.
7. Agarwal, N., et al. Identifying the influential bloggers in a community. in Proceedings of the 2008 international conference on web search and data mining. 2008. ACM.
8. Yu, X., X. Wei, and X. Lin, Algorithms of BBS Opinion Leader Mining Based on Sentiment Analysis. WISM, 2010. 10: p. 360-369.
9. Katz, E. and P.F. Lazarsfeld, Personal Influence, The part played by people in the flow of mass communications. 1966: Transaction Publishers.
10. Wang, W. and W.N. Street, Modeling influence diffusion to uncover influence centrality and community structure in social networks. Social Network Analysis and Mining, 2015. 5(1): p. 15.
11. Bonacich, P., Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology, 1972. 2(1): p. 113-120.
12. Katz, L., A new status index derived from sociometric analysis. Psychometrika, 1953. 18(1): p. 39-43.
13. Page, L., et al., The PageRank citation ranking: Bringing order to the web. 1999, Stanford InfoLab.
14. Zhu, H., et al., Ranking user authority with relevant knowledge categories for expert finding. World Wide Web, 2014. 17(5): p. 1081-1107.
15. Zhou, G., et al. Topic-sensitive probabilistic model for expert finding in question answer communities. in Proceedings of the 21st ACM international conference on Information and knowledge management. 2012. ACM.
16. Liu, X., W.B. Croft, and M. Koll. Finding experts in community-based question-answering services. in Proceedings of the 14th ACM international conference on Information and knowledge management. 2005. ACM.
17. Miller, D.R., T. Leek, and R.M. Schwartz. A hidden Markov model information retrieval system. in Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. 1999. ACM.
18. Lavrenko, V. and W.B. Croft. Relevance based language models. in Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. 2001. ACM.
19. Xu, J. and W.B. Croft. Cluster-based language models for distributed retrieval. in Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. 1999. ACM.
20. Ponte, J.M. and W.B. Croft. A language modeling approach to information retrieval. in Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. 1998. ACM.
21. Qu, M., et al. Probabilistic question recommendation for question answering communities. in Proceedings of the 18th international conference on World wide web. 2009. ACM.
22. Yang, L., et al. CQARank: jointly model topics and expertise in community question answering. in Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2013. ACM.
23. Blei, D.M., A.Y. Ng, and M.I. Jordan, Latent Dirichlet allocation. Journal of Machine Learning Research, 2003. 3(Jan): p. 993-1022.
24. Mikolov, T., et al., Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
25. Rong, X., word2vec parameter learning explained. arXiv preprint arXiv:1411.2738, 2014.
26. Adamic, L.A., et al. Knowledge sharing and Yahoo Answers: everyone knows something. in Proceedings of the 17th international conference on World Wide Web. 2008. ACM.

Advisor

Tzong-Han Tsai (蔡宗翰)

Review date

2018-1-17
