Master's/Doctoral Thesis 105522039 — Detailed Record




Author: 潘照霖 (Chao-Lin Pan)    Department: Computer Science and Information Engineering
Thesis Title: 深度神經網路架構之跨語言線上百科連結
(Cross-lingual Encyclopedia Linking in DNN Framework)
Access Rights:
  1. The author has agreed to make this electronic thesis openly available immediately.
  2. The open-access electronic full text is authorized only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) The emergence of Wikipedia has fundamentally changed how people acquire new knowledge. It has grown to more than 298 language editions, yet the number of articles is highly unbalanced across them: English Wikipedia leads every other edition by a wide margin. Chinese Wikipedia, for example, contains only about one-sixth as many articles as English Wikipedia. Moreover, cross-language links between editions are severely lacking; according to statistics, only 2.3% of English Wikipedia articles carry a cross-language link to a Chinese counterpart.
Beyond Wikipedia, many countries host language-specific online encyclopedias whose content is richer than the corresponding Wikipedia edition. We therefore target English Wikipedia and Baidu Baike and build cross-lingual links between the two encyclopedias, which benefits both global knowledge sharing and cross-language research.
Previous approaches to cross-lingual encyclopedia linking typically depend on language-specific characteristics. We instead propose a deep learning model that derives no features from language characteristics or encyclopedia structure: it is trained on article text alone and applies several neural network architectures to judge the semantic similarity of cross-language article pairs. To handle a different language pair, only the pre-trained word embeddings need to be replaced.
Abstract (English) The emergence of Wikipedia has completely changed how people learn new knowledge. It has grown to more than 298 language versions; however, the number of articles is imbalanced across them, with English Wikipedia far ahead of the other languages. Chinese Wikipedia, for example, has only one-sixth as many articles as English Wikipedia. In addition, cross-language links between Wikipedia's language versions are also seriously lacking: according to the statistics, only 2.3% of English Wikipedia articles have a cross-language link to their Chinese versions.
Besides Wikipedia, many countries have their own online encyclopedias, whose content is far more abundant than the corresponding language versions of Wikipedia. We therefore aim to build cross-language links between English Wikipedia and Baidu Baike, which not only contributes to global knowledge sharing but is also conducive to cross-language research.
Previous cross-language article linking (CLAL) methods usually depend on language characteristics and the structure of the encyclopedia. We propose a deep learning model that uses only the textual content of articles as training data and applies various neural networks to judge the semantic similarity of cross-language article pairs. To handle data in a different language version, the only change required is replacing the pre-trained word embeddings.
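The linking pipeline the abstract describes (encode each article into a vector, then score cross-language candidate pairs) can be sketched at toy scale. The sketch below is an illustrative simplification, not the thesis's actual model: mean-pooling of word vectors stands in for the CNN/Bi-LSTM/attention paragraph encoders, the embedding values are invented, and using cosine similarity directly assumes the two languages' pre-trained embeddings share one aligned space, whereas the thesis learns the matching with a trained network.

```python
import math

# Toy, hand-made word vectors standing in for pre-trained embeddings.
# The values (and the assumption that English and Chinese vectors live
# in one shared, aligned space) are illustrative only.
EN_VECS = {"apple": [0.9, 0.1, 0.0], "fruit": [0.8, 0.2, 0.1], "car": [0.0, 0.1, 0.9]}
ZH_VECS = {"蘋果": [0.88, 0.12, 0.05], "水果": [0.75, 0.25, 0.1], "汽車": [0.05, 0.1, 0.92]}

def encode(tokens, vecs):
    """Mean-pool word vectors into one fixed-size article vector.

    A stand-in for the CNN / Bi-LSTM paragraph encoders: note that
    swapping `vecs` is the only per-language change, mirroring the
    abstract's point about replacing the pre-trained embeddings."""
    dim = len(next(iter(vecs.values())))
    total, n = [0.0] * dim, 0
    for t in tokens:
        if t in vecs:
            total = [a + b for a, b in zip(total, vecs[t])]
            n += 1
    return [x / n for x in total] if n else total

def cosine(a, b):
    """Cosine similarity between two article vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def softmax(scores):
    """Normalize raw similarity scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def link(en_tokens, zh_candidates):
    """Link prediction: score every Baidu Baike candidate against the
    English Wikipedia article and normalize into a probability each."""
    src = encode(en_tokens, EN_VECS)
    sims = [cosine(src, encode(c, ZH_VECS)) for c in zh_candidates]
    return softmax(sims)

probs = link(["apple", "fruit"], [["蘋果", "水果"], ["汽車"]])
```

Here the semantically matching candidate receives the higher probability; in the thesis the encoder is a trained CNN, Bi-LSTM, or attention-based LSTM, and a join layer feeds a softmax classifier over matching pairs.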
Keywords (Chinese) ★ Online encyclopedia
★ Wikipedia
★ Baidu Baike
★ Cross-language
★ Linking
★ Document representation vector
★ Word embedding
★ Deep learning
★ Convolutional neural network
★ Long short-term memory
★ Attention mechanism
Keywords (English)
Table of Contents
Abstract (Chinese) I
ABSTRACT II
ACKNOWLEDGMENTS IV
TABLE OF CONTENTS V
LIST OF FIGURES VII
LIST OF TABLES VII
1 INTRODUCTION 1
1.1 MOTIVATION 1
1.2 PROBLEM DESCRIPTION 2
1.3 RESEARCH OBJECTIVE 3
1.4 THESIS ORGANIZATION 3
2 RELATED WORK 5
2.1 CROSS-LANGUAGE ARTICLE LINKING 5
2.1.1 CLAL across Wikipedia Language Versions 6
2.1.2 CLAL between Wikipedia and Baidu Baike 7
2.1.3 Other Knowledge Base Linking Tasks 7
2.2 Word Embedding 8
2.3 Document Representation 9
2.3.1 STATISTICAL REPRESENTATION 9
2.3.2 NEURAL NETWORK METHOD 10
3 METHODOLOGY 11
3.1 PROBLEM DEFINITION 11
3.2 ARTICLE IN ONLINE ENCYCLOPEDIAS 12
3.3 SYSTEM FRAMEWORK 16
3.4 CANDIDATE SELECTION 17
3.5 LINK PREDICTION 18
3.5.1 CNN Paragraph Encoder 20
3.5.2 Bi-LSTM Paragraph Encoder 21
3.5.3 Attention-Based LSTM Paragraph Encoder 23
3.5.4 Architecture of Matching Pairs 24
3.5.5 Join Layer 24
3.5.6 Softmax 25
4 EXPERIMENT 26
4.1 DATASET DESCRIPTION 26
4.1.1 English Wikipedia Data 26
4.1.2 Baidu Baike Data 26
4.1.3 Gold Standard Data 27
4.1.4 Pre-trained Word Embeddings 28
4.2 DATA PREPROCESSING 29
4.3 EXPERIMENT SETUP 30
4.3.1 BASELINE SETUP 30
4.3.2 DNN MODEL SETUP 31
4.4 EXPERIMENT 31
4.4.1 OBSERVATION OF BASELINE 31
4.4.2 EXPERIMENT RESULTS 32
4.4.3 DISCUSSION 34
5 CONCLUSION 37
5.1 CONCLUSION 37
5.2 FUTURE WORK 37
BIBLIOGRAPHY 38
Advisor: 蔡宗翰    Date of Approval: 2018-10-1

For questions about this thesis, please contact the Promotion Services Division, National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.