Master's/Doctoral Thesis 107522053: Detailed Record

Author: Kuan-Yu Lin (林冠佑)    Department: Computer Science and Information Engineering
Title: Adjusting Word Embeddings with Synonyms and Antonyms based on Undirected List Encoder Generated from Self-Attention
(基於自注意力機制產生的無方向性序列編碼器使用同義詞與反義詞資訊調整詞向量)
Related Theses
★ Predicting Users' Personal Information and Personality Traits from Web Browsing Logs
★ Predicting Changes in Users' Browsing Behavior Before Special Holidays via a Multi-Target Matrix-Factorization Method
★ A Study on Dynamic Multi-Model Fusion Analysis
★ Extending Clickstreams: Analyzing User Behaviors Missing from Clickstreams
★ Associated Learning: Decomposing End-to-End Backpropagation with Autoencoders and Target Propagation
★ A Click-Prediction Model Fusing Rankings from Multiple Models
★ Analyzing Intentional, Unintentional, and Missing User Behaviors in Web Logs
★ Exploring When to Use Deep Learning versus Simple Models for Click-Through-Rate Prediction
★ Fault Detection for Air-Quality Sensors: An Anomaly-Detection Framework Based on Deep Spatio-Temporal Graph Models
★ An Empirical Study of How Word Embeddings Adjusted with Synonym/Antonym Lexicons Affect Downstream NLP Tasks
★ A Semi-Supervised Model Combining Spatio-Temporal Data, Applied to Anomaly Detection for PM2.5 Air-Pollution Sensors
★ Training Neural Networks by Adjusting DropConnect Drop Probabilities According to Weight-Gradient Magnitudes
★ Detecting Low-Activity Anomalous Accounts on PTT with Graph Neural Networks
★ Generating Personalized Trend Lines for Individual Users from a Few of Their Trend-Line Samples
★ Two Novel Probabilistic Clustering Models Based on Bivariate and Multivariate Beta Distributions
★ A New Technique for Simultaneously Updating the Parameters of All Layers of a Neural Network, Using Associated Learning and Pipelining
  1. Access rights: the author has agreed to make the electronic full text openly available immediately.
  2. The open-access electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese): Since word embeddings have been widely applied to many natural language processing tasks with good results, researchers have come to believe that word embeddings can effectively capture word meanings, and have begun to study how to improve their quality. This thesis argues that word embeddings are learned mainly from contextual information and do not exploit human-compiled lexical relations such as synonym/antonym lists and knowledge graphs. We conjecture that word embeddings still have room for improvement in distinguishing synonyms from antonyms, and that injecting knowledge extracted from dictionaries should help. However, previous related work adjusted embeddings with synonyms and antonyms only in a pairwise fashion, which cannot simultaneously consider the relations between a word and all of its synonyms and antonyms. This thesis therefore proposes a listwise method for adjusting word embeddings to improve their quality.
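To make the pairwise/listwise distinction concrete, here is a minimal, hypothetical sketch in Python/NumPy (not the thesis's actual objective): a pairwise loss scores each (word, synonym) or (word, antonym) pair in isolation, while a listwise loss scores the word against its entire synonym/antonym list at once.

    import numpy as np

    def cos(u, v):
        # Cosine similarity between two vectors.
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

    def pairwise_losses(w, synonyms, antonyms):
        # One independent term per pair; no term sees the rest of the list.
        return ([1.0 - cos(w, s) for s in synonyms] +      # pull synonyms closer
                [max(0.0, cos(w, a)) for a in antonyms])   # push antonyms away

    def listwise_loss(w, synonyms, antonyms):
        # A single term over the whole list: the softmax normalization makes
        # every synonym and antonym compete jointly for probability mass.
        sims = np.array([cos(w, x) for x in synonyms + antonyms])
        labels = np.array([1.0] * len(synonyms) + [0.0] * len(antonyms))
        probs = np.exp(sims) / np.exp(sims).sum()
        return float(-(labels * np.log(probs + 1e-8)).sum())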

Our experiments show that models using global information consistently outperform models using only local information. Among them, the self-attention mechanism, which learns to assign different attention weights to the different words in the synonym and antonym lists and then combines this information to adjust the word embedding, exploits global information most effectively. This thesis therefore adopts self-attention as the encoder and, after pre-training, adjusts the word embeddings with synonym and antonym information extracted from dictionaries. To further improve embedding quality, we also experiment with normalization, residual connections, multi-head self-attention, and deeper networks, and design experiments to illustrate their effects on the model.
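Below is a minimal single-head sketch of this kind of encoder, assuming d-dimensional embeddings and hypothetical learned projection matrices Wq, Wk, Wv (the thesis's exact architecture may differ). Because no positional encoding is added, the encoder is insensitive to the order of the input list, which suits unordered synonym/antonym lists; the residual connection adds the attention output back onto the original embeddings.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention_encode(X, Wq, Wk, Wv):
        # X: (n, d) matrix whose rows are the target word followed by its
        # synonyms and antonyms. With no positional encoding, permuting the
        # input rows simply permutes the output rows identically.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # (n, n) attention weights
        H = A @ V                                   # each row attends to the whole list
        return X + H                                # residual connection

    # Toy usage: adjust one word vector using two synonyms and two antonyms.
    rng = np.random.default_rng(0)
    d, n = 8, 5
    X = rng.normal(size=(n, d))                     # row 0 is the target word
    Wq, Wk, Wv = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]
    adjusted_word = self_attention_encode(X, Wq, Wk, Wv)[0]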

Finally, we design experiments showing that, after adjustment with our method, word embeddings pre-trained on a small corpus can outperform unadjusted embeddings pre-trained on a large corpus on synonym tasks. The results also show that synonyms are more useful than antonyms for similarity tasks, and that more synonym and antonym information is not always better: its quality also affects the adjusted embeddings.
Abstract (English): Since word embeddings have become a standard technique that works excellently in various natural language processing (NLP) tasks, there has been much research on improving their quality. We argue that word embeddings are learned mainly from contextual information and ignore the relationships between words compiled by humans (e.g., synonyms, antonyms, and knowledge graphs). We speculate that including this human-compiled information may improve the quality of the word embeddings. Unlike previous works, we propose a listwise method that can jointly consider the relations between a word and all of its synonyms and antonyms.

Experimental results show that our approach to adjusting word embeddings trained on a small corpus yields comparable, sometimes even better, results than word embeddings trained on a large corpus. Additionally, we show that both the quantity and the quality of synonyms and antonyms affect performance. Finally, we show that models utilizing global information outperform those utilizing local information in most cases.
Keywords (Chinese)
★ Word embedding (詞向量)
★ Self-attention mechanism (自注意力機制)
★ Synonym (同義詞)
★ Antonym (反義詞)
Keywords (English)
★ Self-attention
★ Synonym
★ Antonym
★ Post-training
★ Listwise
Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Motivation
1.2 Objectives
1.3 Contributions
1.4 Thesis Organization
2. Related Work
2.1 Retrofitting Word Vectors to Semantic Lexicons
2.2 Counter-fitting Word Vectors to Linguistic Constraints
2.3 Adjusting Word Embeddings with Semantic Intensity Orders
2.4 Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons
2.5 Properties of the Related Work
3. Model and Method
3.1 Model Architecture
3.2 Undirected List Encoder
3.2.1 Self-Attention Mechanism
3.2.2 Residual Connections
3.3 Loss Function
4. Datasets
4.1 Pre-Trained Word Embeddings
4.2 Synonym and Antonym Dictionaries
4.3 Similarity Tasks
5. Experimental Results
5.1 Adjusting Word Embeddings with Knowledge Extracted from Different Dictionaries
5.2 Using Different Models as the Encoder
5.3 Adjusting Different Pre-Trained Word Embeddings with Knowledge from Different Dictionaries
5.4 Input Sequence Length
5.5 Word Embedding Training Time
5.6 Model Convergence
5.7 Comparison with Related Work
5.8 Examples of Adjusted Embeddings
6. Conclusion
6.1 Conclusions
6.2 Future Work
References
References
[1] Z. S. Harris, "Distributional structure," Word, vol. 10, no. 2-3, pp. 146–162, 1954.
[2] M. Baroni, G. Dinu, and G. Kruszewski, "Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, Maryland: Association for Computational Linguistics, Jun. 2014, pp. 238–247. DOI: 10.3115/v1/P14-1023. [Online]. Available: https://www.aclweb.org/anthology/P14-1023.
[3] J. Pennington, R. Socher, and C. Manning, "GloVe: Global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1532–1543. DOI: 10.3115/v1/D14-1162. [Online]. Available: https://www.aclweb.org/anthology/D14-1162.
[4] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, Eds., Curran Associates, Inc., 2013, pp. 3111–3119. [Online]. Available: http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.
[5] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," arXiv preprint arXiv:1607.04606, 2016.
[6] M. Yu and M. Dredze, "Improving lexical embeddings with semantic knowledge," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, Maryland: Association for Computational Linguistics, Jun. 2014, pp. 545–550. DOI: 10.3115/v1/P14-2089. [Online]. Available: https://www.aclweb.org/anthology/P14-2089.
[7] C. Xu, Y. Bai, J. Bian, B. Gao, G. Wang, X. Liu, and T.-Y. Liu, "RC-NET: A general framework for incorporating knowledge into word representations," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, 2014, pp. 1219–1228.
[8] J. Bian, B. Gao, and T.-Y. Liu, "Knowledge-powered deep learning for word embedding," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2014, pp. 132–148.
[9] D. Fried and K. Duh, "Incorporating both distributional and relational semantics in word representations," arXiv preprint arXiv:1412.4369, 2014.
[10] M. Faruqui, J. Dodge, S. K. Jauhar, C. Dyer, E. Hovy, and N. A. Smith, "Retrofitting word vectors to semantic lexicons," in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado: Association for Computational Linguistics, May 2015, pp. 1606–1615. DOI: 10.3115/v1/N15-1184. [Online]. Available: https://www.aclweb.org/anthology/N15-1184.
[11] N. Mrkšić, D. Ó Séaghdha, B. Thomson, M. Gašić, L. M. Rojas-Barahona, P.-H. Su, D. Vandyke, T.-H. Wen, and S. Young, "Counter-fitting word vectors to linguistic constraints," in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California: Association for Computational Linguistics, Jun. 2016, pp. 142–148. DOI: 10.18653/v1/N16-1018. [Online]. Available: https://www.aclweb.org/anthology/N16-1018.
[12] J.-K. Kim, M.-C. de Marneffe, and E. Fosler-Lussier, "Adjusting word embeddings with semantic intensity orders," in Proceedings of the 1st Workshop on Representation Learning for NLP, Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 62–69. DOI: 10.18653/v1/W16-1607. [Online]. Available: https://www.aclweb.org/anthology/W16-1607.
[13] H. Jo and S. J. Choi, "Extrofitting: Enriching word representation and its vector space with semantic lexicons," in Proceedings of the Third Workshop on Representation Learning for NLP, Melbourne, Australia: Association for Computational Linguistics, Jul. 2018, pp. 24–29. DOI: 10.18653/v1/W18-3003. [Online]. Available: https://www.aclweb.org/anthology/W18-3003.
[14] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.
[15] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[16] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.
[17] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., Curran Associates, Inc., 2017, pp. 5998–6008. [Online]. Available: http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
[18] I. Yamada, A. Asai, J. Sakuma, H. Shindo, H. Takeda, Y. Takefuji, and Y. Matsumoto, "Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia," arXiv preprint arXiv:1812.06280v3, 2020.
[19] E. Pavlick, P. Rastogi, J. Ganitkevitch, B. Van Durme, and C. Callison-Burch, "PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification," in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Beijing, China: Association for Computational Linguistics, Jul. 2015, pp. 425–430. DOI: 10.3115/v1/P15-2070. [Online]. Available: https://www.aclweb.org/anthology/P15-2070.
[20] S. Rajana, C. Callison-Burch, M. Apidianaki, and V. Shwartz, "Learning antonyms with paraphrases and a morphology-aware neural network," in Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), Vancouver, Canada: Association for Computational Linguistics, Aug. 2017, pp. 12–21. DOI: 10.18653/v1/S17-1002. [Online]. Available: https://www.aclweb.org/anthology/S17-1002.
[21] G. A. Miller, "WordNet: A lexical database for English," Commun. ACM, vol. 38, no. 11, pp. 39–41, Nov. 1995, ISSN: 0001-0782. DOI: 10.1145/219717.219748. [Online]. Available: https://doi.org/10.1145/219717.219748.
[22] C. F. Baker, C. J. Fillmore, and J. B. Lowe, "The Berkeley FrameNet project," in 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1, Montreal, Quebec, Canada: Association for Computational Linguistics, Aug. 1998, pp. 86–90. DOI: 10.3115/980845.980860. [Online]. Available: https://www.aclweb.org/anthology/P98-1013.
[23] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," J. Artif. Int. Res., vol. 49, no. 1, pp. 1–47, Jan. 2014, ISSN: 1076-9757.
[24] F. Hill, R. Reichart, and A. Korhonen, "SimLex-999: Evaluating semantic models with (genuine) similarity estimation," Computational Linguistics, vol. 41, no. 4, pp. 665–695, Dec. 2015. DOI: 10.1162/COLI_a_00237. [Online]. Available: https://www.aclweb.org/anthology/J15-4004.
[25] L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin, "Placing search in context: The concept revisited," ACM Trans. Inf. Syst., vol. 20, no. 1, pp. 116–131, Jan. 2002, ISSN: 1046-8188. DOI: 10.1145/503104.503110. [Online]. Available: https://doi.org/10.1145/503104.503110.
[26] H. Rubenstein and J. B. Goodenough, "Contextual correlates of synonymy," Commun. ACM, vol. 8, no. 10, pp. 627–633, Oct. 1965, ISSN: 0001-0782. DOI: 10.1145/365628.365657. [Online]. Available: https://doi.org/10.1145/365628.365657.
Advisor: Hung-Hsuan Chen (陳弘軒)    Date of Approval: 2020-07-20
