References
[1] Z. S. Harris, “Distributional structure,” Word, vol. 10, no. 2-3, pp. 146–162, 1954.
[2] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543. [Online]. Available: http://www.aclweb.org/anthology/D14-1162.
[3] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” arXiv preprint arXiv:1310.4546, 2013.
[4] 林冠佑, “Adjusting word vectors with synonym and antonym information using a non-directional sequence encoder based on self-attention,” Thesis, 2020. [Online]. Available: https://hdl.handle.net/11296/qe48u8.
[5] W.-t. Yih, G. Zweig, and J. C. Platt, “Polarity inducing latent semantic analysis,” in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012, pp. 1212–1222.
[6] M. Yu and M. Dredze, “Improving lexical embeddings with semantic knowledge,” in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, Maryland: Association for Computational Linguistics, Jun. 2014, pp. 545–550. doi: 10.3115/v1/P14-2089. [Online]. Available: https://www.aclweb.org/anthology/P14-2089.
[7] C. Xu, Y. Bai, J. Bian, B. Gao, G. Wang, X. Liu, and T.-Y. Liu, “RC-NET: A general framework for incorporating knowledge into word representations,” in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 2014, pp. 1219–1228.
[8] J. Bian, B. Gao, and T.-Y. Liu, “Knowledge-powered deep learning for word embedding,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2014, pp. 132–148.
[9] D. Fried and K. Duh, “Incorporating both distributional and relational semantics in word representations,” arXiv preprint arXiv:1412.4369, 2014.
[10] E. Pavlick, P. Rastogi, J. Ganitkevitch, B. Van Durme, and C. Callison-Burch, “PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2015, pp. 425–430.
[11] R. Schwartz, R. Reichart, and A. Rappoport, “Symmetric pattern based word embeddings for improved word similarity prediction,” in Proceedings of the Nineteenth Conference on Computational Natural Language Learning, 2015, pp. 258–267.
[12] M. Ono, M. Miwa, and Y. Sasaki, “Word embedding-based antonym detection using thesauri and distributional information,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015, pp. 984–989.
[13] D. Osborne, S. Narayan, and S. B. Cohen, “Encoding prior knowledge with eigenword embeddings,” Transactions of the Association for Computational Linguistics, vol. 4, pp. 417–430, 2016.
[14] M. Faruqui, J. Dodge, S. K. Jauhar, C. Dyer, E. Hovy, and N. A. Smith, “Retrofitting word vectors to semantic lexicons,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado: Association for Computational Linguistics, May 2015, pp. 1606–1615. doi: 10.3115/v1/N15-1184. [Online]. Available: https://www.aclweb.org/anthology/N15-1184.
[15] N. Mrkšić, D. Ó Séaghdha, B. Thomson, M. Gašić, L. M. Rojas-Barahona, P.-H. Su, D. Vandyke, T.-H. Wen, and S. Young, “Counter-fitting word vectors to linguistic constraints,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California: Association for Computational Linguistics, Jun. 2016, pp. 142–148. doi: 10.18653/v1/N16-1018. [Online]. Available: https://www.aclweb.org/anthology/N16-1018.
[16] J. Wieting, M. Bansal, K. Gimpel, and K. Livescu, “From paraphrase database to compositional paraphrase model and back,” Transactions of the Association for Computational Linguistics, vol. 3, pp. 345–358, 2015. doi: 10.1162/tacl_a_00143. [Online]. Available: https://www.aclweb.org/anthology/Q15-1025.
[17] N. Mrkšić, I. Vulić, D. Ó Séaghdha, I. Leviant, R. Reichart, M. Gašić, A. Korhonen, and S. Young, “Semantic specialization of distributional word vector spaces using monolingual and cross-lingual constraints,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 309–324, 2017.
[18] J.-K. Kim, M.-C. de Marneffe, and E. Fosler-Lussier, “Adjusting word embeddings with semantic intensity orders,” in Proceedings of the 1st Workshop on Representation Learning for NLP, Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 62–69. doi: 10.18653/v1/W16-1607. [Online]. Available: https://www.aclweb.org/anthology/W16-1607.
[19] H. Jo and S. J. Choi, “Extrofitting: Enriching word representation and its vector space with semantic lexicons,” in Proceedings of The Third Workshop on Representation Learning for NLP, Melbourne, Australia: Association for Computational Linguistics, Jul. 2018, pp. 24–29. doi: 10.18653/v1/W18-3003. [Online]. Available: https://www.aclweb.org/anthology/W18-3003.
[20] A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” in Proceedings of The 33rd International Conference on Machine Learning, M. F. Balcan and K. Q. Weinberger, Eds., ser. Proceedings of Machine Learning Research, vol. 48, New York, New York, USA: PMLR, 20–22 Jun 2016, pp. 1747–1756. [Online]. Available: http://proceedings.mlr.press/v48/oord16.html.
[21] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[22] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
[23] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., Curran Associates, Inc., 2017, pp. 5998–6008. [Online]. Available: http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
[24] Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” CoRR, vol. abs/1906.08237, 2019. arXiv: 1906.08237. [Online]. Available: http://arxiv.org/abs/1906.08237.
[25] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” CoRR, vol. abs/1810.04805, 2018. arXiv: 1810.04805. [Online]. Available: http://arxiv.org/abs/1810.04805.
[26] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, “Learning word vectors for sentiment analysis,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA: Association for Computational Linguistics, Jun. 2011, pp. 142–150. [Online]. Available: http://www.aclweb.org/anthology/P11-1015.
[27] F. Morin and Y. Bengio, “Hierarchical probabilistic neural network language model,” in AISTATS, Citeseer, vol. 5, 2005, pp. 246–252.
[28] I. Yamada, A. Asai, J. Sakuma, H. Shindo, H. Takeda, Y. Takefuji, and Y. Matsumoto, “Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia,” arXiv preprint arXiv:1812.06280v3, 2020.
[29] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” arXiv preprint arXiv:1607.04606, 2016.
[30] E. Pavlick, P. Rastogi, J. Ganitkevitch, B. Van Durme, and C. Callison-Burch, “PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Beijing, China: Association for Computational Linguistics, Jul. 2015, pp. 425–430. doi: 10.3115/v1/P15-2070. [Online]. Available: https://www.aclweb.org/anthology/P15-2070.
[31] S. Rajana, C. Callison-Burch, M. Apidianaki, and V. Shwartz, “Learning antonyms with paraphrases and a morphology-aware neural network,” in Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), Vancouver, Canada: Association for Computational Linguistics, Aug. 2017, pp. 12–21. doi: 10.18653/v1/S17-1002. [Online]. Available: https://www.aclweb.org/anthology/S17-1002.
[32] G. A. Miller, “WordNet: A lexical database for English,” Commun. ACM, vol. 38, no. 11, pp. 39–41, Nov. 1995, issn: 0001-0782. doi: 10.1145/219717.219748. [Online]. Available: https://doi.org/10.1145/219717.219748.
[33] C. F. Baker, C. J. Fillmore, and J. B. Lowe, “The Berkeley FrameNet project,” in 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1, Montreal, Quebec, Canada: Association for Computational Linguistics, Aug. 1998, pp. 86–90. doi: 10.3115/980845.980860. [Online]. Available: https://www.aclweb.org/anthology/P98-1013.
[34] F. Hill, R. Reichart, and A. Korhonen, “SimLex-999: Evaluating semantic models with (genuine) similarity estimation,” Computational Linguistics, vol. 41, no. 4, pp. 665–695, Dec. 2015. doi: 10.1162/COLI_a_00237. [Online]. Available: https://www.aclweb.org/anthology/J15-4004.
[35] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.