The concept of "vector" is widely used in machine learning. In natural language processing, for example, researchers convert words into vectors, also known as word embeddings, so that a computer can use a fixed-length vector as the feature representation of a word during model training. Researchers also study methodologies for generating word embeddings that better express the semantic relationships between words specified in a lexicon. These methods fall into two categories. The first is to generate word embeddings by simultaneously considering both the word co-occurrence relationships in a given corpus and lexicon knowledge, e.g., synonyms or antonyms. The second is to adjust existing (pre-trained) word embeddings with lexicon knowledge.
We study the second type of method in this thesis. We adjust pre-trained word embeddings through a self-attention mechanism so that the embeddings preserve the synonym and antonym relationships recorded in the lexicon. Experimental results show that the adjusted word embeddings indeed better capture synonym and antonym information. However, when these embeddings are used as input to downstream natural language processing tasks, they yield worse results than the unadjusted embeddings.
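The abstract does not specify the exact architecture or objective, so the following is only a minimal sketch of the general idea: a self-attention layer produces a residual adjustment to pre-trained embeddings and is trained with a hypothetical attract/repel loss over synonym and antonym pairs. The toy sizes, pair indices, margin, and loss form are all illustrative assumptions, not the thesis's actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, vocab = 50, 1000                       # toy sizes (assumed)
pretrained = torch.randn(vocab, dim)        # stand-in for real pre-trained vectors

# One self-attention layer whose output adjusts the embeddings.
attn = nn.MultiheadAttention(dim, num_heads=5, batch_first=True)

syn_pairs = torch.tensor([[3, 17], [42, 256]])   # hypothetical lexicon pairs
ant_pairs = torch.tensor([[3, 99], [42, 7]])
margin = 0.4                                     # assumed antonym margin

opt = torch.optim.Adam(attn.parameters(), lr=1e-3)
for step in range(100):
    x = pretrained.unsqueeze(0)             # (1, vocab, dim)
    out, _ = attn(x, x, x)                  # each word attends to the others
    adjusted = (x + out).squeeze(0)         # residual: adjust, don't replace

    # Pull synonym pairs together, push antonym pairs apart (cosine similarity).
    syn = 1.0 - F.cosine_similarity(adjusted[syn_pairs[:, 0]],
                                    adjusted[syn_pairs[:, 1]])
    ant = F.relu(F.cosine_similarity(adjusted[ant_pairs[:, 0]],
                                     adjusted[ant_pairs[:, 1]]) - margin)
    loss = syn.mean() + ant.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In a realistic setting one would attend over each word's lexicon neighborhood rather than the full vocabulary, since full-vocabulary attention scales quadratically in vocabulary size.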