Master's and Doctoral Theses: Detailed Record for 108525010




Name: 曾莊 (Chuang Tseng)    Department: Graduate Institute of Software Engineering
Thesis Title: Hypernym and Hyponym Detection Based on Auxiliary Sentences and the BERT Model
(利用輔助語句與BERT模型偵測詞彙的上下位關係)
Related Theses
★ Predicting users' personal information and personality traits from web browsing histories
★ A matrix-factorization-based multi-target prediction method for predicting changes in user browsing behavior before special holidays
★ Predicting the distribution and volume of traffic demand: AR-LSTMs models based on multiple attention mechanisms
★ A study of dynamic multi-model fusion analysis
★ Extending clickstreams: analyzing user behavior missing from clickstreams
★ Associated learning: decomposing end-to-end backpropagation with autoencoders and target propagation
★ A click prediction model that fuses multi-model rankings
★ Analyzing intentional, unintentional, and missing user behaviors in web logs
★ Adjusting word embeddings with synonym and antonym information via a non-directional sequence encoder based on self-attention
★ Exploring when to use deep learning versus simple learning models for click-through-rate prediction
★ Fault detection for air quality sensors: an anomaly detection framework based on deep spatio-temporal graph models
★ An empirical study of how word embeddings adjusted with synonym/antonym lexicons affect downstream natural language tasks
★ A semi-supervised model combining spatio-temporal data, applied to anomaly detection for PM2.5 air pollution sensors
★ Decomposing end-to-end backpropagation with SCPL
★ Training neural networks by adjusting DropConnect drop probabilities according to weight gradient magnitudes
★ Detecting low-activity anomalous accounts on PTT with graph neural networks
  1. This electronic thesis is approved for immediate open access.
  2. The open-access full text is licensed for personal, non-commercial retrieval, reading, and printing for academic research purposes only.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.

Abstract (Chinese): Word embedding models are a technique that generates a vector for each word from its textual context. Typically, we can use the cosine similarity between two word embeddings to measure how related the two words are. However, word embeddings alone make it hard to detect whether two words stand in a hypernym-hyponym relationship. Moreover, because this relationship is an asymmetric semantic relationship, even when a pair of words is known to be in a hypernym-hyponym relationship, an ordinary symmetric distance measure cannot determine which word is the hypernym and which is the hyponym.
This thesis proposes a method that combines the pre-trained BERT language model with additionally constructed auxiliary sentences to judge the hypernym-hyponym relationship of a word pair. The task is divided into two stages. Stage 1 determines whether the pair has a hypernym-hyponym relationship at all; if it does, Stage 2 determines which word is the hypernym and which is the hyponym. Our experiments show that two ways of constructing auxiliary sentences, BERT+Q and BERT+Q+PosNeg, handle both stages effectively.
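To make the asymmetry point concrete, here is a toy sketch in Python (not from the thesis; the vectors are made-up stand-ins for pre-trained embeddings). It shows that cosine similarity returns the same score regardless of argument order, so the score alone can never reveal which word of a hypernym-hyponym pair is the hypernym:

import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for the embeddings of "animal" (hypernym)
# and "dog" (hyponym); a real system would load pre-trained embeddings.
animal = np.array([0.8, 0.1, 0.3])
dog = np.array([0.7, 0.2, 0.4])

# The measure is symmetric: sim(animal, dog) == sim(dog, animal), so a
# high score says the words are related but not which one is broader.
assert cosine_similarity(animal, dog) == cosine_similarity(dog, animal)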
Abstract (English): The word embedding model is a technique that utilizes contextual words to generate a vector, called a word embedding, for each word. Usually, we can use the cosine similarity between a pair of word embeddings to calculate a relevance score for the two words. However, it is difficult to use word embeddings to detect the hypernym-hyponym relationship between two words. In addition, because this is an asymmetric semantic relationship, even when given a pair of words with a hypernym-hyponym relationship, it is challenging to apply general distance measures, which are usually symmetric, to determine which word is the hypernym and which is the hyponym.
This thesis proposes a model based on a pre-trained BERT model with auxiliary sentences to determine the hypernym-hyponym relationship of a pair of words. The entire process consists of two tasks. First, given a pair of words, the model determines whether the pair has a hypernym-hyponym relationship. If it does, the model proceeds to the second task: distinguishing the hypernym from the hyponym. Experimental results show that two approaches to constructing auxiliary sentences, BERT+Q and BERT+Q+PosNeg, accomplish both tasks effectively.
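As a rough illustration of how such a two-stage pipeline could be wired up, the sketch below uses the HuggingFace transformers library. This is a minimal sketch under stated assumptions, not the thesis's implementation: the question-style template is a hypothetical placeholder (this record does not spell out the exact BERT+Q or BERT+Q+PosNeg templates), and both classifiers would have to be fine-tuned on labeled word pairs before their outputs are meaningful.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Task 1: does the word pair have a hypernym-hyponym relationship at all?
task1 = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
# Task 2: given that it does, which word is the hypernym?
task2 = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def classify(model, word_a: str, word_b: str, template: str) -> int:
    """Build an auxiliary sentence for the pair and classify it with BERT."""
    auxiliary = template.format(a=word_a, b=word_b)  # hypothetical template
    inputs = tokenizer(auxiliary, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))  # predicted class index

# Hypothetical question-style auxiliary sentence in the spirit of "BERT+Q".
Q_TEMPLATE = "Is {a} a kind of {b}?"

# Task 1 gates Task 2, mirroring the two-task process in the abstract.
if classify(task1, "dog", "animal", Q_TEMPLATE) == 1:
    direction = classify(task2, "dog", "animal", Q_TEMPLATE)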
Keywords (Chinese): ★ word embeddings
★ BERT language model
★ fine-tuning
★ hypernym-hyponym relations
Keywords (English):
Table of Contents:
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Motivation
1.2 Research Goals
1.3 Contributions
1.4 Thesis Organization
2. Related Work
2.1 Strengthening the synonym/antonym discrimination ability of word embeddings
2.1.1 Retrofitting
2.1.2 JointReps
2.2 Strengthening the hypernym/hyponym discrimination ability of word embeddings
2.2.1 HyperVec
2.2.2 Poincaré
2.2.3 LEAR
2.2.4 HWE
2.2.5 Roller and Erk
2.2.6 Shwartz
2.2.7 BiRRE
2.3 Studies combining a language model (e.g., BERT) with auxiliary sentences
2.3.1 Using auxiliary sentences for aspect-based sentiment analysis
2.3.2 Using auxiliary sentences for text classification
3. Model and Methods
3.1 Model Architecture
3.2 Task 1 Model
3.3 Task 2 Model
3.4 Loss Function
4. Experimental Results
4.1 Hyperparameter Details
4.2 Training Datasets
4.2.1 The Shwartz training dataset
4.2.2 A hypernym-hyponym dataset collected from WordNet
4.2.3 The SVM training dataset
4.3 Experiment 1: Evaluation of the Task 1 model
4.3.1 The Shwartz, Kotlerman, BLESS, Baroni, and Levy evaluation datasets
4.3.2 Results on the Kotlerman, BLESS, Baroni, Levy, and Shwartz datasets
4.3.3 Results on the Shwartz dataset
4.4 Experiment 2: Evaluation of Task 2
4.4.1 The BLESShyper evaluation dataset
4.4.2 The BIBLESS evaluation dataset
4.4.3 Results on BLESShyper
4.4.4 Results on BIBLESS
4.5 Experiment 3: Evaluation of Task 1 + Task 2
4.5.1 The BIBLESS evaluation dataset
4.5.2 The HyperLex evaluation dataset
4.5.3 Results on the BIBLESS dataset
4.5.4 Results on the HyperLex dataset
4.6 Experiment 4: Task 1 Pos-Neg followed by Task 2 (Q, Pos-Neg, AB)
4.6.1 Results on the BIBLESS dataset
4.6.2 Results on HyperLex
4.7 Experiment 5: Task 1 + Task 2 for tree-structure prediction
5. Conclusion
5.1 Conclusions
5.2 Future Work
References
References:
[1] Z. S. Harris, “Distributional structure,” Word, vol. 10, no. 2-3, pp. 146–162, 1954.
[2] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” arXiv preprint arXiv:1310.4546, 2013.
[3] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[4] M. Faruqui, J. Dodge, S. K. Jauhar, C. Dyer, E. Hovy, and N. A. Smith, “Retrofitting word vectors to semantic lexicons,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015, pp. 1606–1615.
[5] I. Vulić and N. Mrkšić, “Specialising word vectors for lexical entailment,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018, pp. 1134–1145.
[6] M. Alsuhaibani, D. Bollegala, T. Maehara, and K.-i. Kawarabayashi, “Jointly learning word embeddings using a corpus and a knowledge base,” PloS one, vol. 13, no. 3, e0193094, 2018.
[7] K. A. Nguyen, M. Köper, S. Schulte im Walde, and N. T. Vu, “Hierarchical embeddings for hypernymy detection and directionality,” arXiv preprint arXiv:1707.07273, 2017.
[8] M. Nickel and D. Kiela, “Poincaré embeddings for learning hierarchical representations,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 6341–6350.
[9] M. Alsuhaibani, T. Maehara, and D. Bollegala, “Joint learning of hierarchical word embeddings from a corpus and a taxonomy,” in Automated Knowledge Base Construction (AKBC), 2018.
[10] V. Shwartz, Y. Goldberg, and I. Dagan, “Improving hypernymy detection with an integrated path-based and distributional method,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016, pp. 2389–2398.
[11] C. Wang and X. He, “BiRRE: Learning bidirectional residual relation embeddings for supervised hypernymy detection,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online: Association for Computational Linguistics, Jul. 2020, pp. 3630–3640. doi: 10.18653/v1/2020.acl-main.334. [Online]. Available: https://www.aclweb.org/anthology/2020.acl-main.334.
[12] C. Sun, L. Huang, and X. Qiu, “Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota: Association for Computational Linguistics, Jun. 2019, pp. 380–385. doi: 10.18653/v1/N19-1035. [Online]. Available: https://www.aclweb.org/anthology/N19-1035.
[13] S. Yu, J. Su, and D. Luo, “Improving BERT-based text classification with auxiliary sentence and domain knowledge,” IEEE Access, vol. 7, pp. 176600–176612, 2019. doi: 10.1109/ACCESS.2019.2953990.
[14] K. A. Nguyen, S. Schulte im Walde, and N. T. Vu, “Integrating distributional lexical contrast into word embeddings for antonym-synonym distinction,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 454–459. doi: 10.18653/v1/P16-2074. [Online]. Available: https://aclanthology.org/P16-2074.
[15] S. Roller and K. Erk, “Relations such as hypernymy: Identifying and exploiting Hearst patterns in distributional vectors for lexical entailment,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas: Association for Computational Linguistics, Nov. 2016, pp. 2163–2172. doi: 10.18653/v1/D16-1234. [Online]. Available: https://www.aclweb.org/anthology/D16-1234.
[16] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[17] D. Bollegala, M. Alsuhaibani, T. Maehara, and K.-i. Kawarabayashi, “Joint word representation learning using a corpus and a semantic lexicon,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, 2016.
[18] G. Glavaš and S. P. Ponzetto, “Dual tensor model for detecting asymmetric lexico-semantic relations,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 1757–1767.
[19] C. Fellbaum, “WordNet,” in Theory and applications of ontology: computer applications, Springer, 2010, pp. 231–243.
[20] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[21] I. Vulić, D. Gerz, D. Kiela, F. Hill, and A. Korhonen, “HyperLex: A large-scale evaluation of graded lexical entailment,” Computational Linguistics, vol. 43, no. 4, pp. 781–835, 2017.
[22] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[23] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives, “DBpedia: A nucleus for a web of open data,” in The Semantic Web, K. Aberer, K.-S. Choi, N. Noy, D. Allemang, K.-I. Lee, L. Nixon, J. Golbeck, P. Mika, D. Maynard, R. Mizoguchi, G. Schreiber, and P. Cudré-Mauroux, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2007, pp. 722–735, isbn: 978-3-540-76298-0.
[24] D. Vrandečić, “Wikidata: A new platform for collaborative data collection,” in Proceedings of the 21st International Conference on World Wide Web, ser. WWW ’12 Companion, Lyon, France: Association for Computing Machinery, 2012, pp. 1063–1064, isbn: 9781450312301. doi: 10.1145/2187980.2188242. [Online]. Available: https://doi.org/10.1145/2187980.2188242.
[25] F. M. Suchanek, G. Kasneci, and G. Weikum, “Yago: A core of semantic knowledge,” in Proceedings of the 16th international conference on World Wide Web, 2007, pp. 697–706.
[26] L. Kotlerman, I. Dagan, I. Szpektor, and M. Zhitomirsky-Geffet, “Directional distributional similarity for lexical inference,” Natural Language Engineering, vol. 16, no. 4, pp. 359–389, 2010.
[27] M. Baroni and A. Lenci, “How we blessed distributional semantic evaluation,” in Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, 2011, pp. 1–10.
[28] M. Baroni, R. Bernardi, N.-Q. Do, and C.-c. Shan, “Entailment above the word level in distributional semantics,” in Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012, pp. 23–32.
[29] O. Levy, I. Dagan, and J. Goldberger, “Focused entailment graphs for open IE propositions,” in Proceedings of the Eighteenth Conference on Computational Natural Language Learning, 2014, pp. 87–97.
[30] M. Zhitomirsky-Geffet and I. Dagan, “Bootstrapping distributional feature vector quality,” Computational linguistics, vol. 35, no. 3, pp. 435–461, 2009.
[31] A. Lenci and G. Benotto, “Identifying hypernyms in distributional semantic spaces,” in *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), 2012, pp. 75–79.
[32] D. Kiela, L. Rimell, I. Vulic, and S. Clark, “Exploiting image generality for lexical entailment detection,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015), ACL; East Stroudsburg, PA, 2015, pp. 119–124.
[33] S. Roller, D. Kiela, and M. Nickel, “Hearst patterns revisited: Automatic hypernym detection from large text corpora,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia: Association for Computational Linguistics, Jul. 2018, pp. 358–363. doi: 10.18653/v1/P18-2057. [Online]. Available: https://www.aclweb.org/anthology/P18-2057.
[34] M. Le, S. Roller, L. Papaxanthos, D. Kiela, and M. Nickel, “Inferring concept hierarchies from text corpora via hyperbolic embeddings,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy: Association for Computational Linguistics, Jul. 2019, pp. 3231–3241. doi: 10.18653/v1/P19-1313. [Online]. Available: https://www.aclweb.org/anthology/P19-1313.
[35] I. Vulić, D. Gerz, D. Kiela, F. Hill, and A. Korhonen, “HyperLex: A large-scale evaluation of graded lexical entailment,” Computational Linguistics, vol. 43, no. 4, pp. 781–835, Dec. 2017. doi: 10.1162/COLI_a_00301. [Online]. Available: https://www.aclweb.org/anthology/J17-4004.
Advisor: 陳弘軒 (Hung-Hsuan Chen)    Date of Approval: 2021-08-10