References
[1] Z. S. Harris, “Distributional structure,” Word, vol. 10, no. 2-3, pp. 146–162, 1954.
[2] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543. [Online]. Available: http://www.aclweb.org/anthology/D14-1162.
[3] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” arXiv preprint arXiv:1310.4546, 2013.
[4] 林冠佑, “Adjusting word vectors with synonym and antonym information using a non-directional sequence encoder based on self-attention,” Thesis, 2020. [Online]. Available: https://hdl.handle.net/11296/qe48u8.
[5] W.-t. Yih, G. Zweig, and J. C. Platt, “Polarity inducing latent semantic analysis,” in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012, pp. 1212–1222.
[6] M. Yu and M. Dredze, “Improving lexical embeddings with semantic knowledge,” in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, Maryland: Association for Computational Linguistics, Jun. 2014, pp. 545–550. doi: 10.3115/v1/P14-2089. [Online]. Available: https://www.aclweb.org/anthology/P14-2089.
[7] C. Xu, Y. Bai, J. Bian, B. Gao, G. Wang, X. Liu, and T.-Y. Liu, “RC-NET: A general framework for incorporating knowledge into word representations,” in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 2014, pp. 1219–1228.
[8] J. Bian, B. Gao, and T.-Y. Liu, “Knowledge-powered deep learning for word embedding,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2014, pp. 132–148.
[9] D. Fried and K. Duh, “Incorporating both distributional and relational semantics in word representations,” arXiv preprint arXiv:1412.4369, 2014.
[10] E. Pavlick, P. Rastogi, J. Ganitkevitch, B. Van Durme, and C. Callison-Burch, “PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2015, pp. 425–430.
[11] R. Schwartz, R. Reichart, and A. Rappoport, “Symmetric pattern based word embeddings for improved word similarity prediction,” in Proceedings of the Nineteenth Conference on Computational Natural Language Learning, 2015, pp. 258–267.
[12] M. Ono, M. Miwa, and Y. Sasaki, “Word embedding-based antonym detection using thesauri and distributional information,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015, pp. 984–989.
[13] D. Osborne, S. Narayan, and S. B. Cohen, “Encoding prior knowledge with eigenword embeddings,” Transactions of the Association for Computational Linguistics, vol. 4, pp. 417–430, 2016.
[14] M. Faruqui, J. Dodge, S. K. Jauhar, C. Dyer, E. Hovy, and N. A. Smith, “Retrofitting word vectors to semantic lexicons,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado: Association for Computational Linguistics, May 2015, pp. 1606–1615. doi: 10.3115/v1/N15-1184. [Online]. Available: https://www.aclweb.org/anthology/N15-1184.
[15] N. Mrkšić, D. Ó Séaghdha, B. Thomson, M. Gašić, L. M. Rojas-Barahona, P.-H. Su, D. Vandyke, T.-H. Wen, and S. Young, “Counter-fitting word vectors to linguistic constraints,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California: Association for Computational Linguistics, Jun. 2016, pp. 142–148. doi: 10.18653/v1/N16-1018. [Online]. Available: https://www.aclweb.org/anthology/N16-1018.
[16] J. Wieting, M. Bansal, K. Gimpel, and K. Livescu, “From paraphrase database to compositional paraphrase model and back,” Transactions of the Association for Computational Linguistics, vol. 3, pp. 345–358, 2015. doi: 10.1162/tacl_a_00143. [Online]. Available: https://www.aclweb.org/anthology/Q15-1025.
[17] N. Mrkšić, I. Vulić, D. Ó Séaghdha, I. Leviant, R. Reichart, M. Gašić, A. Korhonen, and S. Young, “Semantic specialization of distributional word vector spaces using monolingual and cross-lingual constraints,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 309–324, 2017.
[18] J.-K. Kim, M.-C. de Marneffe, and E. Fosler-Lussier, “Adjusting word embeddings with semantic intensity orders,” in Proceedings of the 1st Workshop on Representation Learning for NLP, Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 62–69. doi: 10.18653/v1/W16-1607. [Online]. Available: https://www.aclweb.org/anthology/W16-1607.
[19] H. Jo and S. J. Choi, “Extrofitting: Enriching word representation and its vector space with semantic lexicons,” in Proceedings of The Third Workshop on Representation Learning for NLP, Melbourne, Australia: Association for Computational Linguistics, Jul. 2018, pp. 24–29. doi: 10.18653/v1/W18-3003. [Online]. Available: https://www.aclweb.org/anthology/W18-3003.
[20] A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” in Proceedings of The 33rd International Conference on Machine Learning, M. F. Balcan and K. Q. Weinberger, Eds., ser. Proceedings of Machine Learning Research, vol. 48, New York, New York, USA: PMLR, 20–22 Jun 2016, pp. 1747–1756. [Online]. Available: http://proceedings.mlr.press/v48/oord16.html.
[21] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[22] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
[23] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., Curran Associates, Inc., 2017, pp. 5998–6008. [Online]. Available: http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
[24] Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” CoRR, vol. abs/1906.08237, 2019. arXiv: 1906.08237. [Online]. Available: http://arxiv.org/abs/1906.08237.
[25] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” CoRR, vol. abs/1810.04805, 2018. arXiv: 1810.04805. [Online]. Available: http://arxiv.org/abs/1810.04805.
[26] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, “Learning word vectors for sentiment analysis,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA: Association for Computational Linguistics, Jun. 2011, pp. 142–150. [Online]. Available: http://www.aclweb.org/anthology/P11-1015.
[27] F. Morin and Y. Bengio, “Hierarchical probabilistic neural network language model,” in AISTATS, Citeseer, vol. 5, 2005, pp. 246–252.
[28] I. Yamada, A. Asai, J. Sakuma, H. Shindo, H. Takeda, Y. Takefuji, and Y. Matsumoto, “Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia,” arXiv preprint arXiv:1812.06280v3, 2020.
[29] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” arXiv preprint arXiv:1607.04606, 2016.
[30] E. Pavlick, P. Rastogi, J. Ganitkevitch, B. Van Durme, and C. Callison-Burch, “PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Beijing, China: Association for Computational Linguistics, Jul. 2015, pp. 425–430. doi: 10.3115/v1/P15-2070. [Online]. Available: https://www.aclweb.org/anthology/P15-2070.
[31] S. Rajana, C. Callison-Burch, M. Apidianaki, and V. Shwartz, “Learning antonyms with paraphrases and a morphology-aware neural network,” in Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), Vancouver, Canada: Association for Computational Linguistics, Aug. 2017, pp. 12–21. doi: 10.18653/v1/S17-1002. [Online]. Available: https://www.aclweb.org/anthology/S17-1002.
[32] G. A. Miller, “WordNet: A lexical database for English,” Commun. ACM, vol. 38, no. 11, pp. 39–41, Nov. 1995, issn: 0001-0782. doi: 10.1145/219717.219748. [Online]. Available: https://doi.org/10.1145/219717.219748.
[33] C. F. Baker, C. J. Fillmore, and J. B. Lowe, “The Berkeley FrameNet project,” in 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1, Montreal, Quebec, Canada: Association for Computational Linguistics, Aug. 1998, pp. 86–90. doi: 10.3115/980845.980860. [Online]. Available: https://www.aclweb.org/anthology/P98-1013.
[34] F. Hill, R. Reichart, and A. Korhonen, “SimLex-999: Evaluating semantic models with (genuine) similarity estimation,” Computational Linguistics, vol. 41, no. 4, pp. 665–695, Dec. 2015. doi: 10.1162/COLI_a_00237. [Online]. Available: https://www.aclweb.org/anthology/J15-4004.
[35] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.