References
[1] Y. Liu and M. Lapata, “Hierarchical Transformers for Multi-Document Summarization,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5070–5081, 2019.
[2] X. Zhang, F. Wei, and M. Zhou, “HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5059–5069, 2019.
[3] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” in Advances in Neural Information Processing Systems, vol. 27, 2014.
[4] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.
[5] S. Hochreiter, “The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions,” Int. J. Uncertain. Fuzziness Knowl.-Based Syst., vol. 6, no. 2, pp. 107–116, 1998.
[6] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[7] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, “On the Properties of Neural Machine Translation: Encoder-Decoder Approaches,” in Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103–111, 2014.
[8] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673–2681, 1997.
[9] D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” in Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2015.
[10] T. Luong, H. Pham, and C. D. Manning, “Effective Approaches to Attention-based Neural Machine Translation,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421, 2015.
[11] M. R. Costa-jussà and J. A. R. Fonollosa, “Character-based Neural Machine Translation,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 357–361, 2016.
[12] A. See, M.-T. Luong, and C. D. Manning, “Compression of Neural Machine Translation Models via Pruning,” in Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pp. 291–301, 2016.
[13] Y. Kim and A. M. Rush, “Sequence-Level Knowledge Distillation,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1317–1327, 2016.
[14] J. Zhou, Y. Cao, X. Wang, P. Li, and W. Xu, “Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation,” Trans. Assoc. Comput. Linguist., vol. 4, pp. 371–383, 2016.
[15] R. J. Williams and D. Zipser, “A Learning Algorithm for Continually Running Fully Recurrent Neural Networks,” Neural Comput., vol. 1, no. 2, pp. 270–280, 1989.
[16] S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, “Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks,” in Advances in Neural Information Processing Systems, pp. 1171–1179, 2015.
[17] A. M. Lamb, A. Goyal, Y. Zhang, S. Zhang, A. C. Courville, and Y. Bengio, “Professor Forcing: A New Algorithm for Training Recurrent Networks,” in Advances in Neural Information Processing Systems, vol. 29, 2016.
[18] T. Mihaylova and A. F. T. Martins, “Scheduled Sampling for Transformers,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 351–356, 2019.
[19] T. Mikolov, K. Chen, G. S. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” in Proceedings of the ICLR Workshop, 2013.
[20] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, “A neural probabilistic language model,” J. Mach. Learn. Res., vol. 3, pp. 1137–1155, 2003.
[21] M. E. Peters et al., “Deep Contextualized Word Representations,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 2227–2237, 2018.
[22] A. Vaswani et al., “Attention is All you Need,” in Advances in Neural Information Processing Systems, vol. 30, 2017.
[23] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, 2019.
[24] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving Language Understanding by Generative Pre-Training,” OpenAI, 2018.
[25] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu, “ERNIE: Enhanced Language Representation with Informative Entities,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1441–1451, 2019.
[26] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized Autoregressive Pretraining for Language Understanding,” in Advances in Neural Information Processing Systems, vol. 32, 2019.
[27] I. V. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau, “Building end-to-end dialogue systems using generative hierarchical neural network models,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 3776–3783, 2016.
[28] D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” in Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2014.
[29] K. Yao, G. Zweig, and B. Peng, “Attention with Intention for a Neural Network Conversation Model,” arxiv, 2015.
[30] H. Chen, Z. Ren, J. Tang, Y. E. Zhao, and D. Yin, “Hierarchical Variational Memory Network for Dialogue Generation,” in Proceedings of the 2018 World Wide Web Conference, pp. 1653–1662, 2018.
[31] Y. Liu, H. Yuan, and S. Ji, “Learning Local and Global Multi-context Representations for Document Classification,” in 2019 IEEE International Conference on Data Mining (ICDM), pp. 1234–1239, 2019.
[32] Y. Li, J. Yu, and Z. Wang, “Dense Semantic Matching Network for Multi-turn Conversation,” in 2019 IEEE International Conference on Data Mining (ICDM), pp. 1186–1191, 2019.
[33] J. Li, T. Luong, and D. Jurafsky, “A Hierarchical Neural Autoencoder for Paragraphs and Documents,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1106–1115, 2015.
[34] I. Serban et al., “A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[35] J. Xu, H. Wang, Z.-Y. Niu, H. Wu, W. Che, and T. Liu, “Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1835–1845, 2020.
[36] S. Wu, Y. Li, D. Zhang, Y. Zhou, and Z. Wu, “Diverse and Informative Dialogue Generation with Context-Specific Commonsense Knowledge Awareness,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5811–5820, 2020.
[37] V. Vlasov, J. Mosig, and A. Nichol, “Dialogue Transformers,” arXiv preprint, 2019.
[38] W. Chen, J. Chen, P. Qin, X. Yan, and W. Y. Wang, “Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3696–3709, 2019.
[39] B. Santra, P. Anusha, and P. Goyal, “Hierarchical Transformer for Task Oriented Dialog Systems,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 5649–5658, 2021.
[40] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a Method for Automatic Evaluation of Machine Translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318, 2002.
[41] C.-Y. Lin, “ROUGE: A Package for Automatic Evaluation of Summaries,” in Text Summarization Branches Out, pp. 74–81, 2004.
[42] Y. Li, H. Su, X. Shen, W. Li, Z. Cao, and S. Niu, “DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset,” in Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 986–995, 2017.
[43] X. Zang, A. Rastogi, S. Sunkara, R. Gupta, J. Zhang, and J. Chen, “MultiWOZ 2.2: A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines,” in Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI, pp. 109–117, 2020.
[44] B. Chen and C. Cherry, “A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU,” in Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 362–367, 2014.