References
Aiken, M. (2019). An Updated Evaluation of Google Translate Accuracy. Studies in Linguistics and Literature, 3(3), 253. https://doi.org/10.22158/sll.v3n3p253
Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer Normalization. ArXiv:1607.06450 [Cs, Stat]. http://arxiv.org/abs/1607.06450
Baan, J., ter Hoeve, M., van der Wees, M., Schuth, A., & de Rijke, M. (2019). Do Transformer Attention Heads Provide Transparency in Abstractive Summarization? ArXiv:1907.00570 [Cs]. http://arxiv.org/abs/1907.00570
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. ArXiv:1409.0473 [Cs, Stat]. http://arxiv.org/abs/1409.0473
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching Word Vectors with Subword Information. ArXiv:1607.04606 [Cs]. http://arxiv.org/abs/1607.04606
Chen, B., & Cherry, C. (2014). A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU. Proceedings of the Ninth Workshop on Statistical Machine Translation, 362–367. https://doi.org/10.3115/v1/W14-3346
Chen, M. X., Firat, O., Bapna, A., Johnson, M., Macherey, W., Foster, G., Jones, L., Parmar, N., Schuster, M., Chen, Z., Wu, Y., & Hughes, M. (2018). The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. ArXiv:1804.09849 [Cs]. http://arxiv.org/abs/1804.09849
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. ArXiv:1406.1078 [Cs, Stat]. http://arxiv.org/abs/1406.1078
Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2019). Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. ArXiv:1901.02860 [Cs, Stat]. http://arxiv.org/abs/1901.02860
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv:1810.04805 [Cs]. http://arxiv.org/abs/1810.04805
Elman, J. L. (1990). Finding Structure in Time. Cognitive Science, 14(2), 179–211. https://doi.org/10.1207/s15516709cog1402_1
Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017). Convolutional sequence to sequence learning. Proceedings of the 34th International Conference on Machine Learning - Volume 70, 1243–1252.
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier Neural Networks. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 315–323. http://proceedings.mlr.press/v15/glorot11a.html
Hao, J., Wang, X., Shi, S., Zhang, J., & Tu, Z. (2019). Multi-Granularity Self-Attention for Neural Machine Translation. ArXiv:1909.02222 [Cs]. http://arxiv.org/abs/1909.02222
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778. https://doi.org/10.1109/CVPR.2016.90
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Huang, K., Altosaar, J., & Ranganath, R. (2019). ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission. ArXiv:1904.05342 [Cs]. http://arxiv.org/abs/1904.05342
Iida, S., Kimura, R., Cui, H., Hung, P.-H., Utsuro, T., & Nagata, M. (2019). Attention over Heads: A Multi-Hop Attention for Neural Machine Translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 217–222. https://doi.org/10.18653/v1/P19-2030
Li, J., Tu, Z., Yang, B., Lyu, M. R., & Zhang, T. (2018). Multi-Head Attention with Disagreement Regularization. ArXiv:1810.10183 [Cs]. http://arxiv.org/abs/1810.10183
Liu, Y. (2019). Fine-tune BERT for Extractive Summarization. ArXiv:1903.10318 [Cs]. http://arxiv.org/abs/1903.10318
Medina, J. R., & Kalita, J. (2018). Parallel Attention Mechanisms in Neural Machine Translation. 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 547–552. https://doi.org/10.1109/ICMLA.2018.00088
Michel, P., Levy, O., & Neubig, G. (2019). Are Sixteen Heads Really Better than One? In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 32 (pp. 14014–14024). Curran Associates, Inc. http://papers.nips.cc/paper/9551-are-sixteen-heads-really-better-than-one.pdf
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, 3111–3119.
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 311–318. https://doi.org/10.3115/1073083.1073135
Rush, A. (2018). The Annotated Transformer. Proceedings of Workshop for NLP Open Source Software (NLP-OSS), 52–60. https://doi.org/10.18653/v1/W18-2509
Sennrich, R., Haddow, B., & Birch, A. (2016). Neural Machine Translation of Rare Words with Subword Units. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1715–1725. https://doi.org/10.18653/v1/P16-1162
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, 3104–3112.
Tiedemann, J. (2012). Parallel Data, Tools and Interfaces in OPUS. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), 2214–2218. http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. ArXiv:1706.03762 [Cs]. http://arxiv.org/abs/1706.03762
Vig, J. (2019). Visualizing Attention in Transformer-Based Language Representation Models. ArXiv:1904.02679 [Cs, Stat]. http://arxiv.org/abs/1904.02679
Voita, E., Talbot, D., Moiseev, F., Sennrich, R., & Titov, I. (2019). Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 5797–5808. https://doi.org/10.18653/v1/P19-1580
Vosoughi, S., Vijayaraghavan, P., & Roy, D. (2016). Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’16, 1041–1044. https://doi.org/10.1145/2911451.2914762
Woo, S., Park, J., Lee, J.-Y., & Kweon, I. S. (2018). CBAM: Convolutional Block Attention Module. ArXiv:1807.06521 [Cs]. http://arxiv.org/abs/1807.06521
Wu, F., Fan, A., Baevski, A., Dauphin, Y. N., & Auli, M. (2019). Pay Less Attention with Lightweight and Dynamic Convolutions. ArXiv:1901.10430 [Cs]. http://arxiv.org/abs/1901.10430
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., … Dean, J. (2016). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. ArXiv:1609.08144 [Cs]. http://arxiv.org/abs/1609.08144
Yang, B., Tu, Z., Wong, D. F., Meng, F., Chao, L. S., & Zhang, T. (2018). Modeling Localness for Self-Attention Networks. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 4449–4458. https://doi.org/10.18653/v1/D18-1475