References
[1] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989, doi: 10.1109/5.18626.
[2] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, “End-to-end attention-based large vocabulary speech recognition,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, May 2016, pp. 4945–4949. doi: 10.1109/ICASSP.2016.7472618.
[3] S. Watanabe et al., “ESPnet: End-to-end speech processing toolkit,” in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2018. doi: 10.21437/Interspeech.2018-1456.
[4] G. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition:
The shared views of four research groups,” IEEE Signal Process Mag, vol. 29, no.
6, pp. 82–97, 2012, doi: 10.1109/MSP.2012.2205597.
[5] A. Graves, A. R. Mohamed, and G. Hinton, “Speech recognition with deep recurrent
neural networks,” in ICASSP, IEEE International Conference on Acoustics, Speech
and Signal Processing - Proceedings, 2013. doi: 10.1109/ICASSP.2013.6638947.
[6] A. Vaswani et al., “Attention is all you need,” in Advances in Neural Information
Processing Systems, 2017.
[7] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling,” arXiv preprint arXiv:1412.3555, 2014.
[8] M. T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” in Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing, 2015. doi: 10.18653/v1/d15-1166.
[9] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,”
in Proceedings of the IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, 2016. doi: 10.1109/CVPR.2016.90.
[10] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training
by reducing internal covariate shift,” in 32nd International Conference on Machine
Learning, ICML 2015, 2015.
[11] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer Normalization,” arXiv preprint arXiv:1607.06450, 2016.
[12] K. Cho et al., “Learning phrase representations using RNN encoder-decoder for
statistical machine translation,” in EMNLP 2014 - 2014 Conference on Empirical
Methods in Natural Language Processing, Proceedings of the Conference,
Association for Computational Linguistics (ACL), 2014, pp. 1724–1734. doi:
10.3115/v1/d14-1179.
[13] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015.
[14] M. Johnson et al., “Google’s Multilingual Neural Machine Translation System:
Enabling Zero-Shot Translation,” Trans Assoc Comput Linguist, vol. 5, 2017, doi:
10.1162/tacl_a_00065.
[15] R. Jozefowicz, O. Vinyals, M. Schuster, N. Shazeer, and Y. Wu, “Exploring the Limits of Language Modeling,” arXiv preprint arXiv:1602.02410, 2016.
[16] M. A. Hedderich, L. Lange, H. Adel, J. Strötgen, and D. Klakow, “A Survey on
Recent Approaches for Natural Language Processing in Low-Resource Scenarios,”
in NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies,
Proceedings of the Conference, 2021. doi: 10.18653/v1/2021.naacl-main.201.
[17] C. Raffel et al., “Exploring the limits of transfer learning with a unified text-to-text
transformer,” Journal of Machine Learning Research, vol. 21, Jun. 2020.
[18] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on
Knowledge and Data Engineering, vol. 22, no. 10. pp. 1345–1359, 2010. doi:
10.1109/TKDE.2009.191.
[19] R. Collobert and J. Weston, “A unified architecture for natural language processing:
Deep neural networks with multitask learning,” in Proceedings of the 25th
International Conference on Machine Learning, 2008, pp. 160–167.
[20] J. E. van Engelen and H. H. Hoos, “A survey on semi-supervised learning,” Mach
Learn, vol. 109, no. 2, pp. 373–440, Feb. 2020, doi: 10.1007/s10994-019-05855-6.
[21] Y. Chen and T. Augustinova, “Are Language-Agnostic Sentence Representations
actually Language-Agnostic?,” in International Conference Recent Advances in
Natural Language Processing, RANLP, Incoma Ltd, 2021, pp. 274–280. doi:
10.26615/978-954-452-072-4_032.
[22] M. Artetxe and H. Schwenk, “Massively Multilingual Sentence Embeddings for
Zero-Shot Cross-Lingual Transfer and Beyond,” Trans Assoc Comput Linguist, vol.
7, pp. 597–610, Sep. 2019, doi: 10.1162/tacl_a_00288.
[23] S. Ruder, I. Vulić, and A. Søgaard, “A survey of cross-lingual word embedding
models,” Journal of Artificial Intelligence Research, vol. 65. AI Access Foundation,
pp. 569–631, 2019. doi: 10.1613/JAIR.1.11640.
[24] R. Dabre, C. Chu, and A. Kunchukuttan, “A Survey of Multilingual Neural Machine
Translation,” ACM Comput Surv, vol. 53, no. 5, Sep. 2020, doi: 10.1145/3406095.
[25] Y. Wang et al., “PromDA: Prompt-based Data Augmentation for Low-Resource
NLU Tasks,” in Proceedings of the Annual Meeting of the Association for
Computational Linguistics, 2022. doi: 10.18653/v1/2022.acl-long.292.
[26] J. Wei and K. Zou, “EDA: Easy data augmentation techniques for boosting
performance on text classification tasks,” in EMNLP-IJCNLP 2019 - 2019
Conference on Empirical Methods in Natural Language Processing and 9th
International Joint Conference on Natural Language Processing, Proceedings of
the Conference, Association for Computational Linguistics, 2019, pp. 6382–6388.
doi: 10.18653/v1/d19-1670.
[27] D. Li et al., “Contextualized Perturbation for Textual Adversarial Attack,” in
NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, Association for Computational Linguistics (ACL),
2021, pp. 5053–5069. doi: 10.18653/v1/2021.naacl-main.400.
[28] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep
bidirectional transformers for language understanding,” in NAACL HLT 2019 - 2019
Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies - Proceedings of the Conference, 2019.
[29] T. Pires, E. Schlinger, and D. Garrette, “How multilingual is multilingual BERT?,” in ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 2019. doi: 10.18653/v1/p19-1493.
[30] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving Language Understanding by Generative Pre-Training,” OpenAI, 2018.
[31] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language Models are Unsupervised Multitask Learners,” OpenAI Blog, vol. 1, no. 8, 2019.
[32] T. B. Brown et al., “Language models are few-shot learners,” in Advances in Neural
Information Processing Systems, 2020.
[33] R. Morgan and R. Garigl, Hugging