References
[1] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989, doi: 10.1109/5.18626.
[2] Y. Li and P. Fung, “Code-Switch Language Model with Inversion Constraints for Mixed Language Speech Recognition,” in Proceedings of COLING 2012, Mumbai, India, Dec. 2012, pp. 1671–1680. Accessed: Apr. 21, 2022. [Online]. Available: https://aclanthology.org/C12-1102
[3] Y. Li and P. Fung, “Language Modeling with Functional Head Constraint for Code Switching Speech Recognition,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, Oct. 2014, pp. 907–916. doi: 10.3115/v1/D14-1098.
[4] H. Adel, N. T. Vu, F. Kraus, T. Schlippe, H. Li, and T. Schultz, “Recurrent neural network language modeling for code switching conversational speech,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 8411–8415. doi: 10.1109/ICASSP.2013.6639306.
[5] H. Adel, N. T. Vu, and T. Schultz, “Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Sofia, Bulgaria, Aug. 2013, pp. 206–211. Accessed: Apr. 21, 2022. [Online]. Available: https://aclanthology.org/P13-2037
[6] G. I. Winata, A. Madotto, C.-S. Wu, and P. Fung, “Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning,” in Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching, Melbourne, Australia, Jul. 2018, pp. 62–67. doi: 10.18653/v1/W18-3207.
[7] M. Choudhury, K. Bali, S. Sitaram, and A. Baheti, “Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks,” in Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017), Kolkata, India, Dec. 2017, pp. 65–74. Accessed: Apr. 21, 2022. [Online]. Available: https://aclanthology.org/W17-7509
[8] A. Pratapa, G. Bhat, M. Choudhury, S. Sitaram, S. Dandapat, and K. Bali, “Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, Jul. 2018, pp. 1543–1553. doi: 10.18653/v1/P18-1143.
[9] S. Garg, T. Parekh, and P. Jyothi, “Code-switched Language Models Using Dual RNNs and Same-Source Pretraining,” arXiv preprint arXiv:1809.01962, Sep. 2018. Accessed: Apr. 21, 2022. [Online]. Available: http://arxiv.org/abs/1809.01962
[10] G. I. Winata, A. Madotto, C.-S. Wu, and P. Fung, “Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences,” arXiv preprint arXiv:1909.08582, Sep. 2019. Accessed: Apr. 21, 2022. [Online]. Available: http://arxiv.org/abs/1909.08582
[11] A. Vaswani et al., “Attention Is All You Need,” arXiv preprint arXiv:1706.03762, Dec. 2017. Accessed: Apr. 22, 2022. [Online]. Available: http://arxiv.org/abs/1706.03762
[12] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv preprint arXiv:1810.04805, May 2019. Accessed: Apr. 22, 2022. [Online]. Available: http://arxiv.org/abs/1810.04805
[13] A. Kannan, Y. Wu, P. Nguyen, T. N. Sainath, Z. Chen, and R. Prabhavalkar, “An analysis of incorporating an external language model into a sequence-to-sequence model,” arXiv preprint arXiv:1712.01996, Dec. 2017. Accessed: Apr. 21, 2022. [Online]. Available: http://arxiv.org/abs/1712.01996
[14] C. Gulcehre et al., “On Using Monolingual Corpora in Neural Machine Translation,” arXiv preprint arXiv:1503.03535, Jun. 2015. Accessed: Apr. 21, 2022. [Online]. Available: http://arxiv.org/abs/1503.03535
[15] A. Sriram, H. Jun, S. Satheesh, and A. Coates, “Cold Fusion: Training Seq2Seq Models Together with Language Models,” arXiv preprint arXiv:1708.06426, Aug. 2017. Accessed: Apr. 21, 2022. [Online]. Available: http://arxiv.org/abs/1708.06426
[16] J. L. Elman, “Learning and development in neural networks: the importance of starting small,” Cognition, vol. 48, no. 1, pp. 71–99, Jul. 1993, doi: 10.1016/0010-0277(93)90058-4.
[17] L. Yu, W. Zhang, J. Wang, and Y. Yu, “SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient,” arXiv preprint arXiv:1609.05473, Aug. 2017. Accessed: Apr. 21, 2022. [Online]. Available: http://arxiv.org/abs/1609.05473
[18] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
[19] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling,” arXiv preprint arXiv:1412.3555, Dec. 2014. Accessed: Apr. 22, 2022. [Online]. Available: http://arxiv.org/abs/1412.3555
[20] Y. Wu et al., “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,” arXiv preprint arXiv:1609.08144, Oct. 2016. Accessed: Apr. 24, 2022. [Online]. Available: http://arxiv.org/abs/1609.08144
[21] M.-T. Luong, H. Pham, and C. D. Manning, “Effective Approaches to Attention-based Neural Machine Translation,” arXiv preprint arXiv:1508.04025, Sep. 2015. Accessed: Apr. 24, 2022. [Online]. Available: http://arxiv.org/abs/1508.04025
[22] R. Jozefowicz, O. Vinyals, M. Schuster, N. Shazeer, and Y. Wu, “Exploring the Limits of Language Modeling,” arXiv preprint arXiv:1602.02410, Feb. 2016. Accessed: Apr. 24, 2022. [Online]. Available: http://arxiv.org/abs/1602.02410
[23] D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” arXiv preprint arXiv:1409.0473, May 2016. Accessed: Apr. 24, 2022. [Online]. Available: http://arxiv.org/abs/1409.0473
[24] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, “Convolutional Sequence to Sequence Learning,” arXiv preprint arXiv:1705.03122, Jul. 2017. Accessed: Apr. 24, 2022. [Online]. Available: http://arxiv.org/abs/1705.03122
[25] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” arXiv preprint arXiv:1301.3781, Sep. 2013. Accessed: Apr. 25, 2022. [Online]. Available: http://arxiv.org/abs/1301.3781
[26] M. E. Peters, W. Ammar, C. Bhagavatula, and R. Power, “Semi-supervised sequence tagging with bidirectional language models,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, Jul. 2017, pp. 1756–1765. doi: 10.18653/v1/P17-1161.
[27] M. E. Peters et al., “Deep contextualized word representations,” arXiv preprint arXiv:1802.05365, Mar. 2018. Accessed: Apr. 26, 2022. [Online]. Available: http://arxiv.org/abs/1802.05365
[28] O. Melamud, J. Goldberger, and I. Dagan, “context2vec: Learning Generic Context Embedding with Bidirectional LSTM,” in Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, Aug. 2016, pp. 51–61. doi: 10.18653/v1/K16-1006.
[29] A. M. Dai and Q. V. Le, “Semi-supervised Sequence Learning,” arXiv preprint arXiv:1511.01432, Nov. 2015. Accessed: Apr. 26, 2022. [Online]. Available: http://arxiv.org/abs/1511.01432
[30] J. Howard and S. Ruder, “Universal Language Model Fine-tuning for Text Classification,” arXiv preprint arXiv:1801.06146, May 2018. Accessed: Apr. 26, 2022. [Online]. Available: http://arxiv.org/abs/1801.06146
[31] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving Language Understanding by Generative Pre-Training,” OpenAI, Tech. Rep., 2018.
[32] W. Chan, N. Jaitly, Q. V. Le, and O. Vinyals, “Listen, Attend and Spell,” arXiv preprint arXiv:1508.01211, Aug. 2015. Accessed: Apr. 22, 2022. [Online]. Available: http://arxiv.org/abs/1508.01211
[33] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, “End-to-End Attention-based Large Vocabulary Speech Recognition,” arXiv preprint arXiv:1508.04395, Mar. 2016. Accessed: Apr. 22, 2022. [Online]. Available: http://arxiv.org/abs/1508.04395
[34] A. Graves, “Sequence Transduction with Recurrent Neural Networks,” arXiv preprint arXiv:1211.3711, Nov. 2012. Accessed: Apr. 22, 2022. [Online]. Available: http://arxiv.org/abs/1211.3711
[35] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks,” in Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, PA, USA, Jun. 2006, pp. 369–376.
[36] Z. Yang, B. Dhingra, Y. Yuan, J. Hu, W. W. Cohen, and R. Salakhutdinov, “Words or Characters? Fine-grained Gating for Reading Comprehension,” arXiv preprint arXiv:1611.01724, Sep. 2017. Accessed: Apr. 22, 2022. [Online]. Available: http://arxiv.org/abs/1611.01724
[37] R. Iyer, M. Ostendorf, and H. Gish, “Using out-of-domain data to improve in-domain language models,” IEEE Signal Process. Lett., vol. 4, no. 8, pp. 221–223, Aug. 1997, doi: 10.1109/97.611282.
[38] S. R. Gangireddy, P. Swietojanski, P. Bell, and S. Renals, “Unsupervised Adaptation of Recurrent Neural Network Language Models,” in Interspeech 2016, Sep. 2016, pp. 2333–2337. doi: 10.21437/Interspeech.2016-1342.
[39] M. Ma, M. Nirschl, F. Biadsy, and S. Kumar, “Approaches for Neural-Network Language Model Adaptation,” in Interspeech 2017, Aug. 2017, pp. 259–263. doi: 10.21437/Interspeech.2017-1310.
[40] J. Salazar, D. Liang, T. Q. Nguyen, and K. Kirchhoff, “Masked Language Model Scoring,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul. 2020, pp. 2699–2712. doi: 10.18653/v1/2020.acl-main.240.
[41] J. Shin, Y. Lee, and K. Jung, “Effective Sentence Scoring Method using Bidirectional Language Model for Speech Recognition,” arXiv preprint arXiv:1905.06655, May 2019. Accessed: Apr. 23, 2022. [Online]. Available: http://arxiv.org/abs/1905.06655
[42] S.-H. Chiu and B. Chen, “Innovative Bert-based Reranking Language Models for Speech Recognition,” in 2021 IEEE Spoken Language Technology Workshop (SLT), Jan. 2021, pp. 266–271. doi: 10.1109/SLT48900.2021.9383557.
[43] K. Li et al., “An Empirical Study of Transformer-Based Neural Language Model Adaptation,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2020, pp. 7934–7938. doi: 10.1109/ICASSP40776.2020.9053399.
[44] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A Neural Probabilistic Language Model,” J. Mach. Learn. Res., vol. 3, pp. 1137–1155, Feb. 2003.
[45] H. Schwenk, “Continuous space language models,” Comput. Speech Lang., vol. 21, no. 3, Jul. 2007.
[46] J. Park, X. Liu, M. J. F. Gales, and P. Woodland, “Improved neural network based language modelling and adaptation,” in Interspeech 2010, Sep. 2010, pp. 1041–1044. doi: 10.21437/Interspeech.2010-342.
[47] H.-S. Le, I. Oparin, A. Allauzen, J.-L. Gauvain, and F. Yvon, “Structured Output Layer Neural Network Language Models for Speech Recognition,” IEEE Trans. Audio Speech Lang. Process., vol. 21, no. 1, pp. 197–206, Jan. 2013, doi: 10.1109/TASL.2012.2215599.
[48] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language model,” in Interspeech 2010, Makuhari, Japan, Sep. 2010, pp. 1045–1048.
[49] T. Mikolov, S. Kombrink, L. Burget, J. Černocký, and S. Khudanpur, “Extensions of recurrent neural network language model,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2011, pp. 5528–5531. doi: 10.1109/ICASSP.2011.5947611.
[50] M. Sundermeyer, R. Schlüter, and H. Ney, “LSTM Neural Networks for Language Modeling,” in Interspeech 2012, Portland, OR, USA, Sep. 2012, pp. 194–197.
[51] S. Merity, N. S. Keskar, and R. Socher, “Regularizing and Optimizing LSTM Language Models,” arXiv preprint arXiv:1708.02182, Aug. 2017. Accessed: Apr. 23, 2022. [Online]. Available: http://arxiv.org/abs/1708.02182
[52] M. Sundermeyer, I. Oparin, J.-L. Gauvain, B. Freiberg, R. Schlüter, and H. Ney, “Comparison of feedforward and recurrent neural network language models,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 8430–8434. doi: 10.1109/ICASSP.2013.6639310.
[53] A. Graves, G. Wayne, and I. Danihelka, “Neural Turing Machines,” arXiv preprint arXiv:1410.5401, Dec. 2014. Accessed: Apr. 23, 2022. [Online]. Available: http://arxiv.org/abs/1410.5401
[54] D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” arXiv preprint arXiv:1409.0473, May 2016. Accessed: Apr. 23, 2022. [Online]. Available: http://arxiv.org/abs/1409.0473
[55] K. Irie, A. Zeyer, R. Schlüter, and H. Ney, “Language Modeling with Deep Transformers,” in Interspeech 2019, Sep. 2019, pp. 3905–3909. doi: 10.21437/Interspeech.2019-2225.
[56] Y. Shi, M. Larson, and C. Jonker, “Exploiting the succeeding words in recurrent neural network language models,” in Interspeech 2013, Aug. 2013, pp. 632–636. doi: 10.21437/Interspeech.2013-183.
[57] E. Arisoy, A. Sethy, B. Ramabhadran, and S. Chen, “Bidirectional recurrent neural network language models for automatic speech recognition,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2015, pp. 5421–5425. doi: 10.1109/ICASSP.2015.7179007.
[58] X. Chen, A. Ragni, X. Liu, and M. J. F. Gales, “Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition,” in Interspeech 2017, Aug. 2017, pp. 269–273. doi: 10.21437/Interspeech.2017-513.
[59] A. Wang and K. Cho, “BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model,” arXiv preprint arXiv:1902.04094, Apr. 2019. Accessed: Apr. 29, 2022. [Online]. Available: http://arxiv.org/abs/1902.04094
[60] A. Baevski, H. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,” arXiv preprint arXiv:2006.11477, Oct. 2020. Accessed: May 06, 2022. [Online]. Available: http://arxiv.org/abs/2006.11477
[61] A. van den Oord, Y. Li, and O. Vinyals, “Representation Learning with Contrastive Predictive Coding,” arXiv preprint arXiv:1807.03748, Jan. 2019. Accessed: May 06, 2022. [Online]. Available: http://arxiv.org/abs/1807.03748
[62] G. Lample and A. Conneau, “Cross-lingual Language Model Pretraining,” arXiv preprint arXiv:1901.07291, Jan. 2019. Accessed: May 06, 2022. [Online]. Available: http://arxiv.org/abs/1901.07291
[63] Y. Liu et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” arXiv preprint arXiv:1907.11692, Jul. 2019. Accessed: May 06, 2022. [Online]. Available: http://arxiv.org/abs/1907.11692
[64] J. Besag, “Statistical Analysis of Non-Lattice Data,” J. R. Stat. Soc. Ser. Stat., vol. 24, no. 3, pp. 179–195, 1975, doi: 10.2307/2987782.
[65] D.-C. Lyu, T.-P. Tan, E. Chng, and H. Li, “Mandarin–English code-switching speech corpus in South-East Asia: SEAME,” Lang. Resour. Eval., vol. 49, 2015. doi: 10.1007/s10579-015-9303-x.
[66] V. Pratap et al., “wav2letter++: The Fastest Open-source Speech Recognition System,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2019, pp. 6460–6464. doi: 10.1109/ICASSP.2019.8683535.
[67] M. Ott et al., “fairseq: A Fast, Extensible Toolkit for Sequence Modeling,” arXiv preprint arXiv:1904.01038, Apr. 2019. doi: 10.48550/arXiv.1904.01038.
[68] J. Kahn et al., “Libri-Light: A Benchmark for ASR with Limited or No Supervision,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2020, pp. 7669–7673. doi: 10.1109/ICASSP40776.2020.9052942.
[69] T. Wolf et al., “HuggingFace’s Transformers: State-of-the-art Natural Language Processing,” arXiv preprint arXiv:1910.03771, Jul. 2020. doi: 10.48550/arXiv.1910.03771.