References
[1] Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97.
[2] Dahl, George E., et al. "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition." IEEE Transactions on Audio, Speech, and Language Processing 20.1 (2012): 30-42.
[3] Deng, Li, Geoffrey Hinton, and Brian Kingsbury. "New types of deep neural network learning for speech recognition and related applications: An overview." Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.
[4] Grais, Emad M., Mehmet Umut Sen, and Hakan Erdogan. "Deep neural networks for single channel source separation." Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.
[5] Lei, Yun, et al. "A novel scheme for speaker recognition using a phonetically-aware deep neural network." Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.
[6] Yamada, Takanori, Longbiao Wang, and Atsuhiko Kai. "Improvement of distant-talking speaker identification using bottleneck features of DNN." Interspeech. 2013.
[7] Han, Kun, Dong Yu, and Ivan Tashev. "Speech emotion recognition using deep neural network and extreme learning machine." Interspeech. 2014.
[8] Seide, Frank, Gang Li, and Dong Yu. "Conversational speech transcription using context-dependent deep neural networks." Interspeech. 2011.
[9] Anguera, Xavier, Chuck Wooters, and Javier Hernando. "Acoustic beamforming for speaker diarization of meetings." IEEE Transactions on Audio, Speech, and Language Processing 15.7 (2007): 2011-2022.
[10] Heymann, Jahn, et al. "BLSTM supported GEV beamformer front-end for the 3rd CHiME challenge." Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on. IEEE, 2015.
[11] Han, Wei, et al. "An efficient MFCC extraction method in speech recognition." Circuits and Systems, 2006. ISCAS 2006. Proceedings. 2006 IEEE International Symposium on. IEEE, 2006.
[12] Sainath, Tara N., et al. "Deep convolutional neural networks for LVCSR." Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.
[13] Sak, Haşim, Andrew Senior, and Françoise Beaufays. "Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition." arXiv preprint arXiv:1402.1128 (2014).
[14] Liu, Xunying, et al. "Efficient lattice rescoring using recurrent neural network language models." Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.
[15] Du, Jun, et al. "The USTC–iFlytek system for CHiME-4 challenge." Proc. CHiME (2016): 36-38.
[16] Chen, Guoguo, Carolina Parada, and Tara N. Sainath. "Query-by-example keyword spotting using long short-term memory networks." Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015.
[17] Ge, Fengpei, and Yonghong Yan. "Deep neural network based wake-up-word speech recognition with two-stage detection." Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.
[18] Molau, Sirko, et al. "Computing mel-frequency cepstral coefficients on the power spectrum." Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP'01). 2001 IEEE International Conference on. Vol. 1. IEEE, 2001.
[19] Prasad, N. Vishnu, and Srinivasan Umesh. "Improved cepstral mean and variance normalization using Bayesian framework." Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013.
[20] Rath, Shakti P., et al. "Improved feature processing for deep neural networks." Interspeech. 2013.
[21] Belhumeur, Peter N., João P. Hespanha, and David J. Kriegman. "Eigenfaces vs. fisherfaces: Recognition using class specific linear projection." IEEE Transactions on Pattern Analysis and Machine Intelligence 19.7 (1997): 711-720.
[22] Gales, Mark J. F. "Maximum likelihood linear transformations for HMM-based speech recognition." Computer Speech & Language 12.2 (1998): 75-98.
[23] Van Veen, Barry D., and Kevin M. Buckley. "Beamforming: A versatile approach to spatial filtering." IEEE ASSP Magazine 5.2 (1988): 4-24.
[24] Povey, Daniel, and George Saon. "Feature and model space speaker adaptation with full covariance Gaussians." Interspeech. 2006.
[25] Ghahramani, Zoubin. "An introduction to hidden Markov models and Bayesian networks." International Journal of Pattern Recognition and Artificial Intelligence 15.01 (2001): 9-42.
[26] Mohamed, Abdel-rahman, George E. Dahl, and Geoffrey Hinton. "Acoustic modeling using deep belief networks." IEEE Transactions on Audio, Speech, and Language Processing 20.1 (2012): 14-22.
[27] Veselý, Karel, et al. "Sequence-discriminative training of deep neural networks." Interspeech. 2013.
[28] Chan, William, and Ian Lane. "Deep recurrent neural networks for acoustic modelling." arXiv preprint arXiv:1504.01482 (2015).
[29] Sak, Haşim, et al. "Fast and accurate recurrent neural network acoustic models for speech recognition." arXiv preprint arXiv:1507.06947 (2015).
[30] Pascanu, Razvan, et al. "How to construct deep recurrent neural networks." arXiv preprint arXiv:1312.6026 (2013).
[31] Graves, Alex, Navdeep Jaitly, and Abdel-rahman Mohamed. "Hybrid speech recognition with deep bidirectional LSTM." Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013.
[32] Zeyer, Albert, et al. "A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition." Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.
[33] Greff, Klaus, et al. "LSTM: A search space odyssey." IEEE Transactions on Neural Networks and Learning Systems (2016).
[34] Peddinti, Vijayaditya, Daniel Povey, and Sanjeev Khudanpur. "A time delay neural network architecture for efficient modeling of long temporal contexts." Interspeech. 2015.
[35] Brown, Peter F., et al. "Class-based n-gram models of natural language." Computational Linguistics 18.4 (1992): 467-479.
[36] Gale, William A., and Geoffrey Sampson. "Good-Turing frequency estimation without tears." Journal of Quantitative Linguistics 2.3 (1995): 217-237.
[37] Kneser, Reinhard, and Hermann Ney. "Improved backing-off for m-gram language modeling." Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on. Vol. 1. IEEE, 1995.
[38] Sundermeyer, Martin, Ralf Schlüter, and Hermann Ney. "LSTM neural networks for language modeling." Interspeech. 2012.
[39] Mikolov, Tomáš, et al. "Extensions of recurrent neural network language model." Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE, 2011.
[40] Mikolov, Tomáš, et al. "Recurrent neural network based language model." Interspeech. Vol. 2. 2010.
[41] Chung, Euisok, Hyung-Bae Jeon, Jeon Gue Park, and Yun-Keun Lee. "Lattice rescoring for speech recognition using large scale distributed language models." 24th International Conference on Computational Linguistics. 2012.
[42] Barker, Jon, et al. "The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines." Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on. IEEE, 2015.
[43] Povey, Daniel, et al. "The Kaldi speech recognition toolkit." IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. No. EPFL-CONF-192584. IEEE Signal Processing Society, 2011.
[44] Stolcke, Andreas. "SRILM - an extensible language modeling toolkit." Interspeech. Vol. 2002. 2002.
[45] Enarvi, Seppo, and Mikko Kurimo. "TheanoLM - an extensible toolkit for neural network language modeling." arXiv preprint arXiv:1605.00942 (2016).
[46] Vincent, Emmanuel, et al. "An analysis of environment, microphone and data simulation mismatches in robust speech recognition." Computer Speech & Language (2016).
[47] Menne, Tobias, et al. "The RWTH/UPB/FORTH system combination for the 4th CHiME challenge evaluation." The 4th International Workshop on Speech Processing in Everyday Environments, San Francisco, CA, USA. 2016.
[48] Kaldi toolkit documentation. http://kaldi-asr.org/doc, last accessed June 2017.
[49] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: A library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27.