References
[1] A. Sankar and C.-H. Lee, “A maximum-likelihood approach to stochastic matching for robust speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 4, pp. 190-202, 1996.
[2] A. Varga and R. Moore, “Hidden Markov Model Decomposition of Speech and Noise,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 845-848, 1990.
[3] M. J. F. Gales and S. J. Young, “An Improved Approach to the Hidden Markov Model Decomposition of Speech and Noise,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 233-236, 1992.
[4] M. Q. Wang and S. J. Young, “Speech Recognition Using Hidden Markov Model Decomposition and a General Background Speech Model,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 253-256, 1992.
[5] Y. Tsao and C.-H. Lee, “An Ensemble Speaker and Speaking Environment Modeling Approach to Robust Speech Recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, pp. 1025-1037, Jun. 2009.
[6] S. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113-120, Apr. 1979.
[7] J. S. Lim and A. V. Oppenheim, “Enhancement and bandwidth compression of noisy speech,” Proceedings of the IEEE, vol. 67, no. 12, pp. 1586-1604, Dec. 1979.
[8] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984.
[9] T. Lotter and P. Vary, “Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model,” EURASIP Journal on Applied Signal Processing, vol. 2005, no. 1, pp. 1110-1126, Jan. 2005.
[10] U. Kjems and J. Jensen, “Maximum likelihood based noise covariance matrix estimation for multi-microphone speech enhancement,” in Proceedings of the European Signal Processing Conference, pp. 295-299, Aug. 2012.
[11] S. Furui, “Cepstral analysis technique for automatic speaker verification,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 2, pp. 254-272, Apr. 1981.
[12] O. Viikki and K. Laurila, “Cepstral domain segmental feature vector normalization for noise robust speech recognition,” Speech Communication, vol. 25, no. 1-3, pp. 133-147, Aug. 1998.
[13] M. J. F. Gales, “Maximum likelihood linear transformations for HMM-based speech recognition,” Computer Speech and Language, vol. 12, no. 2, pp. 75-98, Apr. 1998.
[14] B.-H. Juang, W. Chou, and C.-H. Lee, “Minimum classification error rate methods for speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 5, no. 3, pp. 257-265, May 1997.
[15] V. Valtchev, J. J. Odell, P. C. Woodland, and S. J. Young, “MMIE training of large vocabulary recognition systems,” Speech Communication, vol. 22, no. 4, pp. 303-314, Sep. 1997.
[16] B. Li, Y. Tsao, and K. C. Sim, “An Investigation of Spectral Restoration Algorithms for Deep Neural Networks based Noise Robust Speech Recognition,” in Proceedings of INTERSPEECH, pp. 3002-3006, 2013.
[17] B. Li and K. C. Sim, “Noise adaptive front-end normalization based on Vector Taylor Series for Deep Neural Networks in robust speech recognition,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 7408-7412, May 2013.
[18] S. Araki, T. Hayashi, M. Delcroix, M. Fujimoto, K. Takeda, and T. Nakatani, “Exploring multi-channel features for denoising-autoencoder-based speech enhancement,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 116-120, Apr. 2015.
[19] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “A Regression Approach to Speech Enhancement Based on Deep Neural Networks,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7-19, Jan. 2015.
[20] Y. Tu, J. Du, Y. Xu, L.-R. Dai, and C.-H. Lee, “Deep neural network based speech separation for robust speech recognition,” in Proceedings of the International Symposium on Chinese Spoken Language Processing, pp. 532-536, Oct. 2014.
[21] B. Li and K. C. Sim, “An ideal hidden-activation mask for deep neural networks based noise-robust speech recognition,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 200-204, May 2014.
[22] B. Li and K. C. Sim, “A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 22, pp. 1296-1305, Aug. 2014.
[23] A. Narayanan and D. Wang, “Improving robustness of deep neural network acoustic models via speech separation and joint adaptive training,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 23, pp. 92-101, Jan. 2015.
[24] L. Breiman, “Bagging Predictors,” Machine Learning, vol. 24, no. 2, pp. 123-140, Aug. 1996.
[25] R. E. Schapire, “The Strength of Weak Learnability,” Machine Learning, vol. 5, no. 2, pp. 197-227, Jun. 1990.
[26] A. Senior, “Improving DNN speaker independence with I-vector inputs,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 225-229, May 2014.
[27] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[28] A. Mohamed, G. E. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14-22, Jan. 2012.
[29] G. Hinton, L. Deng, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, “Deep neural networks for acoustic modeling in speech recognition,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, Nov. 2012.
[30] ETSI, “Speech processing, transmission and quality aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms,” ETSI standard document ES 202 050 V1.1.5, 2007.
[31] E. Fosler-Lussier, “Markov Models and Hidden Markov Models: A Brief Tutorial,” International Computer Science Institute, Technical Report (TR-98-041), Dec. 1998.
[32] 王小川, 語音訊號處理 (Speech Signal Processing), 3rd ed., 全華圖書, 2008.
[33] N. Qian, “On the momentum term in gradient descent learning algorithms,” Neural Networks, vol. 12, no. 1, pp. 145-151, Jan. 1999.
[34] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, “Exploring strategies for training deep neural networks,” Journal of Machine Learning Research, vol. 10, pp. 1-40, Dec. 2009.
[35] D. Erhan, P.-A. Manzagol, Y. Bengio, S. Bengio, and P. Vincent, “The difficulty of training deep architectures and the effect of unsupervised pretraining,” in Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pp. 153-160, 2009.
[36] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771-1800, Aug. 2002.
[37] N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, “Front-End Factor Analysis for Speaker Verification,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788-798, May 2011.
[38] P. Kenny, G. Boulianne, P. Ouellet, and P. Dumouchel, “Joint factor analysis versus eigenchannels in speaker recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1448-1460, May 2007.
[39] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive Mixtures of Local Experts,” Neural Computation, vol. 3, no. 1, pp. 79-87, Spring 1991.
[40] D. Povey, S. M. Chu, and B. Varadarajan, “Universal background model based speech recognition,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 4561-4564, Mar. 2008.
[41] Y. Tsao, X. Lu, P. Dixon, T.-Y. Hu, S. Matsuda, and C. Hori, “Incorporating Local Information of the Acoustic Environments to MAP-based Feature Compensation and Acoustic Model Adaptation,” Computer Speech and Language, vol. 28, no. 3, pp. 709-726, May 2014.
[42] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, “The Kaldi Speech Recognition Toolkit,” in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, Dec. 2011.
[43] M. Ida and S. Nakamura, “HMM composition-based rapid model adaptation using a priori noise GMM adaptation evaluation on Aurora2 corpus,” in Proceedings of the International Conference on Spoken Language Processing, pp. 437-440, 2002.
[44] D. Pearce and H.-G. Hirsch, “The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions,” in ISCA Tutorial and Research Workshop ASR2000: Automatic Speech Recognition: Challenges for the New Millennium, Sep. 2000.
[45] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, pp. 1929-1958, 2014.