參考文獻 |
[1] G. Hinton, L. Deng, D. Yu, G.E. Dahl, A.R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath and B. Kingsbury, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, 29, no.6, 82-97, 2012.
[2] S. M. Siniscalchi, T. Sveendsen, C.-H. Lee, 2014. An artificial neural network approach to automatic speech processing. Neurocomputing, pp.326-338.
[3] J. Li, L. Deng, R. Haeb-Umbach and Y. Gong, Robust automatic speech recognition: a bridge to practical applications, Academic Press, 2015.
[4] A. Narayanan, D. Wang, “Improving robustness of deep neural network acoustic models via speech separation and joint adaptive training,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 1, pp. 92–101, 2015.
[5] Y. Tsao, C.-H. Lee, 2009. An ensemble speaker and speaking environment modeling approach to robust speech recognition. IEEE Trans. Audio, Speech, Lang. Process. 17 (5), 1025–1037.
[6] S. M. Siniscalchi, J. Li, C. H. Lee, 2013. Hermitian polynomial for speaker adaptation of connectionist speech recognition systems. IEEE Transactions on Audio, Speech, and Language Processing, 21(10), 2152-2161.
[7] T. Tan, Y. Qian, and K. Yu, 2016. Cluster adaptive training for deep neural network based acoustic model. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(3), 459-468.
[8] R.P. Lippmann, E.A. Martin and D.B. Paul, “Multi-style training for robust isolated-word speech recognition,” In: Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1987, pp. 705-708.
[9] M.L. Seltzer, D. Yu and Y. Wang, “An investigation of deep neural networks for noise robust speech recognition,” In: Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 7398-7402.
[10] S. Yin, C. Liu, Z. Zhang, Y. Lin, D. Wang, J. Tejedor, T.F. Zheng, and Y. Li, “Noisy training for deep neural networks in speech recognition,” EURASIP Journal on Audio, Speech, and Music Processing, 2015, pp. 1-14.
[11] C. Weng, D. Yu, M.L. Seltzer and J. Droppo, “Deep neural networks for single channel multi-talker speech recognition,” IEEE/ACM Trans. Audio Speech Lang. Process., vol 23, no.10, pp. 1670–1679, 2015.
[12] J. Li, D. Yu, J. T. Huang and Y. Gong, “Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM.” In Proc. SLT, 2012, pp.131-136.
[13] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “A regression approach to speech enhancement based on deep neural networks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7–19, 2015
[14] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “An experimental study on speech enhancement based on deep neural networks,” Signal Processing Letters, vol. 21, no. 1, pp. 65–68, 2014.
[15] X.-G. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Speech enhancement based on deep denoising Auto-Encoder,” in Proc. Interspeech, 2013, pp. 436–440.
[16] S.-S. Wang, H.-T. Hwang, Y.-H. Lai, Y. Tsao, X. Lu, H.-M. Wang, and B. Su, “Improving denoising auto-encoder based speech enhancement with the speech parameter generation algorithm,” in Proc. APSIPA, 2015, pp. 365–369.
[17] J. Tang, C. Deng and G. Huang, “Extreme learning machine for multilayer perceptron,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809-821, Apr. 2016.
[18] G.-B. Huang, L. Chen, and C.-K. Siew, “Universal approximation using incremental constructive feedforward networks with random hidden nodes,” IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 879–892, Jul. 2006.
[19] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, “Extreme learning machine for regression and multiclass classification,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 2, pp. 513–529, Apr. 2012.
[20] Y. L. He, Z.Q. Geng, Y. Xu, Q.X. Zhu, “A hierarchical structure of extreme learning machine (HELM) for high-dimensional datasets with noise,” Neurocomputing 128, 407–414, 2014.
[21] P. Scalart and J.V. Filho, “Speech enhancement based on a priori signal to noise estimation,” In: Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1996, pp. 629–632.
[22] S.F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust. Speech Signal Process., vol. 27, pp. 113–120, 1979.
[23] Y. Lu and P.C. Loizou, “A geometric approach to spectral subtraction.” Speech Communication, vol. 50, pp. 453–466, 2008.
[24] J. Li, S. Sakamoto, S. Hongo, M. Akagi and Y. Suzuki, “Adaptive border generalized spectral subtraction for speech enhancement,” Signal Process., vol. 88, no.11, pp. 2764–2776, 2008.
[25] U. Mittal and N. Phamdo, “Signal/noise KLT based ap-proach for enhancing speech degraded by colored noise,” IEEE Trans. Speech Audio Process., vol. 8, pp. 159–167, 2000.
[26] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust. Speech Signal Process., vol. 32, pp. 1109–1121, 1984.
[27] S. Suhadi, C. Last and T. Fingscheidt, “A data-driven ap-proach to a priori SNR estimation,” IEEE Trans. Audio Speech Lang. Process., vol. 19, pp. 186–195, 2011.
[28] U. Kjems and J. Jensen, “Maximum likelihood based noise covariance matrix estimation for multi-microphone speech enhancement,” in Proc. European Signal Processing Conf. (EUSIPCO), 2012, pp. 1–5.
[29] Y. Tsao and Y. Lai, “Generalized maximum a posteriori spectral amplitude estimation for speech enhancement,” Speech Communication, vol. 76, pp. 112–126, 2016.
[30] B. Li, Y. Tsao, and K. C. Sim, “An investigation of spectral restoration algorithms for deep neural networks based noise robust speech recognition,” in Proc. INTERSPEECH, 2013, pp. 3002-3006.
[31] T, Ko, V. Peddinti, D. Povey, M. Seltzer, and S. Khudanpur, “A study on data augmentation of reverberant speech for robust speech recognition,” ICASSP 2017 (submitted), 2017.
[32] N. Parihar and J. Picone, “Aurora working group: DSR front end LVCSR evaluation AU/384/02,” Tech. Rep., Inst. for Signal and Information Process, Mississippi State University.
[33] H.M. Wang, B. Chen, J.W. Kuo, S. S. Cheng, 2005. MATBN: A Mandarin Chinese broadcast news corpus. International Journal of Computational Linguistics and Chinese Language Processing, 10(2), 219-23
[34] D. Yu and L. Deng, Automatic Speech Recognition in Springer Handbook of Signals and Communication Technology, Springer (Chapter 1), 2015.
[35] J. Li and L. Deng, Robust Automatic Speech Recognition in Springer Handbook of a Bridge of Practical Applicants, Springer (Chapter 2), 2016.
[36] M. N. Stuttle, "A Gaussian mixture model spectral representation for speech recognition," Ph.D dissertation, University of Cambridge, 2003.
[37] M. J. F. Gales, S.J. Young, “The application of hidden Markov models in speech recognition,” Foundations and Trends in Signal Processing 1 (3), 195–304, 2007.
[38] J. Baker, L. Deng, J. Glass, S. Khudanpur, C. H. Lee, N. Morgan, et al., 2009a. Research developments and directions in speech recognition and understanding, Part I. IEEE Signal Process. Mag. 26 (3), 75-80.
[39] J. Baker, L. Deng, J. Glass, S. Khudanpur, C. H. Lee, N. Morgan, et al., 2009b. Updated MINDS report on speech recognition and understanding. IEEE Signal Process. Mag. 26 (4), 78-85.
[40] D. Yu and L. Deng, Automatic Speech Recognition in Springer Handbook of Signals and Communication Technology, Springer (Chapter 4), 2015.
[41] D. Yu and L. Deng, Automatic Speech Recognition in Springer Handbook of Signals and Communication Technology, Springer (Chapter 6), 2015.
[42] L. Deng, J. Li, J.T. Huang, K. Yao, D.Yu, F. Seide, M. Seltzer, G. Zweig, X. He, J. Williams, Y. Gong, A. Acero, Recent advances in deep learning for speech research at microsoft. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, Canada (2013)
[43] J. Chen, J. Benesty, Y. Huang, E. J. Diethorn, Fundamentals of Noise Reduction in Springer Handbook of Speech Processing, Springer (Chapter 43), 2008.
[44] I. Cohen, “Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging,” IEEE Trans. Speech Audio Process, vol.11, no.5, pp. 466-475, 2003.
[45] W.Y. Ma, C.R. Huang, 2016. Uniform and effective tagging of a heterogeneous giga-word corpus. In: Proc. LREC2006, 24-28 |