References
[1] Anjali, A. Kumar, and N. Birla, "Voice Command Recognition System based on MFCC and DTW," International Journal of Engineering Science and Technology, vol. 2, no. 12, 2010.
[2] A. Mohamed, T. Sainath, G. Dahl, B. Ramabhadran, G. Hinton, and M. Picheny, "Deep belief networks using discriminative features for phone recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2011, pp. 5060–5063.
[3] B. Y. Chen, Q. Zhu, and N. Morgan, "A Neural Network for Learning Long-Term Temporal Features for Speech Recognition," in Proc. ICASSP, Mar. 2005, pp. 945–948.
[4] C. O. Dumitru and I. Gavat, "A Comparative Study of Feature Extraction Methods Applied to Continuous Speech Recognition in Romanian Language," in Proc. International Symposium ELMAR, Zadar, Croatia, Jun. 2006.
[5] C. Poonkuzhali, R. Karthiprakash, S. Valarmathy, and M. Kalamani, "An Approach to Feature Selection Algorithm based on Ant Colony Optimization for Automatic Speech Recognition," International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, vol. 11, no. 2, 2013.
[6] C. Ittichaichareon, S. Suksri, and T. Yingthawornsuk, "Speech Recognition using MFCC," in Proc. International Conference on Computer Graphics, Simulation and Modeling, 2012.
[7] C. Kim and R. M. Stern, "Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring," in Proc. ICASSP, 2010, pp. 4574–4577.
[8] C. Charbuillet, B. Gas, M. Chetouani, and J. L. Zarader, "Complementary features for speaker verification based on genetic algorithms," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 4, 2007, pp. IV-285–IV-288.
[9] D. Yu, M. L. Seltzer, J. Li, J.-T. Huang, and F. Seide, “Feature learning in deep neural networks - studies on speech recognition tasks,” in Proc. Int. Conf. Learn. Represent., 2013.
[10] D. P. Kingma and J. L. Ba, "Adam: A Method for Stochastic Optimization," in Proc. ICLR, 2015.
[11] D. C. Cireşan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber, "High-performance neural networks for visual object classification," arXiv preprint arXiv:1102.0183, 2011.
[12] D. Cireşan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," arXiv preprint arXiv:1202.2745, 2012.
[13] E. Bocchieri and D. Dimitriadis, "Investigating deep neural network based transforms of robust audio features for LVCSR," in Proc. ICASSP, 2013, pp. 6709–6713.
[14] F. Seide, G. Li, X. Chen, and D. Yu, “Feature engineering in context-dependent deep neural networks for conversational speech transcription,” in Proc. IEEE Workshop Autom. Speech Recognition Understand. (ASRU), 2011, pp. 24–29.
[15] F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in Proc. Interspeech, 2011, pp. 437–440.
[16] H. Lee, P. Pham, Y. Largman, and A. Ng, “Unsupervised feature learning for audio classification using convolutional deep belief networks,” in Proc. Adv. Neural Inf. Process. Syst. 22, 2009, pp. 1096–1104.
[17] H. Franco, M. Graciarena, and A. Mandal, "Normalized amplitude modulation features for large vocabulary noise-robust speech recognition," in Proc. ICASSP, Mar. 2012, pp. 4117–4120.
[18] J. Chen, K. K. Paliwal, M. Mizumachi, and S. Nakamura, "Robust MFCCs derived from differentiated power spectrum," in Proc. Eurospeech, Scandinavia, 2001.
[19] J. C. Wang, J. F. Wang, and Y. S. Weng, "Chip design of MFCC extraction for speech recognition," vol. 32, no. 1–2, pp. 111–131, Nov. 2002.
[20] L. Muda, M. Begam, and I. Elamvazuthi, "Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques," Journal of Computing, vol. 3, no. 2, 2010.
[21] L. Deng, K. Hassanein, and M. Elmasry, “Analysis of correlation structure for a neural predictive model with applications to speech recognition,” Neural Netw., vol. 7, no. 2, pp. 331–339, 1994.
[22] L. Deng and X. Li, "Machine learning paradigms for speech recognition: An overview," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 1060–1089, May 2013.
[23] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review," (IJCSIS) International Journal of Computer Science and Information Security, vol. 6, no. 3, pp. 181–205, 2009.
[24] M. Kleinschmidt, "Localized spectro-temporal features for automatic speech recognition," in Proc. Eurospeech, Sep. 2003, pp. 2573–2576.
[25] N. Morgan, "Deep and wide: Multiple layers in automatic speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 7–13, Jan. 2012.
[26] O. Abdel-Hamid, L. Deng, and D. Yu, "Exploring Convolutional Neural Network Structures and Optimization Techniques for Speech Recognition," in Proc. Interspeech, Aug. 2013, pp. 3366–3370.
[27] O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, "Convolutional Neural Networks for Speech Recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, Oct. 2014.
[28] O. Buza, G. Toderean, A. Nica, and A. Caruntu, "Voice Signal Processing for Speech Synthesis," in Proc. IEEE International Conference on Automation, Quality and Testing, Robotics, vol. 2, May 2006, pp. 360–364.
[29] P. P. Singh and P. Rani, "An Approach to Extract Feature using MFCC," International Organization of Scientific Research, vol. 4, pp. 21–25, Aug. 2014.
[30] P. C. Woodland and D. Povey, "Large scale discriminative training of hidden Markov models for speech recognition," Computer Speech and Language, vol. 16, no. 1, pp. 25–47, 2002.
[31] Q. Zhu, B. Chen, N. Morgan, and A. Stolcke, "Tandem connectionist feature extraction for conversational speech recognition," in Machine Learning for Multimodal Interaction. Berlin/Heidelberg, Germany: Springer, 2005, vol. 3361, pp. 223–231.
[32] R. K. Aggarwal and M. Dave, "Acoustic modeling problem for automatic speech recognition system: advances and refinements (Part II)," Int. J. Speech Technol., pp. 309–320, 2011.
[33] S.-Y. Chang and N. Morgan, "Robust CNN-based Speech Recognition with Gabor Filter Kernels," in Proc. Interspeech, Sep. 2014, pp. 905–909.
[34] S. Memon, M. Lech, and L. He, "Using information theoretic vector quantization for inverted MFCC based speaker verification," in Proc. 2nd International Conference on Computer, Control and Communication (IC4), 2009, pp. 1–5.
[35] S. Witt and S. Young, "Phone-level pronunciation scoring and assessment for interactive language learning," Speech Communication, vol. 30, no. 2–3, pp. 95–108, 2000.
[36] S. Dhingra, G. Nijhawan, and P. Pandit, "Isolated Speech Recognition using MFCC and DTW," International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, 2013.
[37] S. Chakroborty and G. Saha, "Improved Text-Independent Speaker Identification using Fused MFCC & IMFCC Feature Sets based on Gaussian Filter," International Journal of Signal Processing, vol. 5, pp. 1–9, 2009.
[38] S.-Y. Chang and N. Morgan, "Informative spectro-temporal bottleneck features for noise-robust speech recognition," in Proc. Interspeech, 2013.
[39]T. Landauer, C. Kamm, and S. Singhal, “Learning a minimally structured back propagation network to recognize speech,” in Proc. 9th Annu. Conf. Cogn. Sci. Soc., 1987, pp. 531–536.
[40] W. Han, C. F. Chan, C. S. Choy, and K. P. Pun, "An Efficient MFCC Extraction Method in Speech Recognition," in Proc. IEEE International Symposium on Circuits and Systems, 2006, pp. 21–24.
[41] Wang Chen, Miao Zhenjiang, and Meng Xiao, "Comparison of different implementations of MFCC," J. Computer Science & Technology, vol. 16, pp. 582–589, 2001.
[42] Wang Chen, Miao Zhenjiang, and Meng Xiao, "Differential MFCC and vector quantization used for real-time speaker recognition system," in Proc. Congress on Image and Signal Processing, 2008, pp. 319–323.
[43]Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time-series,” in The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed. Cambridge, MA, USA: MIT Press, 1995.