References
[1] G. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, 2012.
[2] P. Welch, “The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,” IEEE Transactions on Audio and Electroacoustics, vol. 15, no. 2, pp. 70–73, 1967.
[3] S. S. Stevens and J. Volkmann, “The relation of pitch to frequency: A revised scale,” The American Journal of Psychology, vol. 53, no. 3, pp. 329–353, 1940.
[4] B. Logan, “Mel frequency cepstral coefficients for music modeling,” in Proceedings of ISMIR, vol. 270, pp. 1–11, Oct. 2000.
[5] X. Zhou, X. Zhuang, M. Liu, H. Tang, M. Hasegawa-Johnson, and T. Huang, “HMM-based acoustic event detection with AdaBoost feature selection,” in Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007. Springer, Berlin, Germany, 2008, pp. 345–353.
[6] G. J. Zapata-Zapata et al., “On-line signature verification using Gaussian mixture models and small-sample learning strategies,” Revista Facultad de Ingeniería Universidad de Antioquia, vol. 79, pp. 86–97, 2016.
[7] G. Xuan, W. Zhang, and P. Chai, “EM algorithms of Gaussian mixture model and hidden Markov model,” in Proceedings of the 2001 International Conference on Image Processing (ICIP), vol. 1, pp. 145–148, 2001.
[8] D. Yu and L. Deng, Automatic Speech Recognition. Springer, p. 23, 2016.
[9] F. Jelinek, “Up from trigrams! The struggle for improved language models,” in Second European Conference on Speech Communication and Technology, p. 24, 1991.
[10] R. Kneser and H. Ney, “Improved backing-off for m-gram language modeling,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 181–184, 1995.
[11] M. Mohri, F. Pereira, and M. P. Riley, “Weighted finite-state transducers in speech recognition,” Computer Speech & Language, vol. 16, no. 1, pp. 69–88, 2002.
[12] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115–133, Dec. 1943.
[13] F. A. Makinde, C. T. Ako, O. D. Orodu, and I. U. Asuquo, “Prediction of crude oil viscosity using feed-forward back-propagation neural network (FFBPNN),” Petroleum and Coal, vol. 54, pp. 120–131, 2012.
[14] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychological Review, vol. 65, no. 6, pp. 386–408, 1958.
[15] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biological Cybernetics, vol. 36, no. 4, pp. 193–202, 1980.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[17] S. M. Witt and S. J. Young, “Phone-level pronunciation scoring and assessment for interactive language learning,” Speech Communication, vol. 30, no. 2, pp. 95–108, 2000.
[18] F. Zhang et al., “Automatic mispronunciation detection for Mandarin,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5077–5080, 2008.
[19] L. Y. Chen and J. S. R. Jang, “Automatic pronunciation scoring with score combination by learning to rank and class-normalized DP-based quantization,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, pp. 1737–1749, 2015.
[20] V. Peddinti, D. Povey, and S. Khudanpur, “A time delay neural network architecture for efficient modeling of long temporal contexts,” in Proceedings of Interspeech. ISCA, 2015.
[21] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel et al., “The Kaldi speech recognition toolkit,” in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE Signal Processing Society, 2011.
[22] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: An ASR corpus based on public domain audio books,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210, 2015.
[23] Y.-C. Hsu, B. Chen et al., “Evaluation metric-related optimization methods for Mandarin mispronunciation detection,” Computational Linguistics and Chinese Language Processing, vol. 21, no. 2, pp. 55–70, 2016.
[24] W. Hu, Y. Qian, and F. K. Soong, “Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers,” Speech Communication, vol. 67, pp. 154–166, 2015.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.
[26] W.-K. Leung, X. Liu, and H. Meng, “CNN-RNN-CTC based end-to-end mispronunciation detection and diagnosis,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8132–8136, 2019.
[27] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks,” in Proceedings of the 23rd International Conference on Machine Learning (ICML). ACM, pp. 369–376, 2006.
[28] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, “DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1,” NASA STI/Recon Technical Report N, vol. 93, p. 27403, 1993.
[29] G. Zhao, S. Sonsaat, A. O. Silpachai, I. Lucic, E. Chukharev-Hudilainen, J. Levis, and R. Gutierrez-Osuna, “L2-ARCTIC: A non-native English speech corpus,” in Proceedings of Interspeech, 2018.