References
[1] T. N. Sainath and C. Parada, “Convolutional Neural Networks for Small-footprint Keyword Spotting,” Proc. INTERSPEECH, pp. 1478–1482, 2015.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Adv. Neural Inf. Process. Syst., pp. 1–9, 2012.
[4] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” arXiv preprint arXiv:1506.02640, 2015.
[5] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, 2017.
[6] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, Ł. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean, “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,” ArXiv e-prints, pp. 1–23, 2016.
[7] Y. Kim, “Convolutional Neural Networks for Sentence Classification,” Proc. 2014 Conf. Empir. Methods Nat. Lang. Process., pp. 1746–1751, 2014.
[8] O. Abdel-Hamid, H. Jiang, and G. Penn, “Applying Convolutional Neural Networks Concepts to Hybrid NN-HMM Model for Speech Recognition,” Proc. ICASSP, pp. 4277–4280, 2012.
[9] L. Tóth, “Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition,” in Proc. ICASSP, 2014, pp. 190–194.
[10] D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, E. Elsen, J. Engel, L. Fan, C. Fougner, T. Han, A. Hannun, B. Jun, P. LeGresley, L. Lin, S. Narang, A. Ng, S. Ozair, R. Prenger, J. Raiman, S. Satheesh, D. Seetapun, S. Sengupta, Y. Wang, Z. Wang, C. Wang, B. Xiao, D. Yogatama, J. Zhan, and Z. Zhu, “Deep Speech 2: End-to-End Speech Recognition in English and Mandarin,” arXiv preprint arXiv:1512.02595, 2015.
[11] P. Warden, “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition,” arXiv preprint arXiv:1804.03209, 2018.
[12] W. Han, C.-F. Chan, C.-S. Choy, and K.-P. Pun, “An efficient MFCC extraction method in speech recognition,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2006, p. 4.
[13] S. Molau, M. Pitz, R. Schlüter, and H. Ney, “Computing Mel-frequency cepstral coefficients on the power spectrum,” Proc. ICASSP, vol. 1, pp. 73–76, 2001.
[14] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications. Prentice Hall, 1996.
[15] A. Zolnay, R. Schlüter, and H. Ney, “Acoustic feature combination for robust speech recognition,” Proc. ICASSP, vol. I, pp. 457–460, 2005.
[16] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv preprint arXiv:1502.03167, 2015.
[17] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” J. Mach. Learn. Res., vol. 15, pp. 1929–1958, 2014.
[18] R. Tang and J. Lin, “Deep Residual Learning for Small-Footprint Keyword Spotting,” arXiv preprint arXiv:1710.10361, 2017.