References
[1] 呂易宸, "Speech-based door access control system (語音門禁系統)," Master's thesis, Department of Electrical Engineering, National Central University, 2011.
[2] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, pp. 43–49, 1978.
[3] C. S. Myers and L. R. Rabiner, "A level building dynamic time warping algorithm for connected word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-29, pp. 284–297, Apr. 1981.
[4] C. Myers, L. R. Rabiner, and A. E. Rosenberg, "Performance tradeoffs in dynamic time warping algorithms for isolated word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, no. 6, Dec. 1980.
[5] M. Gales and S. Young, "The application of hidden Markov models in speech recognition," Foundations and Trends in Signal Processing, vol. 1, no. 3, pp. 195–304, 2007.
[6] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.
[7] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Processing, vol. 10, pp. 19–41, 2000.
[8] S. Fine, J. Navratil, and R. A. Gopinath, "A hybrid GMM/SVM approach to speaker identification," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, pp. 417–420, 2001.
[9] E. Rodriguez, B. Ruiz, A. G. Crespo, and F. Garcia, "Speech/speaker recognition using a HMM/GMM hybrid model," in Proc. First International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 227–234, Apr. 2003.
[10] E. Trentin and M. Gori, "A survey of hybrid ANN/HMM models for automatic speech recognition," Neurocomputing, vol. 37, no. 1, pp. 91–126, 2001.
[11] M. A. Al-Alaoui, L. Al-Kanj, J. Azar, and E. Yaacoub, "Speech recognition using artificial neural networks and hidden Markov models," IEEE Multidisciplinary Engineering Education Magazine, vol. 3, pp. 77–86, Sep. 2008.
[12] G. Hinton, L. Deng, D. Yu, G. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[13] A. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14–22, 2012.
[14] T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," in Proc. IEEE ICASSP, 2013.
[15] O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, "Convolutional neural networks for speech recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1533–1545, 2014.
[16] J.-T. Huang, J. Li, and Y. Gong, "An analysis of convolutional neural networks for speech recognition," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
[17] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah, "Signature verification using a Siamese time delay neural network," in Advances in Neural Information Processing Systems, 1993.
[18] S. Chopra, R. Hadsell, and Y. LeCun, "Learning a similarity metric discriminatively, with application to face verification," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 539–546, 2005.
[19] G. Koch, R. Zemel, and R. Salakhutdinov, "Siamese neural networks for one-shot image recognition," in ICML Deep Learning Workshop, vol. 2, 2015.
[20] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. Torr, "Fully-convolutional Siamese networks for object tracking," in European Conference on Computer Vision Workshops, pp. 850–865, Springer, 2016.
[21] 王小川, "Speech Signal Processing (語音訊號處理)," 全華, 2004.
[22] D. A. Reynolds, "Speaker identification and verification using Gaussian mixture speaker models," Speech Communication, vol. 17, pp. 177–192, 1995.
[23] S. Furui, "An overview of speaker recognition technology," in Proc. Workshop on Automatic Speaker Recognition and Identification, pp. 1–9, 1994.
[24] D. Burton, "Text-dependent speaker verification using vector quantization source coding," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, pp. 133–143, 1987.
[25] R. Auckenthaler, M. Carey, and H. Lloyd-Thomas, "Score normalization for text-independent speaker verification systems," Digital Signal Processing, vol. 10, pp. 42–54, 2000.
[26] 郭又禎, "Improved Mel-frequency cepstral coefficients for keyword extraction (改良式梅爾倒頻譜參數應用於關鍵字萃取)," Master's thesis, Department of Electrical Engineering, National Central University, 2014.
[27] J. R. Deller, J. G. Proakis, and J. H. L. Hansen, "Discrete-Time Processing of Speech Signals," Wiley-IEEE Press, 1999.
[28] R. Vergin, D. O'Shaughnessy, and A. Farhat, "Generalized Mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 5, 1999.
[29] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the Acoustical Society of America, vol. 87, pp. 1738–1752, 1990.
[30] S. Ravuri and A. Stolcke, "Recurrent neural network and LSTM models for lexical utterance classification," in Proc. Interspeech, 2015.
[31] K. Yao, G. Zweig, M.-Y. Hwang, Y. Shi, and D. Yu, "Recurrent neural networks for language understanding," in Proc. Interspeech, Lyon, France, Aug. 2013.
[32] R. Sathya and A. Abraham, "Comparison of supervised and unsupervised learning algorithms for pattern classification," International Journal of Advanced Research in Artificial Intelligence (IJARAI), vol. 2, no. 2, 2013.
[33] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[34] O. Chapelle, B. Schölkopf, and A. Zien, "Semi-Supervised Learning," MIT Press, 2007.
[35] A. Subramanya and J. Bilmes, "Semi-supervised learning with measure propagation," Journal of Machine Learning Research, 2011.
[36] J. Wu, “Introduction to Convolutional Neural Networks,” 2017.
[37] 斎藤康毅, "Deep Learning: Fundamental Theory and Implementation of Deep Learning with Python (Deep Learning: 用Python進行深度學習的基礎理論實作)," translated by 吳嘉芳, 碁峰資訊, 2017.
[38] S. Wager, S. Wang, and P. Liang, "Dropout training as adaptive regularization," in Advances in Neural Information Processing Systems 26, pp. 351–359, 2013.
[39] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, pp. 1929–1958, 2014.
[40] N. Srivastava, "Improving neural networks with dropout," Master's thesis, University of Toronto, Jan. 2013.
[41] L. N. Smith, "Cyclical learning rates for training neural networks," U.S. Naval Research Laboratory, 2015.
[42] B. Y. Hsueh, W. Li, and I-C. Wu, "Stochastic gradient descent with hyperbolic-tangent decay," arXiv preprint arXiv:1806.01593, 2018.
[43] M. D. Zeiler, "ADADELTA: an adaptive learning rate method," arXiv preprint arXiv:1212.5701, 2012.
[44] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[45] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: surpassing human-level performance on ImageNet classification," in Proc. IEEE International Conference on Computer Vision (ICCV), 2015.
[46] S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[47] J. Bjorck, C. Gomes, B. Selman, and K. Q. Weinberger, "Understanding batch normalization," arXiv preprint arXiv:1806.02375, 2018.
[48] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al., "TensorFlow: large-scale machine learning on heterogeneous systems," 2015.
[49] K. Wongsuphasawat, D. Smilkov, J. Wexler, J. Wilson, D. Mané, D. Fritz, D. Krishnan, F. B. Viégas, and M. Wattenberg, "Visualizing dataflow graphs of deep learning models in TensorFlow," IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 1, pp. 1–12, 2017.
[50] A. Nagrani, J. S. Chung, and A. Zisserman, "VoxCeleb: a large-scale speaker identification dataset," in Proc. Interspeech, 2017.
[51] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[52] A. Mohamed, G. Hinton, and G. Penn, "Understanding how deep belief networks perform acoustic modelling," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4273–4276, 2012.