參考文獻 |
[1] Geoffrey Hinton and Ruslan Salakhutdinov. Reducing the dimen- sionality of data with neural networks. Science, 313(5786):504 – 507, 2006.
[2] L. Fei-Fei and P. Perona. A bayesian hierarchical model for learning natural scene categories. In Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 524–531 vol. 2, June 2005.
[3] A. Bosch, A. Zisserman, and X. Muñoz. Scene classification using a hybrid generative/discriminative approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(4):712–727, April 2008.
[4] DeLiang Wang and Guy J. Brown. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Wiley-IEEE Press, 2006.
[5] A. de la Torre, A. M. Peinado, J. C. Segura, J. L. Perez-Cordoba, M. C. Benitez, and A. J. Rubio. Histogram equalization of speech representation for robust speech recognition. IEEE Transactions on Speech and Audio Processing, 13(3):355–366, May 2005.
[6] K. S. R. Murty and B. Yegnanarayana. Combining evidence from residual phase and mfcc features for speaker recognition. IEEE Signal Processing Letters, 13(1):52–55, Jan 2006.
[7] N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet. Front-end factor analysis for speaker verification. IEEE Transac- tions on Audio, Speech, and Language Processing, 19(4):788–798, May 2011.
[8] O. Yilmaz and S. Rickard. Blind separation of speech mixtures via time-frequency masking. IEEE Transactions on Signal Processing, 52(7):1830–1847, July 2004.
[9] Z. Fu, G. Lu, K. M. Ting, and D. Zhang. A survey of audio-based music classification and annotation. IEEE Transactions on Multi- media, 13(2):303–319, April 2011.
[10] M. Muller, D. P. W. Ellis, A. Klapuri, and G. Richard. Signal processing for music analysis. IEEE Journal of Selected Topics in Signal Processing, 5(6):1088–1110, Oct 2011.
[11] D. A. Reynolds and R. C. Rose. Robust text-independent speaker identification using gaussian mixture speaker models. IEEE Trans- actions on Speech and Audio Processing, 3(1):72–83, Jan 1995.
[12] Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Tut database for acoustic scene classification and sound event detection. In Proceedings of 24th European Signal Processing Conference 2016 (EUSIPCO 2016), Budapest, Hungary, 2016.
[13] Jonathan Dennis, Huy Dat Tran, and Haizhou Li. Spectrogram im- age feature for sound event classification in mismatched conditions. IEEE Signal Processing Letters, 18(2):130–133, 2011.
[14] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kings- bury. Deep neural networks for acoustic modeling in speech recog- nition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, Nov 2012.
[15] S. Chu, S. Narayanan, and C. C. J. Kuo. Environmental sound recognition with time-frequency audio features. IEEE Transactions on Audio, Speech, and Language Processing, 17(6):1142–1158, Aug 2009.
[16] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.
[17] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Deepface: Closing the gap to human-level performance in face verification. In Procees- dings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 1701–1708, June 2014.
[18] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. CoRR, abs/1312.4400, 2013.
[19] S. Davis and P. Mermelstein. Comparison of parametric represen- tations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4):357–366, Aug 1980.
[20] A. Ramalingam and S. Krishnan. Gaussian mixture modeling of short-time fourier transform features for audio fingerprinting. IEEE Transactions on Information Forensics and Security, 1(4):457–463, Dec 2006.
[21] G. Tzanetakis and P. Cook. Musical genre classification of au- dio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302, Jul 2002.
[22] D. Pye. Content-based methods for the management of digital mu- sic. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2000, volume 6, pages 2437–2440 vol.4, 2000.
[23] L. R. Rabiner. A tutorial on hidden markov models and se- lected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, Feb 1989.
[24] Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. A maximization technique occurring in the statistical analysis of prob- abilistic functions of markov chains. Ann. Math. Statist., 41(1):164– 171, 02 1970.
[25] J. Ajmera and C. Wooters. A robust speaker clustering algorithm. In Automatic Speech Recognition and Understanding, 2003. ASRU ’03. 2003 IEEE Workshop on, pages 411–416, Nov 2003.
[26] A. L. Berenzweig and D. P. W. Ellis. Locating singing voice seg- ments within music signals. In Applications of Signal Processing to Audio and Acoustics, 2001 IEEE Workshop on the, pages 119–122, 2001.
[27] Bernhard E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT ’92, pages 144–152, New York, NY, USA, 1992. ACM.
[28] Guodong Guo and S. Z. Li. Content-based audio classification and retrieval by support vector machines. IEEE Transactions on Neural Networks, 14(1):209–215, Jan 2003.
[29] D. A. Sadlier and N. E. O’Connor. Event detection in field sports video using audio-visual features and a support vector machine. IEEE Transactions on Circuits and Systems for Video Technology, 15(10):1225–1233, Oct 2005.
[30] V. Wan and S. Renals. Speaker verification using sequence discrim- inant support vector machines. IEEE Transactions on Speech and Audio Processing, 13(2):203–210, March 2005.
[31] Tommi S. Jaakkola and David Haussler. Exploiting generative mod- els in discriminative classifiers. In Proceedings of the 1998 Confer- ence on Advances in Neural Information Processing Systems II, pages 487–493, Cambridge, MA, USA, 1999. MIT Press.
[32] Changsheng Xu, N. C. Maddage, and Xi Shao. Automatic music classification and summarization. IEEE Transactions on Speech and Audio Processing, 13(3):441–450, May 2005.
[33] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neu- ral networks from overfitting. Journal of Machine Learning Re- search, 15:1929–1958, 2014.
[34] Ruslan Salakhutdinov and Geoffrey Hinton. Deep Boltzmann ma- chines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, pages 448–455, 2009.
[35] G. E. Dahl, D. Yu, L. Deng, and A. Acero. Context-dependent pre- trained deep neural networks for large-vocabulary speech recogni- tion. IEEE Transactions on Audio, Speech, and Language Process- ing, 20(1):30–42, Jan 2012.
[36] Abdel rahman Mohamed, George Dahl, and Geoffrey Hinton. Deep belief networks for phone recognition. In Proceedings of the NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009.
[37] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. Cnn features off-the-shelf: An astounding baseline for recognition. In 2014 IEEE Conference on Computer Vision and Pattern Recogni- tion Workshops, pages 512–519, June 2014.
[38] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neu- ral networks. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 1725–1732, June 2014.
[39] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Ima- genet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, edi- tors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
[40] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term mem- ory. Neural Computation, 9(8):1735–1780, November 1997.
[41] Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR, abs/1406.1078, 2014.
[42] Michael E. Tipping and Chris M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61:611–622, 1999.
[43] Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5 - rmsprop, coursera: Neural networks for machine learning. 2012.
[44] Vinod Nair and Geoffrey E. Hinton. Rectified linear units im- prove restricted boltzmann machines. In Johannes Fürnkranz and Thorsten Joachims, editors, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814. Om- nipress, 2010.
[45] D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10):1733–1746, Oct 2015. |