 J. Chen, A. H. Kam, J. Zhang, N. Liu and L. Shue, "Bathroom activity monitoring based on sound," in International Conference on Pervasive Computing, 2005.
 F. Weninger and B. Schuller, "Audio recognition in the wild: Static and dynamic classification on a real-world database of animal vocalizations," in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference, 2011.
 C. Clavel, T. Ehrette and G. Richard, "Events detection for an audio-based surveillance system," in Multimedia and Expo, 2005. ICME 2005. IEEE International Conference, 2005.
 M. Bugalho, J. Portelo, I. Trancoso, T. Pellegrini and A. Abad, "Detecting audio events for semantic video search," in Tenth Annual Conference of the International Speech Communication Association, 2009.
 A.-r. Mohamed, G. Hinton and G. Penn, "Understanding how deep belief networks perform acoustic modelling," in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference, 2012.
 T. N. Sainath, R. J. Weiss, A. Senior, K. W. Wilson and O. Vinyals, "Learning the speech front-end with raw waveform CLDNNs," in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
 H. Lee, P. Pham, Y. Largman and A. Y. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Advances in Neural Information Processing Systems, 2009.
 A. Van den Oord, S. Dieleman and B. Schrauwen, "Deep content-based music recommendation," in Advances in Neural Information Processing Systems, 2013.
 V. Peltonen, J. Tuomi, A. Klapuri, J. Huopaniemi and T. Sorsa, "Computational auditory scene recognition," in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference, 2002.
 L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
 Y.-T. Peng, C.-Y. Lin, M.-T. Sun and K.-C. Tsai, "Healthcare audio event classification using hidden Markov models and hierarchical hidden Markov models," in Multimedia and Expo, 2009. ICME 2009. IEEE International Conference, 2009.
 B. Elizalde, A. Kumar, A. Shah, R. Badlani, E. Vincent, B. Raj and I. Lane, "Experiments on the DCASE challenge 2016: Acoustic scene classification and sound event detection in real life recording," arXiv preprint arXiv:1607.06706, 2016.
 J.-C. Wang, J.-F. Wang, K. W. He and C.-S. Hsu, "Environmental sound classification using hybrid SVM/KNN classifier and MPEG-7 audio low-level descriptor," in Neural Networks, 2006. IJCNN'06. International Joint Conference, 2006.
 A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012.
 K. J. Piczak, "Environmental sound classification with convolutional neural networks," in Machine Learning for Signal Processing (MLSP), 2015 IEEE 25th International Workshop, 2015.
 D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange and M. D. Plumbley, "Detection and classification of acoustic scenes and events," IEEE Transactions on Multimedia, vol. 17, no. 10, pp. 1733-1746, 2015.
 "DCASE 2017 Workshop," [Online]. Available: http://www.cs.tut.fi/sgn/arg/dcase2017/. [Accessed 30 June 2017].
 Y. Aytar, C. Vondrick and A. Torralba, "SoundNet: Learning sound representations from unlabeled video," Advances in Neural Information Processing Systems, pp. 892-900, 2016.
 W. Dai, C. Dai, S. Qu, J. Li and S. Das, "Very deep convolutional neural networks for raw waveforms," in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference, 2017.
 M. Lin, Q. Chen and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
 Y. Tokozume and T. Harada, "Learning environmental sounds with end-to-end convolutional neural network," in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference, 2017.
 Y. Tokozume, Y. Ushiku and T. Harada, "Learning from Between-class Examples for Deep Sound Recognition," in ICLR 2018 Conference, 2018.
 F. Rosenblatt, "The perceptron: a probabilistic model for information storage and organization in the brain," Psychological Review, vol. 65, pp. 386-408, 1958.
 D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, p. 533, 1986.
 Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
 D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot and others, "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, p. 484, 2016.
 X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
 K. He, X. Zhang, S. Ren and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015.
 S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
 K. J. Piczak, "ESC: Dataset for environmental sound classification," in Proceedings of the 23rd ACM International Conference on Multimedia, 2015.
 J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference, 2009.
 T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision, 2014.
 J. Salamon and J. P. Bello, "Deep convolutional neural networks and data augmentation for environmental sound classification," IEEE Signal Processing Letters, vol. 24, no. 3, pp. 279-283, 2017.
 N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
 Y. Nesterov, "Gradient Methods for Minimizing Composite," 2007.
 V. Boddapati, A. Petef, J. Rasmusson and L. Lundberg, "Classifying environmental sounds using image recognition networks," Procedia Computer Science, vol. 112, pp. 2048-2056, 2017.
 K. Simonyan, A. Vedaldi and A. Zisserman, "Deep inside convolutional networks: Visualising image classification models and saliency maps," arXiv preprint arXiv:1312.6034, 2013.
 M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision, 2014.
 J. Salamon, C. Jacoby and J. P. Bello, "A dataset and taxonomy for urban sound research," in Proceedings of the 22nd ACM International Conference on Multimedia, 2014.