D. Wang and G. J. Brown, “Computational Auditory Scene Analysis: Principles, Algorithms, and Applications,” Wiley-IEEE Press, 2006.
 A. S. Bregman, “Auditory Scene Analysis,” MIT Press, Cambridge, MA, 1990.
 M. Slaney, “The History and Future of CASA,” Speech Separation by Humans and Machines, pp. 199-211, Springer US, 2005.
 N. Sawhney, “Situational Awareness from Environmental Sounds,” Technical Report, Massachusetts Institute of Technology, 1997.
 D. Barchiesi, D. Giannoulis, D. Stowell, and M. D. Plumbley, “Acoustic Scene Classification,” in IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 16-34, May 2015.
 S. McAdams, “Recognition of sound sources and events,” Thinking in Sound: The Cognitive Psychology of Human Audition, pp. 146-198, 1993.
 H. Eghbal-Zadeh, B. Lehner, M. Dorfer, and G. Widmer, “CP-JKU Submissions for DCASE-2016: A Hybrid Approach Using Binaural I-Vectors and Deep Convolutional Neural Networks,” IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE2016), Budapest, Hungary, Sep. 2016.
 M. Valenti, A. Diment, G. Parascandolo, S. Squartini, and T. Virtanen, “DCASE 2016 Acoustic Scene Classification Using Convolutional Neural Networks,” IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE2016), Budapest, Hungary, Sep. 2016.
 D. Giannoulis, E. Benetos, D. Stowell, and M. D. Plumbley, IEEE AASP CASA Challenge - Public Dataset for Scene Classification Task, https://archive.org/details/dcase2013_scene_classification, retrieved Jun. 29, 2017.
 D. Giannoulis, E. Benetos, D. Stowell, and M. D. Plumbley, IEEE AASP CASA Challenge - Private Dataset for Scene Classification Task, https://archive.org/details/dcase2013_scene_classification_testset, retrieved Jun. 29, 2017.
 A. Mesaros, T. Heittola, and T. Virtanen, TUT Acoustic scenes 2016, Development dataset, http://doi.org/10.5281/zenodo.45739, retrieved Dec. 1, 2016.
 A. Mesaros, T. Heittola, and T. Virtanen, TUT Acoustic scenes 2016, Evaluation dataset, https://zenodo.org/record/165995#.WXblsYiGNhE, retrieved Dec. 1, 2016.
 ETSI Standard Doc., “Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Front-End Feature Extraction Algorithm; Compression Algorithms,” ES 201 108, v1.1.3, Sep. 2003.
 ETSI Standard Doc., “Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Front-End Feature Extraction Algorithm; Compression Algorithms,” ES 202 050, v1.1.5, Jan. 2007.
 Librosa: an open source Python package for music and audio analysis, https://github.com/librosa, retrieved Dec. 1, 2016.
 B. McFee, C. Raffel, D. Liang, D. P. W. Ellis, M. McVicar, E. Battenberg, and O. Nieto, “librosa: Audio and Music Signal Analysis in Python,” in Proceedings of the 14th Python in Science Conference, Jul. 2015.
 K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556, 2014.
 C. Szegedy et al., “Going Deeper with Convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, Jun. 2015.
 A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
 W. S. McCulloch and W. Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, Dec. 1943.
 D. O. Hebb, “The Organization of Behavior,” New York: Wiley & Sons, 1949.
 N. Rochester, J. Holland, L. Haibt, and W. Duda, “Tests on a Cell Assembly Theory of the Action of the Brain, Using a Large Digital Computer,” IRE Transactions on Information Theory, vol. 2, no. 3, pp. 80-93, 1956.
 F. Rosenblatt, “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain,” Cornell Aeronautical Laboratory, Psychological Review, vol. 65, no. 6, pp. 386–408, 1958.
 F. Rosenblatt, “Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms,” Spartan Books, Washington DC, 1961.
 M. Minsky and S. Papert, “Perceptrons,” Cambridge, MA: MIT Press, 1969.
 P. J. Werbos, “Beyond regression: new tools for prediction and analysis in the behavioral sciences,” Ph.D. thesis, Harvard University, 1974.
 D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, Oct. 1986.
 V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), Jun. 2010.
 S. Sigtia and S. Dixon, “Improved Music Feature Learning with Deep Neural Networks,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6959-6963, May 2014.
 N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” in Journal of Machine Learning Research, vol. 15, pp. 1929-1958, Jun. 2014.
 Q. Kong, I. Sobieraj, W. Wang and M. Plumbley, “Deep Neural Network Baseline for DCASE Challenge 2016,” in 2016 Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE2016), pp. 50-54, Sep. 2016.
 Z. Liao and G. Carneiro, “Competitive Multi-Scale Convolution,” arXiv preprint arXiv:1511.05635, 2015.
 Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” in Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
 I. Mrazova and M. Kukacka, “Hybrid convolutional neural networks,” in 6th IEEE International Conference on Industrial Informatics (INDIN), 2008.
 M. Lin, Q. Chen, and S. Yan, “Network in Network,” in Computing Research Repository (CoRR), 2013.
 S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” in International Conference on Machine Learning, pp. 448-456, 2015.
 K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, 2016.
 T. Salimans and D. P. Kingma, “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks,” in Advances in Neural Information Processing Systems, pp. 901-909, 2016.
 TensorFlow: an open source Python package for machine intelligence, https://www.tensorflow.org, retrieved Dec. 1, 2016.
 J. Dean et al., “Large-Scale Deep Learning for Building Intelligent Computer Systems,” in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp. 1-1, Feb. 2016.
 A. Mesaros, T. Heittola, and T. Virtanen, “TUT Database for Acoustic Scene Classification and Sound Event Detection,” IEEE 2016 24th European Signal Processing Conference, pp. 1128-1132, Aug. 2016.
 DCASE2017 Challenge Baseline website, http://doi.org/10.5281/zenodo.400515, retrieved Mar. 17, 2017.
 DCASE2016 Challenge website, http://www.cs.tut.fi/sgn/arg/dcase2016/task-results-acoustic-scene-classification, retrieved Jun. 26, 2017.
 A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw Audio,” arXiv preprint arXiv:1609.03499, 2016.