References
[1] D. Wang and G. J. Brown, “Computational Auditory Scene Analysis: Principles, Algorithms, and Applications,” Wiley-IEEE Press, 2006.
[2] A. S. Bregman, “Auditory Scene Analysis,” MIT Press, Cambridge, MA, 1990.
[3] M. Slaney, “The History and Future of CASA,” Speech Separation by Humans and Machines, pp. 199-211, Springer US, 2005.
[4] N. Sawhney, “Situational Awareness from Environmental Sounds,” Technical Report, Massachusetts Institute of Technology, 1997.
[5] D. Barchiesi, D. Giannoulis, D. Stowell, and M. D. Plumbley, “Acoustic Scene Classification,” in IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 16-34, May 2015.
[6] S. McAdams, “Recognition of sound sources and events,” Thinking in Sound: The Cognitive Psychology of Human Audition, pp. 146-198, 1993.
[7] H. Eghbal-Zadeh, B. Lehner, M. Dorfer, and G. Widmer, “CP-JKU Submissions for DCASE-2016: A Hybrid Approach Using Binaural I-Vectors and Deep Convolutional Neural Networks,” IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE2016), Budapest, Hungary, Sep. 2016.
[8] M. Valenti, A. Diment, G. Parascandolo, S. Squartini, and T. Virtanen, “DCASE 2016 Acoustic Scene Classification Using Convolutional Neural Networks,” IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE2016), Budapest, Hungary, Sep. 2016.
[9] D. Giannoulis, E. Benetos, D. Stowell, and M. D. Plumbley, IEEE AASP CASA Challenge - Public Dataset for Scene Classification Task, https://archive.org/details/dcase2013_scene_classification, retrieved Jun. 29, 2017.
[10] D. Giannoulis, E. Benetos, D. Stowell, and M. D. Plumbley, IEEE AASP CASA Challenge - Private Dataset for Scene Classification Task, https://archive.org/details/dcase2013_scene_classification_testset, retrieved Jun. 29, 2017.
[11] A. Mesaros, T. Heittola, and T. Virtanen, TUT Acoustic scenes 2016, Development dataset, http://doi.org/10.5281/zenodo.45739, retrieved Dec. 1, 2016.
[12] A. Mesaros, T. Heittola, and T. Virtanen, TUT Acoustic scenes 2016, Evaluation dataset, https://zenodo.org/record/165995#.WXblsYiGNhE, retrieved Dec. 1, 2016.
[13] ETSI Standard Doc., “Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Front-End Feature Extraction Algorithm; Compression Algorithms,” ES 201 108, v1.1.3, Sep. 2003.
[14] ETSI Standard Doc., “Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Front-End Feature Extraction Algorithm; Compression Algorithms,” ES 202 050, v1.1.5, Jan. 2007.
[15] Librosa: an open source Python package for music and audio analysis, https://github.com/librosa, retrieved Dec. 1, 2016.
[16] B. McFee, C. Raffel, D. Liang, D. P. W. Ellis, M. McVicar, E. Battenberg, and O. Nieto, “librosa: Audio and Music Signal Analysis in Python,” in Proceedings of the 14th Python in Science Conference, Jul. 2015.
[17] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556, 2014.
[18] C. Szegedy et al., “Going Deeper with Convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, Jun. 2015.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[20] W. S. McCulloch and W. Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, Dec. 1943.
[21] D. O. Hebb, “Organization of Behavior,” New York: Wiley & Sons.
[22] N. Rochester, J. Holland, L. Haibt, and W. Duda, “Tests on a Cell Assembly Theory of the Action of the Brain, Using a Large Digital Computer.”
[23] F. Rosenblatt, “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain,” Cornell Aeronautical Laboratory, Psychological Review, vol. 65, no. 6, pp. 386–408.
[24] F. Rosenblatt, “Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms,” Spartan Books, Washington DC, 1961.
[25] M. Minsky and S. Papert, “Perceptrons,” Cambridge, MA: MIT Press.
[26] P. J. Werbos, “Beyond regression: new tools for prediction and analysis in the behavioral sciences,” Ph.D. thesis, Harvard University, 1974.
[27] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, Oct. 1986.
[28] V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), Jun. 2010.
[29] S. Sigtia and S. Dixon, “Improved Music Feature Learning with Deep Neural Networks,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6959-6963, May 2014.
[30] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” in Journal of Machine Learning Research, vol. 15, pp. 1929-1958, Jun. 2014.
[31] Q. Kong, I. Sobieraj, W. Wang and M. Plumbley, “Deep Neural Network Baseline for DCASE Challenge 2016,” in 2016 Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE2016), pp. 50-54, Sep. 2016.
[32] Z. Liao and G. Carneiro, “Competitive Multi-Scale Convolution,” arXiv preprint arXiv:1511.05635, 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” in Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[34] I. Mrazova and M. Kukacka, “Hybrid convolutional neural networks,” in 6th IEEE International Conference on Industrial Informatics (INDIN), 2008.
[35] M. Lin, Q. Chen, and S. Yan, “Network in Network,” in Computing Research Repository (CoRR), 2013.
[36] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” in International Conference on Machine Learning, pp. 448-456, 2015.
[37] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, 2016.
[38] T. Salimans and D. P. Kingma, “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks,” in Advances in Neural Information Processing Systems, pp. 901-909, 2016.
[39] TensorFlow: an open source Python package for machine intelligence, https://www.tensorflow.org, retrieved Dec. 1, 2016.
[40] J. Dean et al., “Large-Scale Deep Learning for Building Intelligent Computer Systems,” in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp. 1-1, Feb. 2016.
[41] A. Mesaros, T. Heittola, and T. Virtanen, “TUT Database for Acoustic Scene Classification and Sound Event Detection,” IEEE 2016 24th European Signal Processing Conference, pp. 1128-1132, Aug. 2016.
[42] DCASE2017 Challenge Baseline website, http://doi.org/10.5281/zenodo.400515, retrieved Mar. 17, 2017.
[43] DCASE2016 Challenge website, http://www.cs.tut.fi/sgn/arg/dcase2016/task-results-acoustic-scene-classification, retrieved Jun. 26, 2017.
[44] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw Audio,” arXiv preprint arXiv:1609.03499, 2016.