References
[1] R. Stiefelhagen, K. Bernardin, R. Bowers, R. T. Rose, M. Michel, and J. Garofolo, “The CLEAR 2007 Evaluation,” in Multimodal Technologies for Perception of Humans, R. Stiefelhagen, R. Bowers, and J. Fiscus, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 3–34.
[2] D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange and M. D. Plumbley, “Detection and Classification of Acoustic Scenes and Events,” in IEEE Transactions on Multimedia, vol. 17, no. 10, pp. 1733-1746, Oct. 2015.
[3] D. Li and S. E. Levinson, “A linear phase unwrapping method for binaural sound source localization on a robot,” Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292), 2002, pp. 19-23 vol.1.
[4] S. Mischie and G. Gășpăresc, “On Using ReSpeaker Mic Array 2.0 for speech processing algorithms,” 2020 International Symposium on Electronics and Telecommunications (ISETC), 2020, pp. 1-4.
[5] M. Binelli, A. Venturi, A. Amendola, and A. Farina, “Experimental analysis of spatial properties of the sound field inside a car employing a spherical microphone array,” in Audio Eng. Soc. (AES) Conv., Audio Engineering Society, 2011.
[6] C. Knapp and G. Carter, “The generalized correlation method for estimation of time delay,” in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 4, pp. 320-327, Aug. 1976.
[7] M. S. Brandstein and H. F. Silverman, “A robust method for speech signal time-delay estimation in reverberant rooms,” 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997, pp. 375-378 vol.1.
[8] D. H. Johnson and D. E. Dudgeon, “Array Signal Processing: Concepts and Techniques,” Englewood Cliffs, NJ: Prentice-Hall, 1993.
[9] J. P. Burg, “Maximum entropy spectral analysis,” in Proceedings of the 37th Annual International Meeting, Society of Exploration Geophysicists, Oklahoma City, OK, USA, 31 October 1967.
[10] J. Capon, “High-resolution frequency-wavenumber spectrum analysis,” in Proceedings of the IEEE, vol. 57, no. 8, pp. 1408-1418, Aug. 1969.
[11] R. Schmidt, “Multiple emitter location and signal parameter estimation,” in IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276-280, March 1986.
[12] K. Youssef, S. Argentieri and J. Zarader, “A learning-based approach to robust binaural sound localization,” 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013, pp. 2927-2932.
[13] M. J. Aminoff, F. Boller, and D. F. Swaab, “The Human Auditory System: Fundamental Organization and Clinical Disorders,” 2015, Internet resource.
[14] X. Xiao, S. Zhao, X. Zhong, D. L. Jones, E. S. Chng and H. Li, “A learning-based approach to direction of arrival estimation in noisy and reverberant environments,” 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 2814-2818.
[15] R. Roden, N. Moritz, S. Gerlach, S. Weinzierl and S. Goetze, “On sound source localization of speech signals using deep neural networks,” Proc. Deutsche Jahrestagung Akustik (DAGA), pp. 1510-1513, 2015.
[16] D. Krause, A. Politis and K. Kowalczyk, “Feature Overview for Joint Modeling of Sound Event Detection and Localization Using a Microphone Array,” 2020 28th European Signal Processing Conference (EUSIPCO), 2021, pp. 31-35.
[17] S. Adavanne, A. Politis and T. Virtanen, “Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network,” Proc. European Signal Processing Conference (EUSIPCO), 2018.
[18] S. S. Mane, S. G. Mali and S. P. Mahajan, “Localization of Steady Sound Source and Direction Detection of Moving Sound Source using CNN,” 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2019, pp. 1-6.
[19] J. Rouat, “Computational auditory scene analysis: Principles, algorithms and applications (D. Wang and G. J. Brown, Eds.; 2006) [book review],” IEEE Trans. Neural Netw., vol. 19, no. 1, Jan. 2008.
[20] G.J. Zapata-Zapata, J.D. Arias-Londoño, J.F. Vargas-Bonilla and J.R. Orozco-Arroyave, “On-line signature verification using Gaussian Mixture Models and small-sample learning strategies,” Revista Facultad de Ingeniería Universidad de Antioquia, vol. 79, pp. 86-97, 2016.
[21] G. Xuan, W. Zhang and P. Chai, “EM algorithms of Gaussian Mixture Model and Hidden Markov Model,” Proc. 2001 Int. Conference on Image Processing (ICIP), vol. 1, pp. 145-148, 2001.
[22] X. Zhou, X. Zhuang, M. Liu, H. Tang, M. Hasegawa-Johnson, T. Huang, “HMM-based acoustic event detection with AdaBoost feature selection,” In Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007. Springer, Berlin, Germany; 2008:345-353.
[23] J. Vavrek, M. Pleva, J. Juhar, “Acoustic events detection with support vector machines,” In Electrical Engineering and Informatics, Proceedings of the Faculty of Electrical Engineering and Informatics of the Technical University of Košice, September 2010, Košice, pp. 796-801, ISBN 978-80-553-0460-1, 2010.
[24] J. Schröder, F. X. Nsabimana, J. Rennies, D. Hollosi and S. Goetze, “Automatic detection of relevant acoustic events in kindergarten noisy environments,” Proc. Deutsche Jahrestagung für Akustik, pp. 1525-1528, Mar. 2015.
[25] T. Heittola, A. Mesaros, A. Eronen and T. Virtanen, “Context-dependent sound event detection,” EURASIP J. Audio Speech Music Process., vol. 2013, 2013.
[26] E. Çakır, G. Parascandolo, T. Heittola, H. Huttunen and T. Virtanen, “Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 6, pp. 1291-1303, June 2017.
[27] Y. Li, M. Liu, K. Drossos and T. Virtanen, “Sound Event Detection Via Dilated Convolutional Recurrent Neural Networks,” ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 286-290.
[28] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” Proc. of ICLR, pp. 1-13, 2016.
[29] P. S. Tan, K. M. Lim, C. P. Lee and C. H. Tan, “Acoustic Event Detection with MobileNet and 1D-Convolutional Neural Network,” 2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), 2020, pp. 1-6.
[30] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[31] I. Aizenberg, N. Aizenberg, C. Butakov and E. Farberov, “Image recognition on the neural network based on multi-valued neurons,” Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, 2000, pp. 989-992 vol.2.
[32] W. S. McCulloch and W. Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943.
[33] K. Fukushima, “Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position,” Biological Cybernetics, vol. 36, pp. 193-202, 1980.
[34] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, “Gradient-based learning applied to document recognition,” in Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[35] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, 1986.
[36] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[37] J. Chung, C. Gulcehre, K. Cho and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” Dec. 2014, [online] Available: http://arxiv.org/abs/1412.3555.
[38] F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” arXiv preprint arXiv:1602.07360, 2016.
[39] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” CoRR, abs/1409.0473, 2014.
[40] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., “Attention is all you need,” CoRR, vol. abs/1706.03762, 2017.
[41] Y. Cao, T. Iqbal, Q. Kong, Y. Zhong, W. Wang, and M. D. Plumbley, “Event-Independent Network for Polyphonic Sound Event Localization and Detection,” DCASE 2020 Workshop, November 2020.
[42] A. Politis, S. Adavanne, and T. Virtanen, “A dataset of reverberant spatial sound scenes with moving sources for sound event localization and detection,” In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE2020). November 2020.
[43] A. Mesaros, S. Adavanne, A. Politis, T. Heittola, and T. Virtanen, “Joint measurement of localization and detection of sound events,” In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, Oct. 2019.
[44] K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.