References
[1] Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton, “Speech recognition with deep recurrent neural networks,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2013, pp. 6645–6649.
[2] Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (NIPS) 2012, pp. 1097–1105.
[3] Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann LeCun, “Very Deep Convolutional Networks for Text Classification,” in arXiv:1606.01781, 2016.
[4] Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning,” MIT Press, 2016.
[5] S. Chu, S. Narayanan, and C.-C. Kuo, “Environmental sound recognition with time-frequency audio features,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 6, pp. 1142–1158, Aug. 2009.
[6] C. Mydlarz, J. Salamon, and J. P. Bello, “The implementation of low-cost urban acoustic monitoring devices,” Applied Acoustics, in press, 2016.
[7] P. Foggia, N. Petkov, A. Saggese, N. Strisciuglio, and M. Vento, “Reliable detection of audio events in highly noisy environments,” Pattern Recognition Letters, vol. 65, pp. 22–28, 2015.
[8] P. Guyot, J. Pinquier, X. Valero, and F. Alías, “Two-step detection of water sound events for the diagnostic and monitoring of dementia,” in Proc. IEEE International Conference on Multimedia and Expo (ICME), San Jose, CA, USA, July 2013, pp. 1–6.
[9] D. Stowell and D. Clayton, “Acoustic event detection for multiple overlapping similar sources,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, October 2015.
[10] C. Clavel, T. Ehrette, and G. Richard, “Events detection for an audio-based surveillance system,” in Proc. IEEE International Conference on Multimedia and Expo (ICME), Amsterdam, The Netherlands, July 2005.
[11] J. Nam, Z. Hyung, and K. Lee, “Acoustic scene classification using sparse feature learning and selective max-pooling by event detection,” IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events, 2013.
[12] T. Heittola, A. Mesaros, A. J. Eronen, and T. Virtanen, “Context-dependent sound event detection,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 1, pp. 1–13, 2013.
[13] J. Barker, E. Vincent, N. Ma, H. Christensen, and P. Green, “The PASCAL CHiME speech separation and recognition challenge,” Computer Speech and Language, vol. 27, no. 3, pp. 621–633, 2012.
[14] Yoonchang Han and Kyogu Lee, “Acoustic scene classification using convolutional neural network and multiple-width frequency-delta data augmentation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, July 2016.
[15] Daniele Barchiesi, Dimitrios Giannoulis, Dan Stowell, and Mark D. Plumbley, “Acoustic scene classification: Classifying environments from the sounds they produce,” IEEE Signal Processing Magazine, vol. 32, pp. 16–34, 2015.
[16] Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen, “TUT database for acoustic scene classification and sound event detection,” in Proc. 24th European Signal Processing Conference (EUSIPCO), 2016.
[17] Victor Bisot, Romain Serizel, Slim Essid, and Gaël Richard, “Acoustic scene classification with matrix factorization for unsupervised feature learning,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
[18] Dan Stowell, Dimitrios Giannoulis, Emmanouil Benetos, Mathieu Lagrange, and Mark D. Plumbley, “Detection and classification of acoustic scenes and events,” IEEE Transactions on Multimedia, 2015.
[19] Huy Phan, Philipp Koch, Fabrice Katzberg, Marco Maass, Radoslaw Mazur, and Alfred Mertins, “Audio scene classification with deep recurrent neural networks,” in arXiv:1703.04770, 2017.
[20] A. Rakotomamonjy and G. Gasso, “Histogram of gradients of time-frequency representations for audio scene classification,” IEEE/ACM Trans. Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 142–153, 2015.
[21] T. H. Vu and J.-C. Wang, “Acoustic scene and event recognition using recurrent neural networks,” Detection and Classification of Acoustic Scenes and Events 2016, Tech. Rep., 2016.
[22] A. Mesaros, T. Heittola, A. Eronen, and T. Virtanen, “Acoustic event detection in real-life recordings,” in Proc. 18th Eur. Signal Process. Conf., Aalborg, Denmark, Aug. 2010, pp. 1267–1271.
[23] Toni Heittola, Annamaria Mesaros, Tuomas Virtanen, Moncef Gabbouj, “Supervised model training for overlapping sound events based on unsupervised source separation,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2013, pp. 8677–8681.
[24] T. Heittola, A. Mesaros, A. J. Eronen, and T. Virtanen, “Context-dependent sound event detection,” EURASIP J. Audio, Speech, Music Process., vol. 1, pp. 1–13, 2013.
[25] Satoshi Innami, Hiroyuki Kasai, “NMF-based environmental sound source separation using time-variant gain features,” in Computers & Mathematics with Applications, vol. 64, no. 5, pp. 1333–1342, 2012.
[26] Annamaria Mesaros, Toni Heittola, Onur Dikmen, Tuomas Virtanen, “Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015, pp. 606–618.
[27] Emre Cakir, Toni Heittola, Heikki Huttunen, Tuomas Virtanen, “Polyphonic Sound Event Detection Using Multi Label Deep Neural Networks,” in IEEE International Joint Conference on Neural Networks (IJCNN) 2015.
[28] E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen, “Multi-label vs. combined single-label sound event detection with deep neural networks,” in Proc. 23rd Eur. Signal Process. Conf., Nice, France, Aug. 2015, pp. 2551–2555.
[29] G. Parascandolo, H. Huttunen, and T. Virtanen, “Recurrent neural networks for polyphonic sound event detection in real life recordings,” in 2016 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 6440–6444.
[30] E. Cakir, G. Parascandolo, T. Heittola, H. Huttunen, and T. Virtanen, “Convolutional recurrent neural networks for polyphonic sound event detection,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, Special Issue on Sound Scene and Event Analysis, 2017.
[31] Y. Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[32] G. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, pp. 1527–1554, 2006.
[33] Simon Haykin, “Neural Networks and Learning Machines,” 3rd ed., Pearson Education, Upper Saddle River, NJ, 2009.
[34] David E Rumelhart, Geoffrey E Hinton, Ronald J Williams, “Learning internal representations by error propagation,” tech. rep., DTIC Document, 1985.
[35] Paul J Werbos, “Generalization of backpropagation with application to a recurrent gas market model,” in Neural Networks, vol. 1, no. 4, pp. 339–356, 1988.
[36] Yann LeCun, Yoshua Bengio, “Convolutional networks for images, speech, and time series,” in The handbook of brain theory and neural networks, vol. 3361, no. 10, 1995.
[37] Boris Teodorovich Polyak, “Some methods of speeding up the convergence of iteration methods,” in USSR Computational Mathematics and Mathematical Physics, vol. 4, no. 5, pp. 1–17, 1964.
[38] John Duchi, Elad Hazan, Yoram Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” in The Journal of Machine Learning Research, vol. 12, pp. 2121–2159, 2011.
[39] Matthew D Zeiler, “ADADELTA: An adaptive learning rate method,” in arXiv:1212.5701, 2012.
[40] Yurii Nesterov, “A method for unconstrained convex minimization problem with the rate of convergence O(1/k²),” in Doklady AN SSSR, vol. 269, no. 3, pp. 543–547, 1983.
[41] Tijmen Tieleman, Geoffrey Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude,” in COURSERA: Neural Networks for Machine Learning, vol. 4, 2012.
[42] Diederik Kingma, Jimmy Ba, “Adam: A method for stochastic optimization,” in arXiv:1412.6980, 2014.
[43] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” in arXiv:1502.01852, 2015.
[44] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[45] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Proc. ECCV, 2014.
[46] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, “Going Deeper with Convolutions,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[47] K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” in arXiv:1409.1556, 2014.
[48] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
[49] Hinton, G. E. and Sejnowski, T. J., “Learning and relearning in Boltzmann machines,” in Parallel Distributed Processing, vol. 1, pp. 282–317. MIT Press, Cambridge, 1986.
[50] Michael I Jordan, “Attractor dynamics and parallelism in a connectionist sequential machine,” 1986.
[51] Jeffrey L Elman, “Finding structure in time,” in Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[52] Alex Waibel, “Modular construction of time-delay neural networks for speech recognition,” in Neural Computation, vol. 1, no. 1, pp. 39–46, 1989.
[53] Sepp Hochreiter, Jürgen Schmidhuber, “Long short-term memory,” in Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[54] Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, Yoshua Bengio, “On the properties of neural machine translation: Encoder-decoder approaches,” in arXiv:1409.1259, 2014.
[55] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
[56] Mikolov, T., “Statistical Language Models based on Neural Networks,” Ph.D. thesis, Brno University of Technology, 2012.
[57] Razvan Pascanu, Tomas Mikolov, Yoshua Bengio, “On the difficulty of training Recurrent Neural Networks,” in arXiv:1211.5063, 2013.
[58] Quoc V Le, Navdeep Jaitly, Geoffrey E Hinton, “A Simple Way to Initialize Recurrent Networks of Rectified Linear Units,” in arXiv preprint arXiv:1504.00941, 2015.
[59] Nicolas Boulanger-Lewandowski, Yoshua Bengio, Pascal Vincent, “Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription,” in arXiv:1206.6392, 2012.
[60] Luca Pasa, Alessandro Sperduti, “Pre-training of Recurrent Neural Networks via Linear Autoencoders,” in Advances in Neural Information Processing Systems (NIPS), pp. 3572–3580, 2014.
[61] Mike Schuster, Kuldip K Paliwal, “Bidirectional recurrent neural networks,” in IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[62] Alex Graves, Jürgen Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” in Neural Networks, vol. 18, no. 5, pp. 602–610, 2005.
[63] Alex Graves, Marcus Liwicki, Santiago Fernández, Roman Bertolami, Horst Bunke, Jürgen Schmidhuber, “A novel connectionist system for unconstrained handwriting recognition,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 855–868, 2009.
[64] Alex Graves, “Generating sequences with recurrent neural networks,” in arXiv:1308.0850, 2013.
[65] Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, “Show and tell: A neural image caption generator,” in arXiv:1411.4555, 2014.
[66] Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals, “Recurrent neural network regularization,” in arXiv:1409.2329, 2014.
[67] Ilya Sutskever, Oriol Vinyals, Quoc V Le, “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems (NIPS), pp. 3104–3112, 2014.
[68] D. Giannoulis, E. Benetos, D. Stowell, M. Rossignol, M. Lagrange, and M. D. Plumbley, “Detection and classification of acoustic scenes and events: An IEEE AASP challenge,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2013, pp. 1–4.
[69] B. E. Kingsbury, N. Morgan, and S. Greenberg, “Robust speech recognition using the modulation spectrogram,” Speech Communication, vol. 25, no. 1, pp. 117–132, 1998.
[70] C. Nadeu, D. Macho, and J. Hernando, “Time and frequency filtering of filter-bank energies for robust HMM speech recognition,” Speech Communication, vol. 34, no. 1, pp. 93–114, 2001.
[71] S. Molau, M. Pitz, R. Schlüter, and H. Ney, “Computing mel-frequency cepstral coefficients on the power spectrum,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, 2001, pp. 73–76.
[72] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[73] M. Valenti, A. Diment, G. Parascandolo, S. Squartini, and T. Virtanen, “DCASE 2016 acoustic scene classification using convolutional neural networks,” in Proc. Workshop Detection Classif. Acoust. Scenes Events, Sep. 2016, pp. 95–99.
[74] Y. Han and K. Lee, “Convolutional neural network with multiple-width frequency-delta data augmentation for acoustic scene classification,” DCASE 2016 Challenge, Tech. Rep., Sep. 2016.
[75] H. Eghbal-Zadeh, B. Lehner, M. Dorfer, and G. Widmer, “CP-JKU submissions for DCASE-2016: A hybrid approach using binaural i-vectors and deep convolutional neural networks,” DCASE 2016 Challenge, Tech. Rep., Sep. 2016.
[76] S. Adavanne, G. Parascandolo, P. Pertilä, T. Heittola, and T. Virtanen, “Sound event detection in multichannel audio using spatial and harmonic features,” IEEE Detection and Classification of Acoustic Scenes and Events Workshop, 2016.
[77] Huy Phan, Philipp Koch, Fabrice Katzberg, Marco Maass, Radoslaw Mazur, and Alfred Mertins, “Audio scene classification with deep recurrent neural networks,” in arXiv:1703.04770v2, 2017.
[78] Yanmin Qian, Philip C Woodland, “Very deep convolutional neural networks for robust speech recognition,” in arXiv:1610.00277v1, 2016.
[79] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” CoRR, vol. abs/1502.03167, 2015.
[80] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, pp. 1929–1958, 2014.
[81] http://www.cs.tut.fi/sgn/arg/dcase2017/challenge/task-acoustic-scene-classification.
[82] http://www.cs.tut.fi/sgn/arg/dcase2017/challenge/task-rare-sound-event-detection.
[83] http://www.cs.tut.fi/sgn/arg/dcase2017/challenge/task-sound-event-detection-in-real-life-audio.