References
[1] A. Harma, M. F. McKinney, and J. Skowronek. Automatic surveillance of the acoustic activity in our living environment. In 2005 IEEE International Conference on Multimedia and Expo (ICME), July 2005.
[2] P. Guyot, J. Pinquier, X. Valero, and F. Alías. Two-step detection of water sound events for the diagnostic and monitoring of dementia. In 2013 IEEE International Conference on Multimedia and Expo (ICME), pages 1-6, July 2013.
[3] Behnaz Ghoraani and Sridhar Krishnan. Time-frequency matrix feature extraction and classification of environmental audio signals. IEEE Transactions on Audio, Speech, and Language Processing, 19(7):2197-2209, 2011.
[4] Daniel P.W. Ellis and Keansub Lee. Minimal-impact audio-based personal archives. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences - CARPE'04, page 39, New York, USA, 2004. ACM Press.
[5] Stavros Ntalampiras, Ilyas Potamitis, and Nikos Fakotakis. On acoustic surveillance of hazardous situations. In Proceedings of 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 165-168, Taipei, Taiwan, 2009. IEEE.
[6] C. Clavel, T. Ehrette, and G. Richard. Events detection for an audio-based surveillance system. In Proceedings of 2005 IEEE International Conference on Multimedia and Expo, pages 1306-1309. IEEE, 2005.
[7] Andrey Temko, Robert Malkin, Christian Zieger, Dušan Macho, Climent Nadeu, and Maurizio Omologo. Acoustic event detection and classification in smart-room environments: Evaluation of CHIL project systems. In Proceedings of the IV Biennial Workshop on Speech Technology, Zaragoza, Spain, 2006.
[8] Selina Chu, Shrikanth Narayanan, C.-C. Kuo, and Maja Mataric. Where am I? Scene recognition for mobile robots using audio features. In Proceedings of 2006 IEEE International Conference on Multimedia and Expo, pages 885-888. IEEE, 2006.
[9] Panagiotis Sidiropoulos, Vasileios Mezaris, Ioannis Kompatsiaris, Hugo Meinedo, Miguel Bugalho, and Isabel Trancoso. On the use of audio events for improving video scene segmentation. In Image Analysis for Multimedia Interactive Services (WIAMIS), 2010 11th International Workshop on, pages 1-4. IEEE, 2010.
[10] Jia Ching Wang, Chang Hong Lin, Bo Wei Chen, and Min Kang Tsai. Gabor-based nonuniform scale-frequency map for environmental sound classification in home automation. IEEE Transactions on Automation Science and Engineering, 11(2):607-613, April 2014.
[11] Jonathan William Dennis. Sound event recognition in unstructured environments using spectrogram image processing. PhD thesis, Nanyang Technological University, 2014.
[12] Jonathan William Dennis, Huy Dat Tran, and Haizhou Li. Spectrogram image feature for sound event classification in mismatched conditions. IEEE Signal Processing Letters, 18(2):130-133, 2011.
[13] Takumi Kobayashi and Jiaxing Ye. Acoustic feature extraction by statistics based local binary pattern for environmental sound classification. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2014.
[14] Ian McLoughlin, Haomin Zhang, Zhipeng Xie, Yan Song, and Wei Xiao. Robust sound event classification using deep neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(3):540-552, 2015.
[15] Haomin Zhang, Ian McLoughlin, and Yan Song. Robust sound event recognition using convolutional neural networks. In Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 559-563, April 2015.
[16] Jia Ching Wang, Jhing Fa Wang, Kuok Wai He, and Cheng Shu Hsu. Environmental sound classification using hybrid SVM/KNN classifier and MPEG-7 audio low-level descriptor. In Proceedings of 2006 IEEE International Joint Conference on Neural Networks, pages 1731-1735, Vancouver, BC, Canada, 2006. IEEE.
[17] G. Wichern, J. Xue, H. Thornburg, B. Mechtley, and A. Spanias. Segmentation, indexing, and retrieval for environmental and natural sounds. IEEE Transactions on Audio, Speech, and Language Processing, 18(3):688-707, March 2010.
[18] Rui Cai, Lie Lu, Alan Hanjalic, and Zhang Hong Jiang. A flexible framework for key audio effects detection and auditory context inference. IEEE Transactions on Audio, Speech, and Language Processing, May 2006.
[19] J. C. Wang, Y. S. Lee, C. H. Lin, E. Siahaan, and C. H. Yang. Robust environmental sound recognition with fast noise suppression for home automation. IEEE Transactions on Automation Science and Engineering, 12(4):1235-1242, 2015.
[20] Sachin Chachada and C.-C. Jay Kuo. Environmental sound recognition: a survey. APSIPA Transactions on Signal and Information Processing, 3:e14, 2014.
[21] Jia Ching Wang, Hsiao Ping Lee, Jhing Fa Wang, and Cai Bei Lin. Robust environmental sound recognition for home automation. IEEE Transactions on Automation Science and Engineering, 5(1):25-31, 2008.
[22] Mingming Zhang, Weifeng Li, Longbiao Wang, Jianguo Wei, Zhiyong Wu, and Qingmin Liao. Sparse coding for sound event classification. In Proceedings of 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, number 3, pages 1-5, 2013.
[23] P. K. Atrey, N. C. Maddage, and M. S. Kankanhalli. Audio based event detection for multimedia surveillance. 5:VV, May 2006.
[24] T. Heittola, A. Mesaros, T. Virtanen, and M. Gabbouj. Supervised model training for overlapping sound events based on unsupervised source separation. In Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 8677-8681, 2013.
[25] Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, and Brian Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82-97, 2012.
[26] Ruhi Sarikaya, Geoffrey E. Hinton, and Anoop Deoras. Application of deep belief networks for natural language understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(4):778-784, 2014.
[27] Omid Ghahabi and Javier Hernando. Deep belief networks for i-vector based speaker recognition. In Proceedings of 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), number 1, pages 1700-1704, 2014.
[28] E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen. Polyphonic sound event detection using multi label deep neural networks. In Proceedings of 2015 International Joint Conference on Neural Networks (IJCNN), pages 1-7, 2015.
[29] A. L. Berenzweig and D. P. W. Ellis. Locating singing voice segments within music signals. In Applications of Signal Processing to Audio and Acoustics, 2001 IEEE Workshop on the, pages 119-122, 2001.
[30] Giambattista Parascandolo, Heikki Huttunen, and Tuomas Virtanen. Recurrent neural networks for polyphonic sound event detection in real life recordings. In Proceedings of 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
[31] P. Arbeláez, B. Hariharan, C. Gu, S. Gupta, L. Bourdev, and J. Malik. Semantic segmentation using regions and parts. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[32] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. European Conference on Computer Vision (ECCV), 2012.
[33] B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. European Conference on Computer Vision (ECCV), 2014.
[34] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[35] J. Dai, K. He, and J. Sun. Convolutional feature masking for joint object and stuff segmentation. arXiv:1412.1283, 2014.
[36] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2015.
[37] J. Dai, K. He, and J. Sun. Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. IEEE International Conference on Computer Vision (ICCV), 2015.
[38] G. Papandreou, L. Chen, K. Murphy, and A. Yuille. Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation. IEEE International Conference on Computer Vision (ICCV), 2015.
[39] Q. Huang, C. Xia, W. Zheng, Y. Song, H. Xu, and C. Kuo. Object boundary guided semantic segmentation. arXiv:1603.09742v4, 2016.
[40] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[41] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. International Journal of Computer Vision (IJCV), 2013.
[42] J. Pont-Tuset, P. Arbelaez, J. Barron, F. Marques, and J. Malik. Multiscale combinatorial grouping for image segmentation and object proposal generation. arXiv:1503.00848, 2015.
[43] P. Arbelaez, J. Pont-Tuset, J. Barron, F. Marques, and J. Malik. Multiscale combinatorial grouping. Computer Vision and Pattern Recognition (CVPR), 2014.
[44] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. Torr. Conditional random fields as recurrent neural networks. IEEE International Conference on Computer Vision (ICCV), 2015.
[45] L. Chen, J. Barron, G. Papandreou, K. Murphy, and A. Yuille. Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[46] Jerome Buhl, David JT Sumpter, Iain D Couzin, Joe J Hale, Emma Despland, ER Miller, and Steve J Simpson. From disorder to order in marching locusts. Science, 312(5778):1402-1406, 2006.
[47] Nicholas C Makris, Purnima Ratilal, Deanelle T Symonds, Srinivasan Jagannathan, Sunwoong Lee, and Redwood W Nero. Fish population and behavior revealed by instantaneous continental shelf-scale imaging. Science, 311(5761):660-663, 2006.
[48] Nicholas C Makris, Purnima Ratilal, Srinivasan Jagannathan, Zheng Gong, Mark Andrews, Ioannis Bertsatos, Olav Rune Godø, Redwood W Nero, and J Michael Jech. Critical population density triggers rapid formation of vast oceanic fish shoals. Science, 323(5922):1734-1737, 2009.
[49] Shuai Yi, Hongsheng Li, and Xiaogang Wang. Understanding pedestrian behaviors from stationary crowd groups. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3488-3496, 2015.
[50] Jing Shao, Chen C Loy, Kai Kang, and Xiaogang Wang. Crowded scene understanding by deeply learned volumetric slices. IEEE Transactions on Circuits and Systems for Video Technology, 2016.
[51] Shuai Yi, Xiaogang Wang, Cewu Lu, Jiaya Jia, and Hongsheng Li. L0 regularized stationary-time estimation for crowd analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
[52] Cong Zhang, Hongsheng Li, Xiaogang Wang, and Xiaokang Yang. Cross-scene crowd counting via deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 833-841, 2015.
[53] Lokesh Boominathan, Srinivas SS Kruthiventi, and R Venkatesh Babu. Crowdnet: A deep convolutional network for dense crowd counting. In Proceedings of the 2016 ACM on Multimedia Conference, pages 640-644. ACM, 2016.
[54] Carlos Arteta, Victor Lempitsky, and Andrew Zisserman. Counting in the wild. In European Conference on Computer Vision, pages 483-498. Springer, 2016.
[55] Zheng Ma and Antoni B Chan. Counting people crossing a line using integer programming and local features. IEEE Transactions on Circuits and Systems for Video Technology, 26(10):1955-1969, 2016.
[56] Zhuoyi Zhao, Hongsheng Li, Rui Zhao, and Xiaogang Wang. Crossing-line crowd counting with two-phase deep neural networks. In European Conference on Computer Vision, pages 712-726. Springer, 2016.
[57] Stefano Pellegrini, Andreas Ess, Konrad Schindler, and Luc Van Gool. You'll never walk alone: Modeling social behavior for multi-target tracking. In Computer Vision, 2009 IEEE 12th International Conference on, pages 261-268. IEEE, 2009.
[58] Anton Milan, Laura Leal-Taixé, Konrad Schindler, and Ian Reid. Joint tracking and segmentation of multiple targets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5397-5406, 2015.
[59] Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, and Bernt Schiele. Multi-person tracking by multicut and deep matching. In European Conference on Computer Vision, pages 100-111. Springer, 2016.
[60] Shuai Yi, Hongsheng Li, and Xiaogang Wang. Pedestrian travel time estimation in crowded scenes. In Proceedings of the IEEE International Conference on Computer Vision, pages 3137-3145, 2015.
[61] Shuai Yi, Hongsheng Li, and Xiaogang Wang. Pedestrian behavior understanding and prediction with deep neural networks. In European Conference on Computer Vision, pages 263-279. Springer, 2016.
[62] Jing Shao, Chen-Change Loy, Kai Kang, and Xiaogang Wang. Slicing convolutional neural network for crowd video understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5620-5628, 2016.
[63] Shuai Yi, Hongsheng Li, and Xiaogang Wang. Pedestrian behavior modeling from stationary crowds with applications to intelligent surveillance. IEEE Transactions on Image Processing, 25(9):4354-4368, 2016.
[64] Bolei Zhou, Xiaoou Tang, and Xiaogang Wang. Coherent filtering: Detecting coherent motions from crowd clutters. In Computer Vision - ECCV 2012, pages 857-871. Springer, 2012.
[65] Jing Shao, Chen Change Loy, and Xiaogang Wang. Scene-independent group profiling in crowd. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2219-2226, 2014.
[66] Jing Shao, Chen Change Loy, and Xiaogang Wang. Learning scene-independent group descriptors for crowd understanding. IEEE Transactions on Circuits and Systems for Video Technology, 2016.
[67] Weina Ge, Robert T Collins, and R Barry Ruback. Vision-based analysis of small groups in pedestrian crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(5):1003-1016, 2012.
[68] Arun Kumar Chandran, Loh Ai Poh, and Prahlad Vadakkepat. Identifying social groups in pedestrian crowd videos. In Advances in Pattern Recognition (ICAPR), 2015 Eighth International Conference on, pages 1-6. IEEE, 2015.
[69] Francesco Solera, Simone Calderara, and Rita Cucchiara. Socially constrained structural learning for groups detection in crowd. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(5):995-1008, 2016.
[70] P. Scovanner, S. Ali, and M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition. ACM International Conference on Multimedia, 2007.
[71] A. Klaser, M. Marszalek, and C. Schmid. A spatio-temporal descriptor based on 3D-gradients. British Machine Vision Conference, 2008.
[72] G. Willems, T. Tuytelaars, and L. Van Gool. An efficient dense and scale-invariant spatio-temporal interest point detector. European Conference on Computer Vision, 2008.
[73] B. Nair and V. Asari. Regression based learning of human actions from video using HOF-LBP flow patterns. IEEE International Conference on Systems, 2013.
[74] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv:1502.03044, 2015.
[75] C. Ding and D. Tao. Robust face recognition via multimodal deep face representation. IEEE Transactions on Multimedia, 2015.
[76] L. Pigou, S. Dieleman, P. Kindermans, and B. Schrauwen. Sign language recognition using convolutional neural networks. Workshop at the European Conference on Computer Vision, 2014.
[77] S. Sukittanon, A. Surendran, J. Platt, and C. Burges. Convolutional networks for speech detection. Interspeech, 2004.
[78] O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014.
[79] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. European Conference on Computer Vision, 2014.
[80] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[81] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[82] J. Johnson, A. Karpathy, and L. Fei-Fei. Densecap: Fully convolutional localization networks for dense captioning. arXiv:1511.07571, 2015.
[83] J. Donahue, L. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[84] V. Mnih, N. Heess, and A. Graves. Recurrent models of visual attention. Advances in Neural Information Processing Systems, 2014.
[85] Geoffrey Hinton and Ruslan Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[86] Ruslan Salakhutdinov and Geoffrey Hinton. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, pages 448-455, 2009.
[87] G. E. Dahl, D. Yu, L. Deng, and A. Acero. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):30-42, January 2012.
[88] Abdel-rahman Mohamed, George Dahl, and Geoffrey Hinton. Deep belief networks for phone recognition. In Proceedings of the NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009.
[89] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, November 1998.
[90] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 1701-1708, June 2014.
[91] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. In 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 512-519, June 2014.
[92] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 1725-1732, June 2014.
[93] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097-1105. Curran Associates, Inc., 2012.
[94] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. CoRR, abs/1312.4400, 2013.
[95] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, November 1997.
[96] Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR, abs/1406.1078, 2014.
[97] Satoshi Nakamura, Kazuo Hiyane, Futoshi Asano, Nishiura Takanobu, and Yamada Takeshi. Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition. In Proceedings of International Conference on Language Resources & Evaluation, pages 2-5, 2000.
[98] Jonathan Dennis, Huy Dat Tran, and Eng Siong Chng. Image feature representation of the subband power distribution for robust sound event classification. IEEE Transactions on Audio, Speech, and Language Processing, 21(2):367-377, February 2013.
[99] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Proceedings of Neural Information Processing Systems (NIPS), 2012.
[100] Artem Babenko, Anton Slesarev, Alexandr Chigorin, and Victor Lempitsky. Neural codes for image retrieval. In Proceedings of European Conference on Computer Vision (ECCV), pages 584-599, 2014.
[101] T. Ahonen, A. Hadid, and M. Pietikainen. Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12):2037-2041, Dec 2006.
[102] Taishih Chi, Powen Ru, and Shihab A Shamma. Multiresolution spectrotemporal analysis of complex sounds. The Journal of the Acoustical Society of America, 118(2):887, 2005.
[103] Douglas O'Shaughnessy. Speech communication: human and machine. Addison-Wesley, 1987.
[104] Timo Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971-987, July 2002.
[105] Y. Wang, J. A. Yang, J. Lu, H. Liu, and L. W. Wang. Hierarchical deep belief networks based point process model for keywords spotting in continuous speech. International Journal of Communication Systems, 28(3):483-496, February 2015.
[106] Rong-En Fan, Kai-Wei Chang, and Cho-Jui Hsieh. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.
[107] Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In Signal Processing Conference (EUSIPCO), 2016 24th European, pages 1128-1132. IEEE, 2016.
[108] Dan Stowell, Dimitrios Giannoulis, Emmanouil Benetos, Mathieu Lagrange, and Mark D Plumbley. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia, 17(10):1733-1746, 2015.
[109] Laurens Van Der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.
[110] Jiaxing Ye, Takumi Kobayashi, Masahiro Murakawa, and Tetsuya Higuchi. Robust acoustic feature extraction for sound classification based on noise reduction. In Proceedings of 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5944-5948, 2014.
[111] Jiaxing Ye, Takumi Kobayashi, Masahiro Murakawa, and Tetsuya Higuchi. Kernel discriminant analysis for environmental sound recognition based on acoustic subspace. In Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 808-812, 2013.
[112] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, 11:3371-3408, 2010.
[113] Sharath Adavanne, Giambattista Parascandolo, Pasi Pertilä, Toni Heittola, and Tuomas Virtanen. Sound event detection in multichannel audio using spatial and harmonic features. Technical report, DCASE2016 Challenge, September 2016.
[114] Toni Heittola, Annamaria Mesaros, and Tuomas Virtanen. DCASE2016 baseline system. Technical report, DCASE2016 Challenge, September 2016.
[115] Matthias Zöhrer and Franz Pernkopf. Gated recurrent networks applied to acoustic scene classification and acoustic event detection. Technical report, DCASE2016 Challenge, September 2016.
[116] Toan H. Vu and Jia-Ching Wang. Acoustic scene and event recognition using recurrent neural networks. Technical report, DCASE2016 Challenge, September 2016.
[117] R. Haralick, S. Sternberg, and X. Zhuang. Image analysis using mathematical morphology. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1987.
[118] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv:1606.00915, 2016.
[119] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 2010.
[120] S. Chandra and I. Kokkinos. Fast, exact and multi-scale inference for semantic image segmentation with deep gaussian crfs. arXiv:1603.08358v2, 2016.
[121] G. Ghiasi and C. Fowlkes. Laplacian pyramid reconstruction and refinement for semantic segmentation. arXiv:1605.02264v2, 2016.
[122] Z. Wu, C. Shen, and A. Hengel. High-performance semantic segmentation using very deep fully convolutional networks. arXiv:1604.04339v1, 2016.
[123] H. Wang, A. Klaser, C. Schmid, and C. Liu. Action recognition by dense trajectories. IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[124] H. Wang and C. Schmid. Action recognition with improved trajectories. IEEE International Conference on Computer Vision, 2013.
[125] H. Jhuang, T. Serre, L. Wolf, and T. Poggio. A biologically inspired system for action recognition. IEEE International Conference on Computer Vision, 2007.
[126] P. Wang, Y. Cao, C. Shen, L. Liu, and H. Shen. Temporal pyramid pooling based convolutional neural networks for action recognition. arXiv:1503.01224, 2015.
[127] J. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici. Beyond short snippets: Deep networks for video classification. IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[128] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High accuracy optical flow estimation based on a theory for warping. European Conference on Computer Vision, 2004.
[129] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[130] J. Liu, J. Luo, and M. Shah. Recognizing realistic actions from videos in the wild. IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[131] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and F. F. Li. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis., 2015.
[132] S. Sharma, R. Kiros, and R. Salakhutdinov. Action recognition using visual attention. arXiv:1511.04119, 2015.
[133] Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440-1448, 2015.
[134] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91-99, 2015.
[135] Jasper RR Uijlings, Koen EA Van De Sande, Theo Gevers, and Arnold WM Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2):154-171, 2013.
[136] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211-252, 2015.