 L. Lin, K. Wang, W. Zuo, M. Wang, J. Luo, and L. Zhang, “A deep structured model with radius–margin bound for 3D human activity recognition,” International Journal of Computer Vision, pp. 1-18, 2015.
 S. Ji, W. Xu, M. Yang and K. Yu, “3D Convolutional Neural Networks for Human Action Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, Jan, 2013.
 A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, “Large-Scale Video Classification with Convolutional Neural Networks,” IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, pp. 1725-1732, 2014.
 L. Pigou, A. Oord, S. Dieleman, M. Herreweghe, and J. Dambre, “Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video,” arXiv preprint arXiv:1506.01911, 2015.
 W. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” The Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, 1943.
 D. Hebb, “The Organization of Behavior: A Neuropsychological Theory,” New York: Wiley, 1949.
 F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychological Review, vol. 65, no. 6, pp. 386-408, Nov 1958.
 M. Minsky and S. Papert, “Perceptrons,” MIT Press, 1969.
 D. Rumelhart, G. Hinton, and R. Williams, “Learning representations by back-propagating errors,” in Neurocomputing: Foundations of Research, J. Anderson and E. Rosenfeld (Eds.), MIT Press, Cambridge, MA, USA, pp. 696-699, 1988.
 M. Minsky and S. Papert, “Perceptrons: Expanded Edition,” MIT Press, Cambridge, MA, USA, 1988.
 D. Rumelhart, G. Hinton, and R. Williams, “Learning Internal Representations by Error Propagation,” Technical rept., Mar-Sep, 1985.
 G. Hinton, S. Osindero, and Y. Teh, “A Fast Learning Algorithm for Deep Belief Nets,” Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
 G. Hinton and R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks,” Science, vol. 313, no. 5786, pp. 504-507, 2006.
 Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov 1998.
 C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, pp. 1-9, 2015.
 P. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550-1560, Oct 1990.
 I. Sutskever, O. Vinyals, and Q. Le, “Sequence to sequence learning with neural networks,” Advances in Neural Information Processing Systems, 2014.
 K. Cho, B. Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
 R. O’Reilly, “Biologically Plausible Error-driven Learning using Local Activation Differences: The Generalized Recirculation Algorithm,” Neural Computation, vol. 8, no. 5, pp. 895-938, 1996.
 D. Ciresan, A. Giusti, L. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes in electron microscopy images,” Advances in neural information processing systems, 2012.
 A. Karpathy and L. Fei-Fei, “Deep visual-semantic alignments for generating image descriptions,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
 W. Byeon, T. Breuel, F. Raue, and M. Liwicki, “Scene labeling with LSTM recurrent neural networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
 R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
 O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell: A neural image caption generator,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
 J. Donahue, L. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, “Long-term recurrent convolutional networks for visual recognition and description,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
 D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
 R. Girshick, “Fast R-CNN,” Proceedings of the IEEE International Conference on Computer Vision, 2015.
 S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” Advances in Neural Information Processing Systems, 2015.
 S. Sharma, R. Kiros, and R. Salakhutdinov, “Action recognition using visual attention,” arXiv preprint arXiv:1511.04119, 2015.
 K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, “Show, attend and tell: Neural image caption generation with visual attention,” arXiv preprint arXiv:1502.03044, 2015.
 T. Brox, A. Bruhn, N. Papenberg, J. Weickert, “High accuracy optical flow estimation based on a theory for warping,” Computer Vision-ECCV 2004, Springer Berlin Heidelberg, pp. 25-36, 2004.
 S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
 M. Schuster and K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997.
 J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
 J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
 C. Ding and D. Tao, “Robust face recognition via multimodal deep face representation,” IEEE Transactions on Multimedia, vol. 17, no. 11, pp. 2049-2058, 2015.
 L. Pigou, S. Dieleman, P. Kindermans, and B. Schrauwen, “Sign language recognition using convolutional neural networks,” Workshop at the European Conference on Computer Vision, Springer International Publishing, 2014.
 S. Sukittanon, A. Surendran, J. Platt, and C. Burges, “Convolutional networks for speech detection,” Interspeech, 2004.
 O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, “Convolutional neural networks for speech recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1533-1545, 2014.
 Y. Wang and D. Wang, “Cocktail party processing via structured prediction,” Advances in Neural Information Processing Systems, 2012.
 Y. Wang and D. Wang, “Towards scaling up classification-based speech separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 7, pp. 1381-1390, 2013.
 D. Ciresan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
 Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “Deepface: Closing the gap to human-level performance in face verification,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
 D. Ciresan, A. Giusti, L. Gambardella, and J. Schmidhuber, “Mitosis detection in breast cancer histology images with deep neural networks,” International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer Berlin Heidelberg, 2013.
 G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
 J. Li, Y. Wei, X. Liang, J. Dong, T. Xu, J. Feng, and S. Yan, “Attentive Contexts for Object Detection,” arXiv preprint arXiv:1603.07415, 2016.
 J. Johnson, A. Karpathy, and L. Fei-Fei, “DenseCap: Fully convolutional localization networks for dense captioning,” arXiv preprint arXiv:1511.07571, 2015.
 K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” European Conference on Computer Vision, Springer International Publishing, 2014.
 P. Wang, Y. Cao, C. Shen, L. Liu, and H. Shen, “Temporal pyramid pooling based convolutional neural networks for action recognition,” arXiv preprint arXiv:1503.01224, 2015.
 J. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, “Beyond short snippets: Deep networks for video classification,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
 H. Lee and H. Kwon, “Contextual Deep CNN Based Hyperspectral Classification,” arXiv preprint arXiv:1604.03519, 2016.
 P. Scovanner, S. Ali, and M. Shah, “A 3-dimensional SIFT descriptor and its application to action recognition,” Proceedings of the 15th ACM International Conference on Multimedia, ACM, 2007.
 A. Klaser, M. Marszalek, and C. Schmid, “A spatio-temporal descriptor based on 3D-gradients,” BMVC 2008-19th British Machine Vision Conference, British Machine Vision Association, 2008.
B. Nair and V. Asari, “Regression Based Learning of Human Actions from Video Using HOF-LBP Flow Patterns,” IEEE International Conference on Systems, Man, and Cybernetics, Manchester, pp. 4342-4347, 2013.
C. Chen, R. Jafari, and N. Kehtarnavaz, “Action Recognition from Depth Sequences Using Depth Motion Maps-Based Local Binary Patterns,” IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, pp. 1092-1099, 2015.
N. Ikizler-Cinbis and S. Sclaroff, “Object, scene and actions: Combining multiple features for human action recognition,” European Conference on Computer Vision, Springer Berlin Heidelberg, 2010.
 J. Cho, M. Lee, and S. Oh, “Robust action recognition using local motion and group sparsity,” Pattern Recognition, vol. 47, no. 5, pp. 1813-1825, 2014.