References
[1] Z. Wu, T. Yao, Y. Fu, and Y.-G. Jiang, “Deep Learning for Video Classification and Captioning,” arXiv:1609.06782 [cs], pp. 3–29, Dec. 2017, doi: 10.1145/3122865.3122867.
[2] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning Hierarchical Features for Scene Labeling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915–1929, Aug. 2013, doi: 10.1109/TPAMI.2012.231.
[3] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv:1409.1556 [cs], Sep. 2014, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1409.1556.
[4] S. Ji, W. Xu, M. Yang, and K. Yu, “3D Convolutional Neural Networks for Human Action Recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 221–231, Jan. 2013, doi: 10.1109/TPAMI.2012.59.
[5] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 157–166, Mar. 1994, doi: 10.1109/72.279181.
[6] K. Simonyan and A. Zisserman, “Two-Stream Convolutional Networks for Action Recognition in Videos,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 568–576.
[7] H. Wang and C. Schmid, “Action Recognition with Improved Trajectories,” in 2013 IEEE International Conference on Computer Vision, Dec. 2013, pp. 3551–3558, doi: 10.1109/ICCV.2013.441.
[8] J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, “Beyond Short Snippets: Deep Networks for Video Classification,” arXiv:1503.08909 [cs], Mar. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1503.08909.
[9] Z. Wu, X. Wang, Y.-G. Jiang, H. Ye, and X. Xue, “Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification,” arXiv:1504.01561 [cs], Apr. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1504.01561.
[10] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, “NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis,” presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1010–1019, Accessed: Jun. 13, 2020. [Online]. Available: http://openaccess.thecvf.com/content_cvpr_2016/html/Shahroudy_NTU_RGBD_A_CVPR_2016_paper.html.
[11] B. Singh, T. K. Marks, M. Jones, O. Tuzel, and M. Shao, “A Multi-stream Bi-directional Recurrent Neural Network for Fine-Grained Action Detection,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 1961–1970, doi: 10.1109/CVPR.2016.216.
[12] L. Pigou, A. van den Oord, S. Dieleman, M. Van Herreweghe, and J. Dambre, “Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video,” arXiv:1506.01911 [cs, stat], Feb. 2016, Accessed: Jan. 18, 2020. [Online]. Available: http://arxiv.org/abs/1506.01911.
[13] Y. Du, W. Wang, and L. Wang, “Hierarchical recurrent neural network for skeleton based action recognition,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 1110–1118, doi: 10.1109/CVPR.2015.7298714.
[14] V. Veeriah, N. Zhuang, and G.-J. Qi, “Differential Recurrent Neural Networks for Action Recognition,” arXiv:1504.06678 [cs], Apr. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1504.06678.
[15] P. Molchanov, X. Yang, S. Gupta, K. Kim, S. Tyree, and J. Kautz, “Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 4207–4215, doi: 10.1109/CVPR.2016.456.
[16] N. L. Hakim, T. K. Shih, S. P. Kasthuri Arachchi, W. Aditya, Y.-C. Chen, and C.-Y. Lin, “Dynamic Hand Gesture Recognition Using 3DCNN and LSTM with FSM Context-Aware Model,” Sensors, vol. 19, no. 24, p. 5429, Jan. 2020, doi: 10.3390/s19245429.
[17] S. Abu-El-Haija et al., “YouTube-8M: A Large-Scale Video Classification Benchmark,” arXiv:1609.08675 [cs], Sep. 2016, Accessed: Apr. 28, 2020. [Online]. Available: http://arxiv.org/abs/1609.08675.
[18] Y. Jiang, Z. Wu, J. Wang, X. Xue, and S. Chang, “Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 2, pp. 352–364, Feb. 2018, doi: 10.1109/TPAMI.2017.2670560.
[19] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychol. Rev., vol. 65, no. 6, pp. 386–408, 1958, doi: 10.1037/h0042519.
[20] “Rectifier (neural networks),” Wikipedia. Dec. 04, 2018, Accessed: Jan. 21, 2020. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Rectifier_(neural_networks)&oldid=871884348.
[21] “Introduction to Artificial Neural Networks - Part 1.” http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7 (accessed Jan. 21, 2020).
[22] F. M. Soares and A. M. F. Souza, Neural Network Programming with Java. Packt Publishing Ltd, 2017.
[23] D. H. Hubel and T. N. Wiesel, “Receptive fields and functional architecture of monkey striate cortex,” J. Physiol., vol. 195, no. 1, pp. 215–243, Mar. 1968, doi: 10.1113/jphysiol.1968.sp008455.
[24] Y. LeCun et al., “Backpropagation Applied to Handwritten Zip Code Recognition,” Neural Comput., vol. 1, no. 4, pp. 541–551, Dec. 1989, doi: 10.1162/neco.1989.1.4.541.
[25] D. C. Cireşan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber, “Flexible, High Performance Convolutional Neural Networks for Image Classification,” in Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two, Barcelona, Catalonia, Spain, 2011, pp. 1237–1242, doi: 10.5591/978-1-57735-516-8/IJCAI11-210.
[26] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale Video Classification with Convolutional Neural Networks,” presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1725–1732, Accessed: Jan. 21, 2020. [Online]. Available: https://www.cv-foundation.org/openaccess/content_cvpr_2014/html/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.html.
[27] D. Britz, “Understanding Convolutional Neural Networks for NLP,” WildML, Nov. 07, 2015. http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/ (accessed Jan. 21, 2020).
[28] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998, doi: 10.1109/5.726791.
[29] “CS231n Convolutional Neural Networks for Visual Recognition.” http://cs231n.github.io/convolutional-networks/ (accessed Jan. 21, 2020).
[30] M. Ranzato, “Large-Scale Visual Recognition With Deep Learning,” p. 134.
[31] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105.
[32] M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” in Computer Vision – ECCV 2014, 2014, pp. 818–833.
[33] O. Russakovsky et al., “ImageNet Large Scale Visual Recognition Challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015, doi: 10.1007/s11263-015-0816-y.
[34] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, Accessed: Jan. 21, 2020. [Online]. Available: https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html.
[35] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution,” in Computer Vision – ECCV 2016, 2016, pp. 694–711.
[36] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
[37] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: continual prediction with LSTM,” in Proc. 9th International Conference on Artificial Neural Networks (ICANN ’99), 1999, pp. 850–855, doi: 10.1049/cp:19991218.
[38] C. Metz, “With QuickType, Apple wants to do more than guess your next text. It wants to give you an AI.,” Wired, Jun. 14, 2016.
[39] “A Beginner’s Guide to LSTMs and Recurrent Neural Networks,” Skymind. http://skymind.ai/wiki/lstm (accessed Jan. 21, 2020).
[40] “Nikhil Buduma | A Deep Dive into Recurrent Neural Nets,” The Musings of Nikhil Buduma. http://nikhilbuduma.com/2015/01/11/a-deep-dive-into-recurrent-neural-networks/ (accessed Jun. 13, 2020).
[41] F. A. Gers and J. Schmidhuber, “Recurrent nets that time and count,” in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, Jul. 2000, vol. 3, pp. 189–194, doi: 10.1109/IJCNN.2000.861302.
[42] K. Yao, T. Cohn, K. Vylomova, K. Duh, and C. Dyer, “Depth-Gated LSTM,” arXiv:1508.03790 [cs], Aug. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1508.03790.
[43] J. Koutník, K. Greff, F. Gomez, and J. Schmidhuber, “A Clockwork RNN,” arXiv:1402.3511 [cs], Feb. 2014, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1402.3511.
[44] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A Search Space Odyssey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2222–2232, Oct. 2017, doi: 10.1109/TNNLS.2016.2582924.
[45] R. Jozefowicz, W. Zaremba, and I. Sutskever, “An Empirical Exploration of Recurrent Network Architectures,” in Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.
[46] B. Krause, L. Lu, I. Murray, and S. Renals, “Multiplicative LSTM for sequence modelling,” arXiv:1609.07959 [cs, stat], Oct. 2017, Accessed: Jun. 13, 2020. [Online]. Available: http://arxiv.org/abs/1609.07959.
[47] Y. Wu et al., “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,” arXiv:1609.08144 [cs], Oct. 2016, Accessed: Jun. 13, 2020. [Online]. Available: http://arxiv.org/abs/1609.08144.
[48] A. Graves, S. Fernández, and J. Schmidhuber, “Multi-dimensional Recurrent Neural Networks,” in Artificial Neural Networks – ICANN 2007, 2007, pp. 549–558.
[49] M. F. Stollenga, W. Byeon, M. Liwicki, and J. Schmidhuber, “Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation,” in Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds. Curran Associates, Inc., 2015, pp. 2998–3006.
[50] N. Kalchbrenner, I. Danihelka, and A. Graves, “Grid Long Short-Term Memory,” arXiv:1507.01526 [cs], Jul. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1507.01526.
[51] M. Cord and P. Cunningham, Eds., Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval. Berlin Heidelberg: Springer-Verlag, 2008.
[52] O. Bousquet, U. von Luxburg, and G. Rätsch, Eds., Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2–14, 2003, Tübingen, Germany, August 4–16, 2003, Revised Lectures (Lecture Notes in Computer Science). Springer-Verlag, 2004.
[53] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Mar. 2010, pp. 249–256, Accessed: Jan. 21, 2020. [Online]. Available: http://proceedings.mlr.press/v9/glorot10a.html.
[54] H. Robbins and S. Monro, “A Stochastic Approximation Method,” Ann. Math. Stat., vol. 22, no. 3, pp. 400–407, Sep. 1951, doi: 10.1214/aoms/1177729586.
[55] Y. N. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio, “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2933–2941.
[56] B. T. Polyak, “Some methods of speeding up the convergence of iteration methods,” USSR Comput. Math. Math. Phys., vol. 4, no. 5, pp. 1–17, Jan. 1964, doi: 10.1016/0041-5553(64)90137-5.
[57] S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv:1609.04747 [cs], Jun. 2017, Accessed: Jun. 13, 2020. [Online]. Available: http://arxiv.org/abs/1609.04747.
[58] N. Qian, “On the momentum term in gradient descent learning algorithms,” Neural Netw., vol. 12, no. 1, pp. 145–151, Jan. 1999.
[59] Y. Nesterov, “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2),” Dokl. USSR, vol. 269, pp. 543–547, 1983, Accessed: Jan. 21, 2020. [Online]. Available: https://ci.nii.ac.jp/naid/20001173129/.
[60] Y. Bengio, N. Boulanger-Lewandowski, and R. Pascanu, “Advances in optimizing recurrent networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 8624–8628, doi: 10.1109/ICASSP.2013.6639349.
[61] J. Duchi, E. Hazan, and Y. Singer, “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization,” J. Mach. Learn. Res., vol. 12, no. Jul, pp. 2121–2159, 2011, Accessed: Jan. 21, 2020. [Online]. Available: http://www.jmlr.org/papers/v12/duchi11a.html.
[62] J. Dean et al., “Large Scale Distributed Deep Networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1223–1231.
[63] J. Pennington, R. Socher, and C. Manning, “Glove: Global Vectors for Word Representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, Oct. 2014, pp. 1532–1543, Accessed: Jan. 21, 2020. [Online]. Available: http://www.aclweb.org/anthology/D14-1162.
[64] M. D. Zeiler, “ADADELTA: An Adaptive Learning Rate Method,” arXiv:1212.5701 [cs], Dec. 2012, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1212.5701.
[65] V. Bushaev, “Understanding RMSprop — faster neural network learning,” Towards Data Science, Sep. 02, 2018. https://towardsdatascience.com/understanding-rmsprop-faster-neural-network-learning-62e116fcf29a (accessed Jan. 21, 2020).
[66] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 [cs], Dec. 2014, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1412.6980.
[67] Z. Zhang, L. Ma, Z. Li, and C. Wu, “Normalized Direction-preserving Adam,” arXiv:1709.04546 [cs, stat], Sep. 2018, Accessed: Jun. 14, 2020. [Online]. Available: http://arxiv.org/abs/1709.04546.
[68] V. Bushaev, “Adam — latest trends in deep learning optimization.,” Medium, Oct. 24, 2018. https://towardsdatascience.com/adam-latest-trends-in-deep-learning-optimization-6be9a291375c (accessed Jun. 14, 2020).
[69] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” J. Mach. Learn. Res., vol. 15, pp. 1929–1958, 2014, Accessed: Jan. 21, 2020. [Online]. Available: http://jmlr.org/papers/v15/srivastava14a.html.
[14] J. Bayer, C. Osendorfer, D. Korhammer, N. Chen, S. Urban, and P. van der Smagt, “On Fast Dropout and its Applicability to Recurrent Networks,” arXiv:1311.0701 [cs, stat], Nov. 2013, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1311.0701.
[70] V. Pham, T. Bluche, C. Kermorvant, and J. Louradour, “Dropout improves Recurrent Neural Networks for Handwriting Recognition,” arXiv:1312.4569 [cs], Nov. 2013, Accessed: Jan. 21, 2019. [Online]. Available: http://arxiv.org/abs/1312.4569.
[71] W. Zaremba, I. Sutskever, and O. Vinyals, “Recurrent Neural Network Regularization,” arXiv:1409.2329 [cs], Sep. 2014, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1409.2329.
[72] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167 [cs], Feb. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1502.03167.
[73] T. Cooijmans, N. Ballas, C. Laurent, Ç. Gülçehre, and A. Courville, “Recurrent Batch Normalization,” arXiv:1603.09025 [cs], Mar. 2016, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1603.09025.
[74] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” arXiv:1311.2524 [cs], Nov. 2013, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1311.2524.
[75] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, “CNN Features off-the-shelf: an Astounding Baseline for Recognition,” arXiv:1403.6382 [cs], Mar. 2014, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1403.6382.
[76] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009, pp. 248–255, doi: 10.1109/CVPR.2009.5206848.
[77] C. Szegedy et al., “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 1–9, doi: 10.1109/CVPR.2015.7298594.
[78] S. Zha, F. Luisier, W. Andrews, N. Srivastava, and R. Salakhutdinov, “Exploiting Image-trained CNN Architectures for Unconstrained Video Classification,” arXiv:1503.04144 [cs], Mar. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1503.04144.
[79] X. Alameda-Pineda et al., “RAVEL: an annotated corpus for training robots with audiovisual abilities,” J. Multimodal User Interfaces, vol. 7, no. 1, pp. 79–91, Mar. 2013, doi: 10.1007/s12193-012-0111-y.
[80] Z. Xu, Y. Yang, and A. G. Hauptmann, “A Discriminative CNN Video Representation for Event Detection,” arXiv:1411.4006 [cs], Nov. 2014, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1411.4006.
[81] H. Jégou, M. Douze, C. Schmid, and P. Pérez, “Aggregating local descriptors into a compact image representation,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 2010, pp. 3304–3311, doi: 10.1109/CVPR.2010.5540039.
[82] Q. Li, Z. Qiu, T. Yao, T. Mei, Y. Rui, and J. Luo, “Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation,” in Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, New York, NY, USA, 2016, pp. 159–166, doi: 10.1145/2911996.2912001.
[83] J. Donahue et al., “Long-Term Recurrent Convolutional Networks for Visual Recognition and Description,” presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2625–2634, Accessed: Jan. 21, 2020. [Online]. Available: https://www.cv-foundation.org/openaccess/content_cvpr_2015/html/Donahue_Long-Term_Recurrent_Convolutional_2015_CVPR_paper.html.
[84] L. Yao et al., “Describing Videos by Exploiting Temporal Structure,” arXiv:1502.08029 [cs, stat], Feb. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1502.08029.
[85] A. Graves, A. Mohamed, and G. Hinton, “Speech Recognition with Deep Recurrent Neural Networks,” arXiv:1303.5778 [cs], Mar. 2013, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1303.5778.
[86] Y. Jia et al., “Caffe: Convolutional Architecture for Fast Feature Embedding,” arXiv:1408.5093 [cs], Jun. 2014, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1408.5093.
[87] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling,” arXiv:1412.3555 [cs], Dec. 2014, Accessed: Jan. 21, 2020. [Online]. Available: https://arxiv.org/abs/1412.3555.
[88] N. Léonard, S. Waghmare, Y. Wang, and J.-H. Kim, “rnn: Recurrent Library for Torch,” arXiv:1511.07889 [cs], Nov. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1511.07889.
[89] V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, “Recurrent Models of Visual Attention,” arXiv:1406.6247 [cs, stat], Jun. 2014, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1406.6247.
[90] J. Ba, V. Mnih, and K. Kavukcuoglu, “Multiple Object Recognition with Visual Attention,” arXiv:1412.7755 [cs], Dec. 2014, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1412.7755.
[91] Z. Wu, X. Wang, Y.-G. Jiang, H. Ye, and X. Xue, “Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification,” arXiv:1504.01561 [cs], Apr. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1504.01561.
[92] J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, “Beyond Short Snippets: Deep Networks for Video Classification,” arXiv:1503.08909 [cs], Mar. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1503.08909.
[93] V. Veeriah, N. Zhuang, and G.-J. Qi, “Differential Recurrent Neural Networks for Action Recognition,” arXiv:1504.06678 [cs], Apr. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1504.06678.
[94] Z. Wu, Y.-G. Jiang, X. Wang, H. Ye, and X. Xue, “Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification,” in Proceedings of the 24th ACM International Conference on Multimedia, New York, NY, USA, 2016, pp. 791–800, doi: 10.1145/2964284.2964328.
[95] S. Ji, W. Xu, M. Yang, and K. Yu, “3D Convolutional Neural Networks for Human Action Recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 221–231, Jan. 2013, doi: 10.1109/TPAMI.2012.59.
[96] H. Wang and C. Schmid, “Action Recognition with Improved Trajectories,” in 2013 IEEE International Conference on Computer Vision, Dec. 2013, pp. 3551–3558, doi: 10.1109/ICCV.2013.441.
[97] L. Sun, K. Jia, D.-Y. Yeung, and B. E. Shi, “Human Action Recognition using Factorized Spatio-Temporal Convolutional Networks,” arXiv:1510.00562 [cs], Oct. 2015, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1510.00562.
[98] K. Simonyan and A. Zisserman, “Two-Stream Convolutional Networks for Action Recognition in Videos,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 568–576.
[99] L. Wang, Y. Qiao, and X. Tang, “Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 4305–4314, doi: 10.1109/CVPR.2015.7299059.
[100] C. Feichtenhofer, A. Pinz, and A. Zisserman, “Convolutional Two-Stream Network Fusion for Video Action Recognition,” arXiv:1604.06573 [cs], Apr. 2016, Accessed: Jan. 21, 2020. [Online]. Available: http://arxiv.org/abs/1604.06573.
[101] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning Spatiotemporal Features with 3D Convolutional Networks,” arXiv:1412.0767 [cs], Dec. 2014, Accessed: Apr. 28, 2020. [Online]. Available: http://arxiv.org/abs/1412.0767.
[102] “Home - Keras Documentation.” https://keras.io/ (accessed Jan. 18, 2020).
[103] “Understanding LSTM Networks -- colah’s blog.” http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed Apr. 28, 2020).
[104] V.-M. Khong and T.-H. Tran, “Improving Human Action Recognition with Two-Stream 3D Convolutional Neural Network,” in 2018 1st International Conference on Multimedia Analysis and Pattern Recognition (MAPR), Apr. 2018, pp. 1–6, doi: 10.1109/MAPR.2018.8337518.
[105] N. L. Hakim, T. K. Shih, S. P. Kasthuri Arachchi, W. Aditya, Y.-C. Chen, and C.-Y. Lin, “Dynamic Hand Gesture Recognition Using 3DCNN and LSTM with FSM Context-Aware Model,” Sensors, vol. 19, no. 24, p. 5429, Jan. 2020, doi: 10.3390/s19245429.
[106] H. Phan et al., “Beyond Equal-Length Snippets: How Long is Sufficient to Recognize an Audio Scene?,” arXiv:1811.01095 [cs, eess], Nov. 2018, Accessed: Apr. 28, 2020. [Online]. Available: http://arxiv.org/abs/1811.01095.
[107] “Serre Lab » HMDB: a large human motion database.” http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/ (accessed Jan. 17, 2020).
[108] G. Farnebäck, “Two-Frame Motion Estimation Based on Polynomial Expansion,” in Image Analysis, 2003, pp. 363–370.
[109] “OpenCV: Optical Flow.” https://docs.opencv.org/3.4/d7/d8b/tutorial_py_lucas_kanade.html (accessed Jan. 21, 2020).
[110] C. Igel and M. Hüsken, “Improving the Rprop Learning Algorithm,” in Proceedings of the Second International ICSC Symposium on Neural Computation (NC 2000), 2000.
[112] S. P. K. Arachchi, T. K. Shih, C.-Y. Lin, and G. Wijayarathna, “Deep Learning-Based Firework Video Pattern Classification,” J. Internet Technol., vol. 20, no. 7, pp. 2033–2042, Dec. 2019, Accessed: Jan. 17, 2020. [Online]. Available: https://jit.ndhu.edu.tw/article/view/2190.
[113] C. Feichtenhofer, A. Pinz, and R. P. Wildes, “Spatiotemporal Residual Networks for Video Action Recognition,” arXiv:1611.02155 [cs], Nov. 2016, Accessed: Jan. 18, 2020. [Online]. Available: http://arxiv.org/abs/1611.02155.