Thesis/Dissertation 103522605: Detailed Record




Name Nguyen Quang Huy (阮光輝)   Department Computer Science and Information Engineering
Thesis Title Animal Action Classification using Convolutional Neural Networks
(運用卷積神經網路方法分類動物動作)
Files Full text viewable in the system (open after 2019-1-20)
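Abstract (Chinese) In this thesis, we propose a new method for learning features from videos of animal activity. To the best of our knowledge, no previous work has addressed this problem, because few animal videos are available. The first contribution of this thesis is a completely new animal activity dataset for classifying various animal behaviors. Unlike some related works, the dataset is not limited to a single pet type; it mainly covers cats and dogs performing similar activities. Moreover, the data consist of natural videos recorded under a variety of weather, scene, and lighting conditions. The dataset is therefore a challenge for any model performing prediction and classification tasks.

Our second contribution is a study of two deep learning models for classifying our own dataset. We implement two spatial long short-term memory architectures, LRCN and Convolutional LSTM, in the Torch 7 framework. Our design makes the networks easy to adapt. We trained the architectures on our animal action dataset and found that they are able to learn meaningful features, even from videos of difficult pet activities, and the networks reach a fairly high accuracy of about 65-70% on the classification task.

We believe that collecting animal behavior data and deep learning are very important for further related research.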
Abstract (English) In this dissertation, we propose a new approach for learning features from animal activity videos. To the best of our knowledge, there has not been any previous work on this problem, owing to the lack of animal videos. Our first contribution is a completely new animal action dataset that is difficult to classify in several respects. Unlike some related works, the data are not limited to one pet type but cover two, cats and dogs, performing similar activities. Furthermore, the data are natural videos recorded in daily life under a variety of lighting, weather, and scene conditions. Thus, our data pose a real challenge to any model performing prediction and classification tasks.
Our second contribution is an investigation of the ability of two deep learning models to classify our own dataset. We implement two spatial Long Short-Term Memory architectures, LRCN and Convolutional-LSTM, in the Torch 7 framework. Our design makes adapting the networks easy and convenient. We trained our architectures on the animal action dataset and found that they have the potential to learn meaningful features, even from difficult pet videos. The networks achieve a fairly high result on the classification task, with 65-70% accuracy.
We believe that the animal action dataset and deep learning are essential for further study with more demanding requirements.
Keywords (Chinese) ★ Animal Action Classification using Convolutional Neural Networks (運用卷積神經網路方法分類動物動作)
Keywords (English) ★ Animal Action Classification
★ Deep Learning
★ Convolutional LSTM
★ Long-term Recurrent Neural Networks
Table of Contents Chinese Abstract
Abstract
Acknowledgements
List of Symbols and Abbreviations
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Introductions
1.2. Related Works
Chapter 2 Deep Learning Background
2.1. Full Connected Neural Network
2.2. Convolution Neural Network
2.2.1. Overview of Convolution Neural Network
2.2.2. Mathematics form of CNN
2.2.3. Successful Convolution Network
2.3. Recurrent Neural Network
2.3.1. Overview of Recurrent Neural Network
2.3.2. The Problem of Long-Term Dependencies (Gradient Vanishing Problem)
2.3.3. LSTM Variants
2.3.4. Successful LSTM models
2.4. Back propagation
2.4.1. Training Neural Network by Back propagation
2.4.2. Optimization methods
2.5. Over-fitting
2.5.1. Regularization and Constrain
2.5.2. Dropout
2.6. Batch Normalization
Chapter 3 Proposed Method
3.1. Deep Learning Frameworks
3.2. Animal Action Dataset v1.0
3.3. Long-term Recurrent Convolution Network
3.3. Convolutional-LSTM
3.4. Mean Subtraction
3.5. Implementation
3.5.1. Data Loader
3.5.2. Model
3.5.3. Training and Testing
Chapter 4 Experiments and Results
Chapter 5 Conclusions and Future Works
References
Advisor Timothy K. Shih (施國琛)   Approval Date 2017-1-20

For questions about this thesis, please contact the National Central University Library (國立中央大學圖書館推廣服務組), TEL: (03)422-7151 ext. 57407, or by e-mail.