Master's Thesis 105221014: Detailed Record




Author: Jing-Wun Chen (陳靖玟)    Department: Mathematics
Thesis Title: Exploring Effects of Optimizer Selection and Their Hyperparameter Tuning on Performance of Deep Neural Networks for Image Recognition
Related Theses
★ Application of a Nonlinear Block Gaussian Elimination Newton Algorithm to Nozzle Flows
★ Finite Element Solution of the Poisson-Boltzmann Equation by a Parallel Newton-Krylov-Schwarz Algorithm with Applications in Colloid Science
★ A Least-Squares Finite Element Method for Convection-Diffusion Equations and Its Improvement Using Bubble Functions
★ Bifurcation Analysis of Incompressible Sudden Expansion Flows Using Parallel Computing
★ Parallel Jacobi-Davidson Algorithms and Software Developments for Polynomial Eigenvalue Problems in Quantum Dot Simulation
★ An Inexact Newton Method for Drift-Diffusion Model in Semiconductor Device Simulations
★ Numerical Simulation of Three-dimensional Blood Flows in Arteries Using Domain Decomposition Based Scientific Software Packages in Parallel Computers
★ A Parallel Fully Coupled Implicit Domain Decomposition Method for the Stabilized Finite Element Solution of Three-dimensional Unsteady Incompressible Navier-Stokes Equations
★ A Study for Linear Stability Analysis of Incompressible Flows on Parallel Computers
★ Parallel Computation of Acoustic Eigenvalue Problems Using a Polynomial Jacobi-Davidson Method
★ Numerical Study of Algebraic Multigrid Methods for Solving Linear/Nonlinear Elliptic Problems on Sequential and Parallel Computers
★ A Parallel Multilevel Semi-implicit Scheme of Fluid Modeling for Numerical Low-Temperature Plasma Simulation
★ Performance Comparison of Two PETSc-based Eigensolvers for Quadratic PDE Problems
★ A Parallel Two-level Polynomial Jacobi-Davidson Algorithm for Large Sparse Dissipative Acoustic Eigenvalue Problems
★ A Full Space Lagrange-Newton-Krylov Algorithm for Minimum Time Trajectory Optimization
★ Parallel Two-level Patient-specific Numerical Simulation of Three-dimensional Rheological Blood Flows in Branching Arteries
  1. This electronic thesis has been approved for immediate open access.
  2. Electronic full texts whose open-access date has been reached are licensed to users solely for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the content without authorization.

Abstract (Chinese) In recent years, deep learning has developed rapidly, and people have begun to use it to solve problems. Deep neural networks can be used for speech recognition, image recognition, object detection, face recognition, autonomous driving, and more. The most basic neural network is the multilayer perceptron (MLP), which consists of multiple layers of nodes with full connections between adjacent layers. The biggest problem with the MLP is that it ignores the shape or ordering of the data: when image data are fed in, flattening the information into one dimension loses the spatial structure that is essential for images. The convolutional neural network (CNN) was developed to address this. Compared with a traditional neural network, a CNN has additional convolution layers and pooling layers, which preserve and extract image features.
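As a concrete illustration of the flattening issue described above (a minimal sketch in NumPy, not code from the thesis): a 28x28 grayscale image becomes a 784-element vector when fed to an MLP, whereas a CNN keeps the two-dimensional layout.

```python
import numpy as np

# A toy 28x28 grayscale "image" (e.g., one MNIST digit).
image = np.random.rand(28, 28)

# MLP input: flatten to a 1-D vector of length 784. Pixels that were
# vertical neighbors (one row apart) end up 28 positions apart, so the
# 2-D spatial structure is lost.
mlp_input = image.reshape(-1)
print(mlp_input.shape)  # (784,)

# CNN input: keep the 2-D layout (plus a channel axis), so convolution
# kernels can slide over local neighborhoods of pixels.
cnn_input = image.reshape(28, 28, 1)
print(cnn_input.shape)  # (28, 28, 1)
```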

After we feed data into a neural network, we want its output to be close to the true values; this is where the optimizer comes in, minimizing the error between the predicted and true values. Optimizers in deep learning are usually refinements of gradient descent, and choosing a suitable learning rate is a hard problem. In our experiments we therefore use three data sets (the MNIST handwritten digits, CIFAR-10, and train-route scene images) and two network architectures (an MLP and a CNN), combined with six optimizers (gradient descent, Momentum, the adaptive gradient algorithm, Adadelta, root mean square propagation, and Adam), to explore how the choice of optimizer and its hyperparameters affects image recognition performance.
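For reference, the standard update rules behind these methods, written in the usual textbook notation with learning rate \eta (cf. [19, 21]; the thesis's own notation in Section 2.4 may differ):

```latex
% Gradient descent: step against the gradient of the loss L.
w_{t+1} = w_t - \eta \,\nabla L(w_t)

% Momentum: accumulate a velocity with decay coefficient \gamma.
v_{t+1} = \gamma v_t + \eta \,\nabla L(w_t), \qquad w_{t+1} = w_t - v_{t+1}

% Adam: bias-corrected first/second moment estimates of the gradient g_t.
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2
\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad
w_{t+1} = w_t - \eta \,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```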
Abstract (English) In recent years, deep learning has flourished, and people have begun to use it to solve problems. Deep neural networks can be used for speech recognition, image recognition, object detection, face recognition, or driverless vehicles. The most basic neural network is the Multilayer Perceptron (MLP), which consists of multiple layers of nodes, each fully connected to the next. One drawback of the MLP is that it ignores the shape of the data, which is important for image data. Compared to traditional neural networks, the convolutional neural network (CNN) has additional convolution and pooling layers, which are used to preserve and capture image features.
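A minimal sketch of the two architecture families being compared (the layer sizes here are illustrative assumptions, not the architectures of Section 3.2), written against the Keras API:

```python
from tensorflow import keras
from tensorflow.keras import layers

# MLP: the 28x28 input is flattened first, discarding spatial structure.
mlp = keras.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# CNN: convolution and pooling layers operate on the 2-D layout,
# extracting local image features before the dense classifier.
cnn = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
```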

The accuracy of predictions made with a neural network depends on many factors, such as the network architecture, the cost function, and the choice of optimizer. The goal of this work is to investigate the effects of optimizer selection and hyperparameter tuning on the performance of deep neural networks for image recognition problems. We use three data sets, MNIST, CIFAR-10, and train route scenarios, as test problems, and test six optimizers (gradient descent, Momentum, the adaptive gradient algorithm, Adadelta, root mean square propagation, and Adam). Our numerical results show that Adam is a good choice because of its efficiency and robustness.
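A sketch of the kind of comparison loop such a study implies (a hypothetical harness: `build_model` and the `x_train`/`y_train`/`x_test`/`y_test` arrays are assumed to exist, e.g. the models above and data from `keras.datasets`; the thesis's actual protocol is given in Section 3.3):

```python
from tensorflow import keras

# Learning-rate grid; sweeping it probes each optimizer's sensitivity
# to the learning rate (Section 4.2.1).
learning_rates = [1e-1, 1e-2, 1e-3, 1e-4]

# The six optimizers compared in the thesis, as Keras optimizer classes.
# momentum=0.9 is a common default, assumed here for illustration.
optimizers = {
    "GD": lambda lr: keras.optimizers.SGD(learning_rate=lr),
    "Momentum": lambda lr: keras.optimizers.SGD(learning_rate=lr, momentum=0.9),
    "Adagrad": lambda lr: keras.optimizers.Adagrad(learning_rate=lr),
    "Adadelta": lambda lr: keras.optimizers.Adadelta(learning_rate=lr),
    "RMSprop": lambda lr: keras.optimizers.RMSprop(learning_rate=lr),
    "Adam": lambda lr: keras.optimizers.Adam(learning_rate=lr),
}

results = {}
for name, make_optimizer in optimizers.items():
    for lr in learning_rates:
        model = build_model()  # hypothetical builder, e.g. the MLP/CNN above
        model.compile(optimizer=make_optimizer(lr),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        history = model.fit(x_train, y_train, epochs=20,
                            validation_data=(x_test, y_test), verbose=0)
        # Record the best validation accuracy for this (optimizer, lr) pair.
        results[(name, lr)] = max(history.history["val_accuracy"])
```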
Keywords (Chinese) ★ Deep learning    Keywords (English)
Table of Contents
Tables
Figures
1 Introduction
2 Methodology
  2.1 Artificial neural network (ANN)
    2.1.1 Algorithm
    2.1.2 Multilayer feedforward neural network
    2.1.3 Activation function
  2.2 Deep neural network (DNN)
    2.2.1 Multilayer perceptron (MLP)
    2.2.2 Backpropagation [5]
    2.2.3 Convolutional neural network (CNN)
    2.2.4 Hyperparameter in neural network
  2.3 Loss function in neural network
    2.3.1 Mean square error (MSE)
    2.3.2 Softmax loss
  2.4 The optimizer in neural network
    2.4.1 Gradient descent (GD)
    2.4.2 Momentum
    2.4.3 Adaptive gradient algorithm (Adagrad)
    2.4.4 Adadelta
    2.4.5 Root mean square propagation (RMSprop)
    2.4.6 Adaptive moment estimation (Adam)
3 Experimental setup
  3.1 Dataset
    3.1.1 MNIST [6]
    3.1.2 CIFAR-10 [7]
    3.1.3 The images from Norway's railway
    3.1.4 Data preprocessing
  3.2 Network architectures
  3.3 Experimental process
    3.3.1 The overview of parameters
4 Numerical results and discussions
  4.1 Comparison of models
  4.2 Comparison of optimizers
    4.2.1 Exploring the sensitivity of selection of learning rate
    4.2.2 Exploring the convergence speed
5 Conclusions
References
References
[1] Matt W Gardner and SR Dorling. Artificial neural networks (the multilayer perceptron): a review of applications in the atmospheric sciences. Atmospheric Environment, 32:2627–2636, 1998.
[2] Steve Lawrence, C Lee Giles, Ah Chung Tsoi, and Andrew D Back. Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks, 8:98–113, 1997.
[3] Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
[4] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1725–1732, 2014.
[5] Robert Hecht-Nielsen. Theory of the backpropagation neural network. In Neural Networks for Perception, pages 65–93. Elsevier, 1992.
[6] Li Deng. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 29:141–142, 2012.
[7] Alex Krizhevsky and Geoff Hinton. Convolutional deep belief networks on CIFAR-10. Unpublished manuscript, 40, 2010.
[8] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86:2278–2324, 1998.
[9] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[10] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[11] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
[12] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
[13] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[14] Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.
[15] Feng Wang, Jian Cheng, Weiyang Liu, and Haijun Liu. Additive margin softmax for face verification. IEEE Signal Processing Letters, 25:926–930, 2018.
[16] Brody Huval, Tao Wang, Sameep Tandon, Jeff Kiske, Will Song, Joel Pazhayampallil, Mykhaylo Andriluka, Pranav Rajpurkar, Toki Migimatsu, Royce Cheng-Yue, Fernando Mujica, Adam Coates, and Andrew Y. Ng. An empirical evaluation of deep learning on highway driving. arXiv preprint arXiv:1504.01716, 2015.
[17] James S Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546–2554, 2011.
[18] Thomas M Breuel. The effects of hyperparameters on SGD training of neural networks. arXiv preprint arXiv:1508.02788, 2015.
[19] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[20] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, pages 2121–2159, 2011.
[21] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[22] Anil K Jain, Jianchang Mao, and KM Mohiuddin. Artificial neural networks: A tutorial. Computer, pages 31–44, 1996.
[23] Rich Caruana and Alexandru Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning, pages 161–168. ACM, 2006.
[24] Horace B Barlow. Unsupervised learning. Neural Computation, 1:295–311, 1989.
[25] Mario A. T. Figueiredo and Anil K. Jain. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis & Machine Intelligence, pages 381–396, 2002.
[26] Daniel Svozil, Vladimir Kvasnicka, and Jiri Pospichal. Introduction to multi-layer feed-forward neural networks. Chemometrics and Intelligent Laboratory Systems, 39:43–62, 1997.
[27] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.
[28] Moshe Leshno, Vladimir Ya Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6:861–867, 1993.
[29] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[30] Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29, 2012.
Advisor: Feng-Nan Hwang (黃楓南)    Date of Approval: 2019-5-9
