References
[1] Zoph, Barret; Le, Quoc V. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.
[2] Baker, Bowen, et al. Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167, 2016.
[3] Miikkulainen, Risto, et al. Evolving deep neural networks. In: Artificial Intelligence in the Age of Neural Networks and Brain Computing. Academic Press, p. 293-312, 2019.
[4] Saxena, Shreyas; Verbeek, Jakob. Convolutional neural fabrics. In: Advances in Neural Information Processing Systems. p. 4053-4061, 2016.
[5] Schaffer, J. David; Whitley, Darrell; Eshelman, Larry J. Combinations of genetic algorithms and neural networks: A survey of the state of the art. In: [Proceedings] COGANN-92: International Workshop on Combinations of Genetic Algorithms and Neural Networks. IEEE, p. 1-37, 1992.
[6] Stanley, Kenneth O.; Miikkulainen, Risto. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10.2: 99-127, 2002.
[7] Verbancsics, Phillip; Harguess, Josh. Generative neuroevolution for deep learning. arXiv preprint arXiv:1312.5355, 2013.
[8] Liu, Hanxiao, et al. Hierarchical representations for efficient architecture search. arXiv preprint arXiv:1711.00436, 2017.
[9] Tomassini, Marco. Parallel and distributed evolutionary algorithms: A review. 1999.
[10] Koutník, Jan; Schmidhuber, Jürgen; Gomez, Faustino. Evolving deep unsupervised convolutional networks for vision-based reinforcement learning. In: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation. ACM, p. 541-548, 2014.
[11] Tsitsiklis, John N. Asynchronous stochastic approximation and Q-learning. Machine learning, 16.3: 185-202, 1994.
[12] Bertsekas, Dimitri. Distributed dynamic programming. IEEE Transactions on Automatic Control, 27.3: 610-616, 1982.
[13] Cai, Han, et al. Efficient architecture search by network transformation. In: Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[14] Bello, Irwan, et al. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.
[15] Pham, Hieu, et al. Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268, 2018.
[16] Shahriari, Bobak, et al. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104.1: 148-175, 2015.
[17] Bergstra, James; Yamins, Daniel; Cox, David Daniel. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. 2013.
[18] Domhan, Tobias; Springenberg, Jost Tobias; Hutter, Frank. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In: Twenty-Fourth International Joint Conference on Artificial Intelligence. 2015.
[19] Snoek, Jasper; Larochelle, Hugo; Adams, Ryan P. Practical bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems, p. 2951-2959. 2012.
[20] Kevin Swersky, Jasper Snoek, and Ryan P Adams. Multi-task bayesian optimization. NIPS, pp. 2004–2012, 2013.
[21] Bergstra, James S., et al. Algorithms for hyper-parameter optimization. In: Advances in neural information processing systems. p. 2546-2554. 2011.
[22] Vilalta, Ricardo; Drissi, Youssef. A perspective view and survey of meta-learning. Artificial intelligence review, 18.2: 77-95, 2002.
[23] Hochreiter, Sepp; Younger, A. Steven; Conwell, Peter R. Learning to learn using gradient descent. In: International Conference on Artificial Neural Networks. Springer, Berlin, Heidelberg, p. 87-94. 2001.
[24] Andrychowicz, Marcin, et al. "Learning to learn by gradient descent by gradient descent." Advances in Neural Information Processing Systems. 2016.
[25] Liu, Xu-Ying; Wu, Jianxin; Zhou, Zhi-Hua. Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39.2: 539-550, 2008.
[26] Schapire, Robert E. A brief introduction to boosting. In: Ijcai. p. 1401-1406. 1999.
[27] Özdemir, Ahmet Turan, and Billur Barshan. “Detecting Falls with Wearable Sensors Using Machine Learning Techniques.” Sensors (Basel, Switzerland) 14.6 (2014): 10691–10708. PMC. Web. 23 Apr. 2017.
[28] Ian J Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron C Courville, and Yoshua Bengio. Maxout networks. ICML (3), 28:1319–1327, 2013.
[29] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.
[30] Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014.
[31] Zhong, Zhao, Junjie Yan, and Cheng-Lin Liu. "Practical network blocks design with q-learning." arXiv preprint arXiv:1708.05552 (2017).
[32] Swersky, Kevin, Jasper Snoek, and Ryan P. Adams. "Multi-task bayesian optimization." Advances in neural information processing systems. 2013.
[33] Wan, Li, et al. "Regularization of neural networks using dropconnect." International conference on machine learning. 2013.
[34] Shahriari, Bobak, et al. "Taking the human out of the loop: A review of bayesian optimization." Proceedings of the IEEE 104.1 (2016): 148-175.
[35] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
[36] Lin, Long-Ji. Reinforcement learning for robots using neural networks. No. CMU-CS-93-103. Carnegie Mellon University, School of Computer Science, 1993.
[37] Kaelbling, Leslie Pack, Michael L. Littman, and Andrew W. Moore. "Reinforcement learning: A survey." Journal of Artificial Intelligence Research 4 (1996): 237-285.
[38] Vermorel, Joannes, and Mehryar Mohri. "Multi-armed bandit algorithms and empirical evaluation." European conference on machine learning. Springer, Berlin, Heidelberg, 2005.
[39] Galstyan, Aram, Karl Czajkowski, and Kristina Lerman. "Resource allocation in the grid using reinforcement learning." Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 3. IEEE Computer Society, 2004.
[40] Gomes, Eduardo Rodrigues, and Ryszard Kowalczyk. "Learning the IPA market with individual and social rewards." Web Intelligence and Agent Systems: An International Journal 7.2 (2009): 123-138.
[41] Ziogos, N. P., et al. "A reinforcement learning algorithm for market participants in FTR auctions." 2007 IEEE Lausanne Power Tech. IEEE, 2007.
[42] Bertsekas, Dimitri P. Convex optimization algorithms. Belmont: Athena Scientific, 2015.
[43] Watkins, Christopher John Cornish Hellaby. Learning from delayed rewards. Diss. King's College, Cambridge, 1989.
[44] Dean, Jeffrey, et al. "Large scale distributed deep networks." Advances in Neural Information Processing Systems. 2012.
[45] Gu, Shixiang, et al. "Continuous deep q-learning with model-based acceleration." arXiv preprint arXiv:1603.00748 (2016).
[46] Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." AAAI. 2016.
[47] Narendra, Kumpati S., Yu Wang, and Snehasis Mukhopadhyay. "Fast Reinforcement Learning using Multiple Models." 2016 Control and Decision Conference, Las Vegas, 2016.
[48] Narendra, Kumpati S., Snehasis Mukhopadhyay, and Yu Wang. "Improving the Speed of Response of Learning Algorithms Using Multiple Models: An Introduction." The 17th Yale Workshop on Adaptive and Learning Systems.
[49] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” arXiv:1504.00702 [cs.LG], 2015.
[50] J.-A. M. Assael, N. Wahlström, T. B. Schön, and M. P. Deisenroth, “Data-efficient learning of feedback policies from image pixels using deep dynamical models,” arXiv:1510.02173 [cs.AI], 2015.
[51] J. Ba, V. Mnih, and K. Kavukcuoglu, "Multiple object recognition with visual attention," arXiv:1412.7755 [cs.LG], 2014.