References
[1] Baker, Bowen, et al. "Designing neural network architectures using reinforcement learning." arXiv preprint arXiv:1611.02167 (2016).
[2] Zhong, Zhao, Junjie Yan, and Cheng-Lin Liu. "Practical network blocks design with q-learning." arXiv preprint arXiv:1708.05552 (2017).
[3] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
[4] Schaffer, J. David, Darrell Whitley, and Larry J. Eshelman. "Combinations of genetic algorithms and neural networks: A survey of the state of the art." [Proceedings] COGANN-92: International Workshop on Combinations of Genetic Algorithms and Neural Networks. IEEE, 1992.
[5] Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. "Practical bayesian optimization of machine learning algorithms." Advances in neural information processing systems. 2012.
[6] Swersky, Kevin, Jasper Snoek, and Ryan P. Adams. "Multi-task bayesian optimization." Advances in neural information processing systems. 2013.
[7] Wan, Li, et al. "Regularization of neural networks using dropconnect." International conference on machine learning. 2013.
[8] Cai, Han, et al. "Efficient architecture search by network transformation." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[9] Stanley, Kenneth O., and Risto Miikkulainen. "Evolving neural networks through augmenting topologies." Evolutionary computation 10.2 (2002): 99-127.
[10] Verbancsics, Phillip, and Josh Harguess. "Generative neuroevolution for deep learning." arXiv preprint arXiv:1312.5355 (2013).
[11] Shahriari, Bobak, et al. "Taking the human out of the loop: A review of bayesian optimization." Proceedings of the IEEE 104.1 (2016): 148-175.
[12] Bergstra, James, Daniel Yamins, and David Daniel Cox. "Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures." International Conference on Machine Learning. 2013.
[13] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
[14] Lin, Long-Ji. Reinforcement learning for robots using neural networks. No. CMU-CS-93-103. CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE, 1993.
[15] Shahriari, Bobak, et al. "Taking the human out of the loop: A review of bayesian optimization." Proceedings of the IEEE 104.1 (2016): 148-175.
[16] Domhan, Tobias, Jost Tobias Springenberg, and Frank Hutter. "Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves." Twenty-Fourth International Joint Conference on Artificial Intelligence. 2015.
[17] Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. "Practical bayesian optimization of machine learning algorithms." Advances in neural information processing systems. 2012.
[18] Swersky, Kevin, Jasper Snoek, and Ryan P. Adams. "Multi-task bayesian optimization." Advances in neural information processing systems. 2013.
[19] Bergstra, James S., et al. "Algorithms for hyper-parameter optimization." Advances in neural information processing systems. 2011.
[20] Kaelbling, Leslie Pack, Michael L. Littman, and Andrew W. Moore. "Reinforcement learning: A survey." Journal of artificial intelligence research 4 (1996): 237-285.
[21] Vilalta, Ricardo, and Youssef Drissi. "A perspective view and survey of meta-learning." Artificial intelligence review 18.2 (2002): 77-95.
[22] Hochreiter, Sepp, A. Steven Younger, and Peter R. Conwell. "Learning to learn using gradient descent." International Conference on Artificial Neural Networks. Springer, Berlin, Heidelberg, 2001.
[23] Andrychowicz, Marcin, et al. "Learning to learn by gradient descent by gradient descent." Advances in Neural Information Processing Systems. 2016.
[24] Vermorel, Joannes, and Mehryar Mohri. "Multi-armed bandit algorithms and empirical evaluation." European conference on machine learning. Springer, Berlin, Heidelberg, 2005.
[25] Tsitsiklis, John N. "Asynchronous stochastic approximation and Q-learning." Machine learning 16.3 (1994): 185-202.
[26] Bertsekas, Dimitri. "Distributed dynamic programming." IEEE transactions on Automatic Control 27.3 (1982): 610-616.
[27] Tomassini, Marco. "Parallel and distributed evolutionary algorithms: A review." (1999).
[28] Koutník, Jan, Jürgen Schmidhuber, and Faustino Gomez. "Evolving deep unsupervised convolutional networks for vision-based reinforcement learning." Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation. ACM, 2014.
[29] Galstyan, Aram, Karl Czajkowski, and Kristina Lerman. "Resource allocation in the grid using reinforcement learning." Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 3. IEEE Computer Society, 2004.
[30] Gomes, Eduardo Rodrigues, and Ryszard Kowalczyk. "Learning the IPA market with individual and social rewards." Web Intelligence and Agent Systems: An International Journal 7.2 (2009): 123-138.
[31] Ziogos, N. P., et al. "A reinforcement learning algorithm for market participants in FTR auctions." 2007 IEEE Lausanne Power Tech. IEEE, 2007.
[32] Bertsekas, Dimitri P. Convex optimization algorithms. Belmont: Athena Scientific, 2015.
[33] Watkins, Christopher John Cornish Hellaby. Learning from delayed rewards. Diss. King's College, Cambridge, 1989.
[34] Dean, Jeffrey, et al. "Large scale distributed deep networks." Advances in neural information processing systems. 2012.
[35] Gu, Shixiang, et al. "Continuous deep q-learning with model-based acceleration." arXiv preprint arXiv:1603.00748 (2016).
[36] Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." AAAI. 2016.
[37] Narendra, Kumpati S., Yu Wang, and Snehasis Mukhopadhyay. "Fast Reinforcement Learning using Multiple Models." 2016 IEEE Conference on Decision and Control (CDC), Las Vegas. 2016.
[38] Narendra, Kumpati S., Snehasis Mukhopadhyay, and Yu Wang. "Improving the Speed of Response of Learning Algorithms Using Multiple Models: An Introduction." 17th Yale Workshop on Adaptive and Learning Systems.
[39] Levine, Sergey, et al. "End-to-end training of deep visuomotor policies." arXiv preprint arXiv:1504.00702 (2015).
[40] Assael, J.-A. M., et al. "Data-efficient learning of feedback policies from image pixels using deep dynamical models." arXiv preprint arXiv:1510.02173 (2015).
[41] Ba, Jimmy, Volodymyr Mnih, and Koray Kavukcuoglu. "Multiple object recognition with visual attention." arXiv preprint arXiv:1412.7755 (2014).
[42] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
[43] Cai, Han, et al. "Efficient architecture search by network transformation." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[44] Liu, Hanxiao, et al. "Hierarchical representations for efficient architecture search." arXiv preprint arXiv:1711.00436 (2017).
[45] Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
[46] Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
[47] Romero, Adriana, et al. "Fitnets: Hints for thin deep nets." arXiv preprint arXiv:1412.6550 (2014).
[48] Liu, Xu-Ying, Jianxin Wu, and Zhi-Hua Zhou. "Exploratory undersampling for class-imbalance learning." IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39.2 (2008): 539-550.