References
[1] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition,” CVPR, 2018.
[2] B. U. Islam, Z. Baharudin, M. Q. Raza, and P. Nallagownden, “Optimization of neural network architecture using genetic algorithm for load forecasting,” 2014 5th International Conference on Intelligent and Advanced Systems (ICIAS), pp. 1-6, 2014.
[3] M. A. J. Idrissi et al., “Genetic algorithm for neural network architecture optimization,” 2016 3rd International Conference on Logistics Operations Management (GOL), pp. 1-4, 2016.
[4] H. Ramchoun, M. A. J. Idrissi, Y. Ghanou, and M. Ettaouil, “Multilayer perceptron: Architecture optimization and training,” IJIMAI, vol. 4, pp. 26-30, 2016.
[5] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, “Regularized evolution for image classifier architecture search,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019.
[6] B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” ICLR, 2017.
[7] B. Baker, O. Gupta, N. Naik, and R. Raskar, “Designing neural network architectures using reinforcement learning,” ICLR, 2017.
[8] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, “Neural combinatorial optimization with reinforcement learning,” ICLR Workshop, 2017.
[9] M. Guo, Z. Zhong, W. Wu, D. Lin, and J. Yan, “IRLAS: Inverse reinforcement learning for architecture search,” CVPR, pp. 9021-9029, 2019.
[10] R. Luo, F. Tian, T. Qin, E. Chen, and T.-Y. Liu. “Neural architecture optimization,” NeurIPS, 2018.
[11] B. Wang, B. Xue, and M. Zhang, “Particle swarm optimization for evolving deep neural networks for image classification by evolving and stacking transferable blocks,” arXiv preprint arXiv:1907.12659, 2019.
[12] H. Liu, K. Simonyan, and Y. Yang, “DARTS: Differentiable architecture search,” arXiv preprint arXiv:1806.09055, 2018.
[13] H. Cai, L. Zhu, and S. Han, “ProxylessNAS: Direct neural architecture search on target task and hardware,” ICLR, 2019.
[14] A. Noy, N. Nayman, T. Ridnik, N. Zamir, S. Doveh, I. Friedman, R. Giryes, and L. Zelnik-Manor, “ASAP: Architecture search, anneal and prune,” arXiv preprint arXiv:1904.04123, 2019.
[15] Q. Yao, J. Xu, W.-W. Tu, and Z. Zhu, “Efficient neural architecture search via proximal iterations,” AAAI Conference on Artificial Intelligence, 2020.
[16] K. Ciosek and S. Whiteson, “Expected policy gradients for reinforcement learning,” Journal of Machine Learning Research, vol. 21, no. 52, pp. 1-51, 2020.
[17] J. Booth, “PPO Dash: Improving generalization in deep reinforcement learning,” arXiv preprint arXiv:1907.06704, 2019.
[18] P. Hämäläinen, A. Babadi, X. Ma, and J. Lehtinen, “PPO-CMA: Proximal policy optimization with covariance matrix adaptation,” arXiv preprint arXiv:1810.02541, 2018.
[19] L. Greige and P. Chin, “Reinforcement learning in FlipIt,” arXiv preprint arXiv:2002.12909, 2020.
[20] H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[21] K. Lakshmanan, “Accelerated reinforcement learning,” 2017 14th IEEE India Council International Conference (INDICON), pp. 1-4, 2017.
[22] S. Khadka and K. Tumer, “Evolutionary reinforcement learning,” arXiv preprint arXiv:1805.07917, 2018.
[23] M. Jadeja, N. Varia, and A. Shah, “Deep reinforcement learning for conversational AI,” arXiv preprint arXiv:1709.05067, 2017.
[24] P. Khandel, A. H. Rassafi, V. Pourahmadi, S. Sharifian, and R. Zheng, “SensorDrop: A reinforcement learning framework for communication overhead reduction on the edge,” arXiv preprint arXiv:1910.01601, 2019.
[25] P. Yu, J. S. Lee, I. Kulyatin, Z. Shi, and S. Dasgupta, “Model-based deep reinforcement learning for dynamic portfolio optimization,” arXiv preprint arXiv:1901.08740, 2019.
[26] V. Dunjko, J. M. Taylor, and H. J. Briegel, “Advances in quantum reinforcement learning,” 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 282-287, 2017.
[27] A. Faust, A. Francis, and D. Mehta, “Evolving rewards to automate reinforcement learning,” arXiv preprint arXiv:1905.07628, 2019.
[28] P. Yingjun and H. Xinwen, “Learning representations in reinforcement learning: An information bottleneck approach,” arXiv preprint arXiv:1911.05695, 2019.
[29] A. Levy, R. Platt, and K. Saenko, “Hierarchical reinforcement learning with hindsight,” arXiv preprint arXiv:1805.08180, 2018.
[30] A. Haj-Ali, N. K. Ahmed, T. Willke, J. Gonzalez, K. Asanovic, and I. Stoica, “Deep reinforcement learning in system optimization,” arXiv preprint arXiv:1908.01275, 2019.
[31] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean, “Efficient neural architecture search via parameter sharing,” arXiv preprint arXiv:1802.03268, 2018.
[32] C. Liu, B. Zoph, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy, “Progressive neural architecture search,” ECCV, 2018.