Thesis Record 110522046: Detailed Information
Name: 張慕平 (ZHANG, MU-PING)    Department: Computer Science and Information Engineering
Thesis Title: RS-NAS: A Policy and Value-Based Reinforcement Learning with Reward Shaping on Neural Architecture Search
Full Text: viewable in the system after 2028-7-13 (embargoed until then)
Abstract (in Chinese): Over the past decade, with increasing hardware performance, deep learning has become a popular research field; in computer vision, Convolutional Neural Networks (CNNs) are among its best-known techniques. Researchers have found that more complex network models often achieve higher accuracy, but on resource-constrained end devices, the heavy resource consumption of complex models greatly limits the use of CNNs. In recent years, therefore, much research has focused on Neural Architecture Search (NAS), a family of techniques for automatically designing network models for different objectives. We divide NAS methods into three categories according to their optimization approach: Reinforcement Learning (RL), Evolutionary Algorithms (EA), and Differentiable Optimization. This thesis proposes a new reward shaping mechanism for RL-based NAS, which we call RS-NAS. It aims to solve the sparse-reward challenge that RL encounters during NAS: the agent obtains no reward during the search process and receives a reward only for the model architecture produced at the final step, so it cannot judge how good each intermediate step was, which lowers overall search performance. We implement RS-NAS with two RL algorithms: the policy-based Proximal Policy Optimization (PPO) and the value-based Deep Q Network (DQN). To reduce search cost and confounding variables, and to compare different methods on as equal a footing as possible, we use NATS as our search space. Compared with NATS's original RL method, RS-NAS achieves better search performance and stability.
Abstract (in English): Over the past decade, deep learning has emerged as a popular research domain alongside improvements in hardware performance. Convolutional Neural Networks (CNNs) in particular have achieved significant success in computer vision. Researchers have observed that complex network models can often achieve higher accuracy; however, complex models greatly limit the use of CNNs on resource-constrained devices. As a result, many researchers have recently turned their attention to Neural Architecture Search (NAS), which aims to automatically design network models based on different objectives. Among NAS approaches, Reinforcement Learning (RL) is a commonly used optimization method. In this thesis, we propose a novel reward shaping mechanism, called RS-NAS, for RL-based NAS. The objective is to address the challenge of sparse rewards encountered during the search process in RL: the agent cannot obtain rewards during the search and receives a reward only for the final model architecture. This prevents the agent from evaluating the quality of each step in the search process, and hence reduces overall search efficiency. RS-NAS is implemented using two RL algorithms: Proximal Policy Optimization (PPO), a policy-based method, and Deep Q Network (DQN), a value-based method. We use NATS as the search space to reduce search costs and to control confounding factors across methods for a fair comparison. Compared with the original RL methods in NATS, experimental results verify that the proposed RS-NAS achieves better search performance and stability.
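The thesis's actual shaping function is not reproduced in this record, so the sketch below is only a hypothetical illustration of the sparse-reward problem described in the abstract and of one standard remedy, potential-based reward shaping: the environment reward arrives only once the cell is fully specified, while a shaping term computed from a cheap proxy potential gives the agent per-step feedback. The 5-operation action space loosely mirrors a NATS/NAS-Bench-201 cell; estimate_proxy_accuracy and evaluate_architecture are illustrative stand-ins, not the thesis's implementation.

import random

# Hypothetical NATS/NAS-Bench-201-style cell search: assign one of five
# operations to each of the 6 edges of a cell (illustrative assumption).
OPS = ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]
NUM_EDGES = 6

def estimate_proxy_accuracy(partial_arch):
    # Cheap proxy potential phi(s) for a partially specified architecture
    # (stand-in: fraction of non-"none" edges chosen so far).
    return sum(op != "none" for op in partial_arch) / NUM_EDGES

def evaluate_architecture(arch):
    # Sparse environment reward: accuracy of the completed cell. In practice
    # this would be a NATS benchmark table lookup; here it is random.
    return random.random()

def run_episode(policy, gamma=0.99):
    arch, rewards = [], []
    phi_prev = 0.0
    for edge in range(NUM_EDGES):
        op = policy(arch)  # the agent picks an operation for this edge
        arch.append(op)
        # Potential-based shaping term F = gamma * phi(s') - phi(s); shaping
        # of this form provably leaves the optimal policy unchanged
        # (Ng et al., 1999), while densifying the reward signal.
        phi_next = estimate_proxy_accuracy(arch)
        shaped = gamma * phi_next - phi_prev
        phi_prev = phi_next
        # Without shaping, the agent would see a reward only at the final step.
        env_reward = evaluate_architecture(arch) if edge == NUM_EDGES - 1 else 0.0
        rewards.append(env_reward + shaped)
    return arch, rewards

if __name__ == "__main__":
    arch, rewards = run_episode(lambda partial: random.choice(OPS))
    print(arch, [round(r, 3) for r in rewards])

Either PPO or DQN could consume these per-step rewards in place of the terminal-only signal; the shaping term is what lets the agent compare intermediate decisions rather than only completed architectures.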
Keywords:
★ Reinforcement Learning
★ Sparse Reward
★ Neural Architecture Search
Table of Contents:
Abstract (in Chinese)
Abstract (in English)
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1-1 Research Motivation and Objectives
1-2 Thesis Organization
Chapter 2: Related Work
2-1 Reinforcement Learning (RL)
2-1-1 Policy-Based Methods
2-1-2 Value-Based Methods
2-1-3 Actor-Critic Methods
2-2 Neural Architecture Search (NAS)
2-2-1 Surveys
2-2-2 NAS-RL
2-2-3 NASNet
2-2-4 MetaQNN
2-2-5 Benchmarks
2-2-6 Others
Chapter 3: Research Content and Architecture
3-1 RS-NAS
3-2 RS-NAS: Policy-Based Method
3-2-1 Proximal Policy Optimization (PPO)
3-2-2 Architecture Diagram and Training Details
3-3 RS-NAS: Value-Based Method
3-3-1 Deep Q Network (DQN)
3-3-2 Double Deep Q Network
3-3-3 Dueling Deep Q Network
3-3-4 Architecture Diagram and Algorithm Details
Chapter 4: Experimental Results and Discussion
4-1 Hardware, Software, and Research Environment
4-2 Datasets
4-2-1 NATS
4-2-2 Image Datasets
4-3 Experiment Description
4-4 Advanced Experiments
4-5 Hyperparameter Experiments
4-5-1 RS-NAS PPO
4-5-2 RS-NAS DQN
Chapter 5: Conclusion
5-1 Conclusions and Contributions
5-2 Future Work
References
References:
[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, 86 (11), pp. 2278-2324, 1998.
[2] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS) - Volume 1, pp. 1097-1105, 2012.
[3] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” ArXiv, abs/1707.06347, 2017.
[4] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M.A. Riedmiller, “Playing Atari with Deep Reinforcement Learning,” ArXiv, abs/1312.5602, 2013.
[5] X. Dong, L. Liu, K. Musial, and B. Gabrys, “NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 (7), pp. 3634-3646, 2022.
[6] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” In Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248-255, 2009.
[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, 2016.
[8] K. Simonyan, and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” In Proceedings of the 3rd International Conference on Learning Representations (ICLR), pp. 1–14, 2015.
[9] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” ArXiv, abs/1704.04861, 2017.
[10] M. Sandler, A.G. Howard, M. Zhu, A. Zhmoginov, and L. Chen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks,” In Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510-4520, 2018.
[11] A.G. Howard, M. Sandler, G. Chu, L. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q.V. Le, and H. Adam, “Searching for MobileNetV3,” In Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1314-1324, 2019.
[12] B. Zoph, and Q. Le, “Neural Architecture Search with Reinforcement Learning,” In Proceedings of International Conference on Learning Representations (ICLR), 2017.
[13] B. Zoph, V. Vasudevan, J. Shlens, and Q. Le, “Learning Transferable Architectures for Scalable Image Recognition,” In Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8697-8710, 2018.
[14] B. Baker, O. Gupta, N. Naik, and R. Raskar, “Designing Neural Network Architectures using Reinforcement Learning,” In Proceedings of International Conference on Learning Representations (ICLR), 2017.
[15] H.V. Hasselt, A. Guez, and D. Silver, “Deep Reinforcement Learning with Double Q-Learning,” In Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
[16] Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. De Freitas, “Dueling Network Architectures for Deep Reinforcement Learning,” In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, pp. 1995–2003, 2016.
[17] R. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy Gradient Methods for Reinforcement Learning with Function Approximation,” In Proceedings of the 12th International Conference on Neural Information Processing Systems, pp. 1057–1063, 1999.
[18] C. J. C. H. Watkins, and P. Dayan, “Q-learning,” Machine Learning, 8, pp. 279-292, 1992.
[19] V. Mnih, A. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous Methods for Deep Reinforcement Learning,” In Proceedings of the 33rd International Conference on Machine Learning, pp. 1928–1937, 2016.
[20] T. Lillicrap, J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” In Proceedings of International Conference on Learning Representations (ICLR), 2016.
[21] T. Elsken, J. Metzen, and F. Hutter, “Neural architecture search: A survey,” The Journal of Machine Learning Research, 20 (1), pp. 1997–2017, 2019.
[22] M. Wistuba, A. Rawat, and T. Pedapati, “A Survey on Neural Architecture Search,” ArXiv, abs/1905.01392, 2019.
[23] P. Ren, Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, X. Chen, and X. Wang, “A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions,” ACM Computing Surveys, 54 (4), pp. 1–34, 2021.
[24] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015.
[25] C. Ying, A. Klein, E. Christiansen, E. Real, K. Murphy, and F. Hutter, “NAS-Bench-101: Towards Reproducible Neural Architecture Search,” In Proceedings of the 36th International Conference on Machine Learning (ICML), pp. 7105–7114, 2019.
[26] X. Dong, and Y. Yang, “NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search,” In Proceedings of International Conference on Learning Representations (ICLR), 2020.
[27] J.N. Siems, L. Zimmer, A. Zela, J. Lukasik, M. Keuper, and F. Hutter, “NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search,” ArXiv, abs/2008.09777, 2020.
[28] S. Yan, C. White, Y. Savani, and F. Hutter, “NAS-Bench-x11 and the Power of Learning Curves,” In Proceedings of Advances in Neural Information Processing Systems, 2021.
[29] Y. Mehta, C. White, A. Zela, A. Krishnakumar, G. Zabergja, S. Moradian, M. Safari, K. Yu, and F. Hutter, “NAS-Bench-Suite: NAS Evaluation is (Now) Surprisingly Easy,” In Proceedings of International Conference on Learning Representations (ICLR), 2022.
[30] M. Tan, B. Chen, R. Pang, V. Vasudevan, and Q.V. Le, “MnasNet: Platform-Aware Neural Architecture Search for Mobile,” In Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2815-2823, 2019.
[31] M. Tan, and Q. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” In Proceedings of the 36th International Conference on Machine Learning (ICML), pp. 6105-6114, 2019.
[32] M. Guo, Z. Zhong, W. Wu, D. Lin and J. Yan, “IRLAS: Inverse Reinforcement Learning for Architecture Search,” In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[33] M. Suganuma, S. Shirakawa, and T. Nagao, “A Genetic Programming Approach to Designing Convolutional Neural Network Architectures,” In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 497–504, 2017.
[34] L. Xie, and A. Yuille, “Genetic CNN,” In Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1388-1397, 2017.
[35] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, “Regularized Evolution for Image Classifier Architecture Search,” In Proceedings of the AAAI Conference on Artificial Intelligence, 33 (01), pp. 4780-4789, 2019.
[36] H. Liu, K. Simonyan, and Y. Yang, “DARTS: Differentiable Architecture Search,” In Proceedings of International Conference on Learning Representations (ICLR), 2019.
[37] J. Fang, Y. Sun, Q. Zhang, Y. Li, W. Liu, and X. Wang, “Densely Connected Search Space for More Flexible Neural Architecture Search,” In Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10625-10634, 2020.
[38] G. Bender, P. Kindermans, B. Zoph, V. Vasudevan, and Q.V. Le, “Understanding and Simplifying One-Shot Architecture Search,” In Proceedings of International Conference on Machine Learning (ICML), 2018.
[39] A. Krizhevsky, and G. Hinton, “Learning multiple layers of features from tiny images,” Citeseer, Tech. Rep., 2009.
Advisors: 范國清, 陳以錚 (FAN, GUO-QING; CHEN, YI-ZHENG)    Approval Date: 2023-7-24