DC Field | Value | Language |
dc.contributor | 資訊工程學系 | zh_TW |
dc.creator | 張慕平 | zh_TW |
dc.creator | ZHANG, MU-PING | en_US |
dc.date.accessioned | 2023-07-24T07:39:07Z | |
dc.date.available | 2023-07-24T07:39:07Z | |
dc.date.issued | 2023 | |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=110522046 | |
dc.contributor.department | 資訊工程學系 | zh_TW |
dc.description | 國立中央大學 | zh_TW |
dc.description | National Central University | en_US |
dc.description.abstract | 近十年來,隨著硬體效能的提升,深度學習成為熱門的研究領域,其中在電腦視覺中,卷積神經網路(Convolutional Neural Networks, CNNs)是廣為人知的技術。研究過程中,人們發現複雜的網路模型往往能獲得更高的準確率,但在一些資源有限的終端設備上,複雜模型所帶來的龐大資源消耗,大幅限制了 CNN 的使用。因此近幾年,許多研究都專注於網路結構搜索(Neural Architecture Search, NAS)領域,即根據不同目標自動設計網路模型的技術。在 NAS 領域中,我們根據優化方法的不同,將其分成三個類別:強化學習(Reinforcement Learning, RL)、進化演算法(Evolutionary Algorithms, EA)與可微分優化(Differentiable Optimization)。本篇論文針對強化學習方法的 NAS 任務提出一種新的獎勵重構(Reward Shaping)機制,稱為 RS-NAS,目的是解決強化學習在 NAS 搜索過程中遭遇的稀疏獎勵挑戰:強化學習中的代理人(Agent)無法在搜索過程中獲得獎勵,只能根據最後一步搜索出的模型架構取得獎勵,這使代理人無法評估搜索過程中每一步的優劣,從而降低整體的搜索效能。我們使用兩種強化學習演算法來實作 RS-NAS:一種是基於策略(Policy-Based)的近端策略優化(Proximal Policy Optimization, PPO);另一種是基於價值(Value-Based)的深度 Q 網路(Deep Q Network, DQN)。同時,為了降低搜索成本與變因,讓不同方法盡量在同一標準上比較,本篇論文使用 NATS 當作搜索空間。相較於 NATS 原本的強化學習方法,RS-NAS 具有更好的搜索性能與穩定性。 | zh_TW |
dc.description.abstract | Over the past decade, deep learning has emerged as a popular research domain along with improvements in hardware performance. In computer vision, Convolutional Neural Networks (CNNs) have achieved significant success. Researchers have also observed that more complex network models often achieve higher accuracy; however, the heavy resource consumption of complex models greatly limits the use of CNNs on resource-constrained devices. As a result, much recent research has focused on Neural Architecture Search (NAS), which aims to automatically design network models for different objectives. Among NAS optimization methods, Reinforcement Learning (RL) is commonly utilized. In this thesis, we propose a novel reward shaping mechanism called RS-NAS for RL-based NAS tasks. The objective is to address the challenge of sparse rewards encountered during the RL search process: the agent cannot obtain rewards during the search and only receives a reward based on the final model architecture. This prevents the agent from evaluating the quality of each step in the search process, and hence reduces overall search efficiency. The proposed RS-NAS is implemented with two RL algorithms: Proximal Policy Optimization (PPO), a policy-based method, and Deep Q Network (DQN), a value-based method. To reduce search costs and control confounding factors so that different methods can be compared on the same basis, we adopt NATS as the search space. Compared with the original RL methods in NATS, experimental results verify that the proposed RS-NAS achieves better search performance and stability. | en_US |
dc.subject | 強化學習 | zh_TW |
dc.subject | 稀疏獎勵 | zh_TW |
dc.subject | 網路結構搜索 | zh_TW |
dc.subject | Reinforcement Learning | en_US |
dc.subject | Sparse Reward | en_US |
dc.subject | Neural Architecture Search | en_US |
dc.title | RS-NAS:基於策略與價值含獎勵重構之強化學習於網路結構搜索 | zh_TW |
dc.language.iso | zh-TW | |
dc.title | RS-NAS: A Policy and Value-Based Reinforcement Learning with Reward Shaping on Neural Architecture Search | en_US |
dc.type | 博碩士論文 | zh_TW |
dc.type | thesis | en_US |
dc.publisher | National Central University | en_US |