Abstract: | Over the past decade, deep learning has become a popular research area as hardware performance has improved. In computer vision, Convolutional Neural Networks (CNNs) are a widely known technique. Researchers have observed that more complex network models often achieve higher accuracy, but on resource-constrained edge devices the heavy resource consumption of such models greatly limits the use of CNNs. As a result, much recent research has focused on Neural Architecture Search (NAS), which automatically designs network models for different objectives. NAS methods can be grouped into three categories by optimization method: Reinforcement Learning (RL), Evolutionary Algorithms (EA), and Differentiable Optimization.
In this thesis, we propose a novel reward shaping mechanism for RL-based NAS, called RS-NAS. Its goal is to address the sparse-reward challenge that RL encounters during the NAS search process: the agent receives no reward during the search and is rewarded only according to the model architecture produced at the final step. The agent therefore cannot evaluate the quality of each individual step, which reduces overall search efficiency. We implement RS-NAS with two RL algorithms: Proximal Policy Optimization (PPO), a policy-based method, and Deep Q Network (DQN), a value-based method.
To reduce search cost and confounding factors, and to compare different methods on as common a standard as possible, we use NATS as the search space. Compared with the original RL methods in NATS, experimental results show that RS-NAS achieves better search performance and stability. |