Abstract: | Designing neural network architectures requires substantial manual effort, which has spurred the development of Neural Architecture Search (NAS). However, training and validating each candidate architecture takes a great deal of time, so finding the best-performing architecture at minimal time cost is a key metric in NAS research. Recent work adopts iterative training strategies (e.g., BRP-NAS, WeakNAS) or combines them with zero-cost approaches (e.g., ProxyBO) so that high-performance architectures are preferentially selected to train the predictor. The resulting predictors have been shown to outperform predictors trained on randomly sampled architectures under the same budget. This leads to a further hypothesis: within an iterative training strategy and under the same training budget, would a predictor trained on only a retained subset of high-scoring architectures be stronger than one trained on every architecture in the budget? We conducted a series of experiments that confirm this hypothesis and show significant improvements. Combining this finding with the iterative training strategy, we propose the Highly Targeted Training Strategy (HTTS).
For the predictor architecture, we analyze and optimize the strong predictor based on the Bidirectional Graph Convolutional Network (Bi-GCN) used in Predictor-based NAS. In this thesis, we propose a more powerful predictor, Fully-BiGCN, which substantially strengthens the predictor's emphasis on the features of each layer. Combining the Fully-BiGCN predictor with HTTS yields a new NAS method, HTTP-NAS. Compared with the state of the art in Predictor-based NAS (WeakNAS), HTTP-NAS achieves strong results: on the NAS-Bench-201 benchmark, it requires only 27.1% (CIFAR10), 49.0% (CIFAR100), and 51.75% (ImageNet16-120) of WeakNAS's training budget for the predictor to find the globally optimal architecture. |
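The idea behind the hypothesis above (train the predictor on only the high-scoring part of the architectures queried so far) can be illustrated with a small sketch. This is not the thesis's HTTS implementation: the toy search space `SPACE`, the synthetic `true_accuracy` function, the nearest-neighbour `predict` function, and the `keep_ratio`, `step`, and `budget` parameters are all hypothetical choices made only to keep the example self-contained.

```python
import itertools
import random

# Hypothetical toy search space: 6 positions, 4 candidate operations each.
SPACE = list(itertools.product(range(4), repeat=6))

def true_accuracy(arch):
    # Hypothetical stand-in for training and validating one architecture.
    weights = [0.5, 0.3, 0.9, 0.1, 0.7, 0.4]
    return sum(w * op for w, op in zip(weights, arch))

def hamming(a, b):
    # Number of positions at which two architectures differ.
    return sum(x != y for x, y in zip(a, b))

def predict(train_set, arch, k=3):
    # Toy predictor (assumption): mean accuracy of the k nearest
    # training architectures under Hamming distance.
    nearest = sorted(train_set, key=lambda item: hamming(item[0], arch))[:k]
    return sum(acc for _, acc in nearest) / len(nearest)

def targeted_search(space, budget=40, step=8, keep_ratio=0.5, seed=0):
    rng = random.Random(seed)
    pool = list(space)
    rng.shuffle(pool)
    # Initial round: query a random batch of architectures.
    queried = {arch: true_accuracy(arch) for arch in pool[:step]}
    while len(queried) < budget:
        # Retain only the top-scoring fraction of queried architectures
        # as the predictor's training set.
        ranked = sorted(queried.items(), key=lambda kv: kv[1], reverse=True)
        keep = max(1, int(len(ranked) * keep_ratio))
        train_set = ranked[:keep]
        # Rank the unqueried architectures by predicted accuracy.
        candidates = [a for a in pool if a not in queried]
        candidates.sort(key=lambda a: predict(train_set, a), reverse=True)
        # Query the most promising ones without exceeding the budget.
        n_new = min(step, budget - len(queried))
        for arch in candidates[:n_new]:
            queried[arch] = true_accuracy(arch)
    best = max(queried, key=queried.get)
    return best, queried[best], len(queried)
```

Calling `targeted_search(SPACE)` returns the best architecture found, its accuracy, and the number of architectures queried. Setting `keep_ratio=1.0` gives the baseline in which every queried architecture is used to train the predictor, which is the comparison the hypothesis is about.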