Abstract: | Convolutional Neural Networks (CNNs) are currently a major focus of development in the field of Deep Neural Networks. Because CNNs can recognize images even more accurately than humans, they have seen increasing use in object detection and image classification in recent years. However, if all data are sent to a central processing node for computation at the same time, the large data flow and complex calculations create a system bottleneck and degrade overall computing performance. To resolve this bottleneck effectively, researchers have proposed performing computation directly on edge devices instead of sending data to a central processing node, which gave rise to edge computing technology. The computing power of edge devices is limited, however, so large CNNs cannot be deployed on them directly. To overcome this challenge, Lightweight Neural Networks have been proposed, which reduce computation and network complexity while enabling inference on edge devices. Recently, two lightweight neural networks, MobileNet and ShuffleNet, have been widely discussed; with proper training and configuration, the accuracy loss of these two networks is limited and acceptable to users. However, most current state-of-the-art AI accelerators are not well suited to MobileNet and ShuffleNet. Therefore, in this thesis, we propose a novel low-power, low-area reconfigurable AI accelerator design for MobileNet and ShuffleNet. Unlike previous accelerators, ours supports depthwise convolution, pointwise convolution, and pointwise group convolution, as well as the channel shuffle operation, so it is well matched to the characteristics of both networks. Experimental results show that our accelerator design can successfully compute MobileNet and ShuffleNet. In addition, compared with previous works, FPGA verification shows a 3.7% improvement in throughput, and at a 45 nm process node the design reduces area by up to 56% at the same performance while reducing power by 58%. |
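For readers unfamiliar with the three convolution modes the accelerator supports, the following is a minimal NumPy sketch (not the thesis's hardware design; all function names and shapes are our own illustrative assumptions) of depthwise convolution, pointwise (1x1) convolution, pointwise group convolution, and ShuffleNet-style channel shuffle:

```python
import numpy as np

def depthwise_conv(x, w):
    """Depthwise convolution: one k x k filter per input channel.
    x: (C, H, W), w: (C, k, k) -> (C, H-k+1, W-k+1). Stride 1, no padding."""
    C, H, W = x.shape
    _, k, _ = w.shape
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):                       # each channel convolved independently
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * w[c])
    return out

def pointwise_conv(x, w):
    """Pointwise (1x1) convolution: mixes channels at each pixel.
    x: (C, H, W), w: (Cout, C) -> (Cout, H, W)."""
    return np.einsum('oc,chw->ohw', w, x)

def pointwise_group_conv(x, w, groups):
    """Grouped 1x1 convolution: channels split into groups, each group
    convolved independently. x: (C, H, W), w: (Cout, C // groups)."""
    C, (Cout, _) = x.shape[0], w.shape
    cg, og = C // groups, Cout // groups
    outs = [np.einsum('oc,chw->ohw',
                      w[g * og:(g + 1) * og],      # this group's filters
                      x[g * cg:(g + 1) * cg])      # this group's input channels
            for g in range(groups)]
    return np.concatenate(outs, axis=0)

def channel_shuffle(x, groups):
    """ShuffleNet channel shuffle: reshape to (g, C//g, H, W),
    transpose the first two axes, and flatten back to (C, H, W)."""
    C, H, W = x.shape
    return x.reshape(groups, C // groups, H, W).transpose(1, 0, 2, 3).reshape(C, H, W)
```

A grouped 1x1 convolution is equivalent to a full 1x1 convolution with a block-diagonal weight matrix, which is why it costs roughly 1/groups of the multiplications; channel shuffle then restores cross-group information flow between stacked group convolutions.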