深度卷積神經網路(DCNNs)被廣泛地用於人工智慧的應用,例如:物件辨識及影像分類等等。現今的深度卷積神經網路具有大量計算與大量數據的特性,為了在不同應用中符合對性能的要求,加速器被用來執行深度卷積神經網路的運算。在本論文中,我們根據以動態隨機存取記憶體(DRAM)儲存資料及使用加速器來執行計算的深度卷積神經網路推論系統,提出架構探索的方法。此方法以減少資料傳輸時間與計算時間的差異而定義出之加速器架構。加速器包含了數叢(clusters)之處理單元(PEs)、一個可重構之記憶體單元及一個控制器。交換器(switch)用以連接一叢處理單元陣列與可重構記憶體單元。可重構記憶體是由三個靜態隨機存取記憶體組合而成,每一靜態隨機存取記憶體可以調整其大小,以符合不同卷積層的記憶體需求。處理單元陣列與可重構記憶體之組態是由基於子層之參數選定流程(sublayer-based parameters decision flow)所決定。與現存之研究相比,本論文提出之加速器在卷積層及深度卷積神經網路各提升4.2%及17.4%的硬體利用率。我們根據提出之可重構加速器架構,在Xilinx ZCU-102開發板上實現了一個推論MobileNet V1的可重構加速器,此一加速器包含了1092KB的靜態隨機存取記憶體與四叢處理單元陣列,每一叢處理單元陣列包含了8個處理單元。實驗結果達到在150MHz的操作頻率下,此一加速器達到每秒1440億次計算及每秒推論40.1張圖片的效能。;Deep convolutional neural networks (DCNNs) are widely used for the artificial intelligence applications, e.g., object recognition and image classification. A modern DCNN model usually needs a huge amount of computations and data. To meet the performance requirement of applications, an accelerator is usually designed to execute the computation of DCNN. In this thesis, we consider a DCNN inference system using a DRAM to store data and an accelerator to execute the computation. An architecture exploration method based on the minimization of difference between DRAM data access time and computation time is proposed to define the architecture of accelerator. The accelerator consists of multiple clusters of processing elements (PEs), a reconfigurable memory unit, and a controller. A cluster of PEs is connected to the reconfigurable memory unit through a switch box. The reconfigurable memory unit consists of three static random access memories which sizes can be dynamically changed to fit the requirement of different convolutional layers. The configurations of PE array and reconfigurable memory are determined by sublayer-based parameters decision flow which can gain 4.2% and 17.4% increment of hardware resource utilization for convolutional layers and DCNN model in comparison with existing works. We implement the MobileNet V1 model in Xilinx ZCU-102 evaluation board using the proposed reconfigurable accelerator architecture with 1092KB SRAM and four PE clusters in which each cluster has 8 PEs. Ex- perimental results show that 144 GOPS and 40.1 FPS can be achieved under 100MHz clock rate.