With the rapid growth of deep learning, and of Convolutional Neural Networks (CNNs) in particular, object detection networks have made major progress in computer vision. These models can accurately detect the location and category of objects in a wide range of complex scenes and are already used in everyday applications. During CNN inference, convolutional layers account for a large share of the computation: the input feature maps undergo multiply-accumulate (MAC) operations with many weight kernels, and as the network grows deeper, both the number of parameters and the amount of computation keep increasing. Consequently, numerous studies have proposed different computation methods and hardware architectures that process data efficiently and reduce the computation time of neural networks. In this paper, we propose an assistive system for blind people, built as a System-on-Chip (SoC) architecture consisting of an ARM CPU and a neural network accelerator module for object detection. Based on Tiny You-Only-Look Once version 2 (Tiny YOLOv2) and text-to-speech (TTS), the system aims to make it easier for blind users to walk independently in unfamiliar environments.
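To make the claim about convolutional layers dominating the computation concrete, the MAC count of a standard convolution layer is the product of the output size, the number of output channels, and the size of each kernel. The sketch below is illustrative only: the function name `conv_macs` is hypothetical, and the example layer shape (a 416x416 RGB input convolved with 16 filters of size 3x3, stride 1, same padding) is an assumption based on the common Darknet yolov2-tiny configuration, not a figure taken from this work.

```python
def conv_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulate count of a k x k convolution layer.

    Every output element (h_out * w_out * c_out of them) needs one MAC
    per kernel weight that touches it (c_in * k * k weights).
    """
    return h_out * w_out * c_out * c_in * k * k


# Hypothetical first layer: 416x416x3 input, 16 filters of 3x3, stride 1, same padding.
macs = conv_macs(416, 416, 3, 16, 3)
print(f"{macs:,} MACs")  # 74,760,192 MACs
```

When throughput is quoted in GOPS, one MAC is commonly counted as two operations (a multiply and an add), so the same layer corresponds to roughly twice as many operations; whether a given paper uses that convention should be checked against its own definitions.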
We apply post-training quantization and analyze its effect on the model, greatly reducing the storage required for the parameters without a significant loss in accuracy. In terms of architecture, to make effective use of the limited on-chip memory, all zero-padding is handled in hardware. For convolution, we introduce a Convolution Unit that supports both 3x3 and 1x1 convolutions; by combining a Row Stationary (RS) dataflow with the decomposition of each 3x3 convolution into multiple 1x1 convolutions, it lets each processing element (PE) maximize weight reuse. The system is implemented on a Zynq UltraScale+ MPSoC EGO-ZU19EG FPGA. Experimental results show that, at 166 MHz, the system achieves 169.98 GOPS with a power consumption of 6.599 W, corresponding to an energy efficiency of 25.76 GOPS/W.
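The idea of splitting a 3x3 convolution into 1x1 convolutions can be illustrated in software: after zero-padding the input once, a stride-1 3x3 convolution equals the sum of nine 1x1 convolutions, each using one kernel tap `w[:, :, ky, kx]` and reading the padded input shifted by `(ky, kx)`. The sketch below is a minimal pure-Python illustration of that equivalence under these assumptions (stride 1, same padding, float or integer data); it does not model the Row Stationary dataflow, the PE array, or the quantized arithmetic of the proposed Convolution Unit, and all function names are hypothetical.

```python
def zero_pad(x, p):
    """Pad a C x H x W feature map (nested lists) with p zeros on each spatial border."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out = [[[0] * (w + 2 * p) for _ in range(h + 2 * p)] for _ in range(c)]
    for ci in range(c):
        for i in range(h):
            for j in range(w):
                out[ci][i + p][j + p] = x[ci][i][j]
    return out


def conv1x1_shifted(xp, w1, dy, dx, h, w):
    """1x1 convolution with weights w1[co][ci] over the H x W window of xp offset by (dy, dx)."""
    c_out, c_in = len(w1), len(w1[0])
    return [[[sum(w1[co][ci] * xp[ci][i + dy][j + dx] for ci in range(c_in))
              for j in range(w)] for i in range(h)] for co in range(c_out)]


def conv3x3_via_1x1(x, wt):
    """Stride-1, same-padded 3x3 conv (weights wt[co][ci][ky][kx]) as nine shifted 1x1 convs."""
    h, w = len(x[0]), len(x[0][0])
    c_out, c_in = len(wt), len(wt[0])
    xp = zero_pad(x, 1)  # pad once; every 1x1 pass reads from the same padded map
    acc = [[[0] * w for _ in range(h)] for _ in range(c_out)]
    for ky in range(3):
        for kx in range(3):
            # One kernel tap across all channel pairs forms a 1x1 weight matrix.
            w1 = [[wt[co][ci][ky][kx] for ci in range(c_in)] for co in range(c_out)]
            partial = conv1x1_shifted(xp, w1, ky, kx, h, w)
            for co in range(c_out):
                for i in range(h):
                    for j in range(w):
                        acc[co][i][j] += partial[co][i][j]  # accumulate partial sums
    return acc


def conv3x3_direct(x, wt):
    """Reference stride-1, same-padded 3x3 convolution (cross-correlation, as in CNNs)."""
    h, w = len(x[0]), len(x[0][0])
    c_out, c_in = len(wt), len(wt[0])
    xp = zero_pad(x, 1)
    return [[[sum(wt[co][ci][ky][kx] * xp[ci][i + ky][j + kx]
                  for ci in range(c_in) for ky in range(3) for kx in range(3))
              for j in range(w)] for i in range(h)] for co in range(c_out)]


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 channel, 3x3 input
wt = [[[[1, 1, 1], [1, 1, 1], [1, 1, 1]]]]  # 1 output channel, 1 input channel, all-ones kernel
out = conv3x3_via_1x1(x, wt)
print(out[0])  # [[12, 21, 16], [27, 45, 33], [24, 39, 28]]
```

In this decomposition each 1x1 weight matrix is applied to every output position before moving to the next kernel tap, which is one way a fixed set of weights can be held and reused while input data streams past; the actual scheduling in the proposed hardware may differ.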