| Abstract: | As deep learning has advanced image recognition and object detection, from early architectures such as AlexNet and VGG to later detectors such as Fast R-CNN and SSD, model accuracy has risen steadily, often at the expense of inference latency and power consumption. In practical deployments, running a full floating-point model on a power-constrained embedded platform or unmanned aerial vehicle (UAV) requires a bulky cooling system and sharply reduces battery life, and the resulting hardware cost and size rarely meet lightweight requirements. The YOLO series greatly improves detection speed through its single-stage, end-to-end design; however, as model size grows, conventional GPUs are difficult to operate over long periods in power- and cost-constrained settings such as embedded devices and UAVs. FPGAs, with their high parallelism, reconfigurability, and excellent energy efficiency, are an ideal platform for real-time, low-power inference. By implementing deep pipelining and bit quantization in hardware, an FPGA can achieve both high throughput and low latency on a single chip while consuming far less power than a GPU. 
To meet this need, this paper proposes a YOLOv7-Tiny accelerator architecture based on high-level synthesis (HLS), implemented and verified on the Xilinx ZCU106 development board. Operating at 200 MHz, the design consumes only 5.089 W in total and reaches an energy efficiency of 19.375 GOPs/W, showing that high computational efficiency is achievable at low power. The accelerator is further applied to a ship detection task, trained on infrared images captured from a UAV's perspective, to cope with the complex backgrounds and unstable lighting of maritime scenes. These results demonstrate the performance gains obtained through extensive parallelization and bit quantization, and highlight the FPGA platform's advantage in balancing computational performance with energy efficiency. The architecture is flexibly reconfigurable, so it can be deployed in embedded applications such as UAV vision, smart surveillance, and industrial automation, and it leaves room for future upgrades and model expansion. |
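
The abstract reports total power and energy efficiency but not throughput. As a rough cross-check, and assuming the efficiency figure is defined as throughput divided by total power (the abstract does not state this), the implied throughput can be recovered by multiplying the two reported numbers. The function name below is hypothetical and used only for illustration.

```python
def implied_throughput_gops(power_w: float, efficiency_gops_per_w: float) -> float:
    """Throughput implied by an energy-efficiency figure.

    Assumes efficiency = throughput / total power, which is the usual
    definition but is not confirmed by the abstract itself.
    """
    return power_w * efficiency_gops_per_w


POWER_W = 5.089                 # total power reported in the abstract (W)
EFFICIENCY_GOPS_PER_W = 19.375  # energy efficiency reported in the abstract

throughput = implied_throughput_gops(POWER_W, EFFICIENCY_GOPS_PER_W)
print(f"Implied throughput: {throughput:.1f} GOPS")
```

Under that assumption the two reported figures correspond to roughly 98.6 GOPS at 200 MHz. This is a derived estimate, not a number stated in the abstract.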