An Efficient Accelerator Design for Edge AI: A Reconfigurable Structure for Depthwise Separable Convolution


Name

Hung-Yi Chiang (江鴻儀)

Department

Department of Electrical Engineering

Thesis title

An Efficient Accelerator Design for Edge AI: A Reconfigurable Structure for Depthwise Separable Convolution
(一個有效的邊緣智慧運算加速器設計: 一種適用於深度可分卷積的可重組式架構)

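Related theses

★ An Automatic Construction Method for Wave Digital Filter Structures for Analog Circuit Emulation
★ An Analog Circuit Simulation Platform Using Wave Digital Filters and Nonlinear MOS Models
★ Nonlinear Transistor Models for Realizing an Analog Emulator Based on the Wave Digital Filter Structure
★ A Node-Preserving Method for Power Network Model Reduction in IR-Drop Analysis
★ Improving the Identification of Second-Order Threshold Functions with Guided Second-Order Weight Extraction
★ A Fixed-Point Implementation Method for Wave Digital Filter Structures in Analog Circuit Emulators
★ A Layout Automation Tool Based on Basic Analog Circuit Structures
★ An Analog Layout Migration Technique That Preserves Design Style and Routing Behavior
★ A Method for Automatically Identifying Digital Blocks in Mixed-Signal Circuits
★ A Study of SRAM Power Models for In-Memory Computing
★ An Analog Layout Placement Refinement Method Considering Routability and Shallow Trench Isolation Effects
★ A Precision-Scalable Processing Element Design for Quantized Deep Neural Networks: A Hierarchical Design Approach
★ Generation and Simulation of Wave Digital Filter Structures for Analog Circuit Emulation
★ A Hardware Optimization Method for Wave Digital Filters in Analog Circuit Emulators
★ A Method for Automatically Identifying Building Blocks and RLC Components in Mixed-Signal Circuits
★ An FPGA Table Reduction Technique for Analog Circuit Emulators Implemented with Wave Digital Filters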


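The author has agreed to make this electronic thesis openly available immediately.
For electronic full texts that have reached their open-access date, users are licensed only to search, read, and print them for personal, non-commercial academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.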

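Abstract (translated from Chinese)

Convolutional neural networks (CNNs) are widely used in computer vision tasks. However, standard neural networks require a large number of operations and parameters, which is a challenge for embedded devices. Earlier work therefore proposed MobileNets, a neural network architecture that replaces standard convolution with depthwise separable convolution, greatly reducing the operation and parameter counts with only limited loss in accuracy. MobileNets relies on two main kinds of computation, pointwise and depthwise. When a conventional accelerator computes both, hardware utilization is low because the two differ in their operation parameters and computation patterns. Another common way to reduce the computational burden of a neural network is quantization, which lowers the computational load by reducing the bit width or by using different bit widths. If hardware of a single fixed precision processes data of different bit widths, however, computation time is not effectively saved. Building on MobileNets and quantized networks, this thesis proposes a new computing architecture that can efficiently compute quantized MobileNets, achieving both faster computation and area savings.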
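The claimed reduction in operations and parameters can be checked with simple counting. The sketch below is not taken from the thesis; it assumes stride 1, "same" padding, and no bias terms, and the function names and the example layer shape are hypothetical.

```python
def standard_conv_cost(h, w, c_in, c_out, k):
    """Multiply-accumulates (MACs) and weights of a standard k x k convolution.

    Assumes stride 1 and 'same' padding, so the output is h x w x c_out.
    """
    macs = h * w * c_in * c_out * k * k
    params = c_in * c_out * k * k
    return macs, params


def depthwise_separable_cost(h, w, c_in, c_out, k):
    """MACs and weights of a depthwise k x k step followed by a pointwise 1 x 1 step."""
    # Depthwise: one k x k filter per input channel, no mixing across channels.
    dw_macs = h * w * c_in * k * k
    dw_params = c_in * k * k
    # Pointwise: a 1 x 1 convolution that mixes channels.
    pw_macs = h * w * c_in * c_out
    pw_params = c_in * c_out
    return dw_macs + pw_macs, dw_params + pw_params


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    shape = dict(h=14, w=14, c_in=512, c_out=512, k=3)
    std_macs, std_params = standard_conv_cost(**shape)
    sep_macs, sep_params = depthwise_separable_cost(**shape)
    print(f"MAC ratio (separable / standard):    {sep_macs / std_macs:.4f}")
    print(f"weight ratio (separable / standard): {sep_params / std_params:.4f}")
```

Under these assumptions both ratios reduce to 1/c_out + 1/k², so for 3 x 3 kernels and many output channels the separable form needs roughly one ninth of the work.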

Abstract (English)

Convolutional neural networks (CNNs) have been widely applied in computer vision. However, conventional neural network computation requires a large number of operations and parameters, which is a challenge for embedded devices. MobileNets, a CNN that adopts depthwise separable convolution in place of standard convolution, substantially reduces operations and parameters with only limited loss in accuracy. MobileNets uses two main kinds of computation, pointwise and depthwise. If the same accelerator performs both, it may not be fully utilized because the two differ in their operation parameters. In addition, neural network quantization methods limit the bit width to reduce computing energy and parameters. If hardware of a single fixed precision is used for quantized operations, the maximum benefit cannot be achieved. Therefore, this thesis proposes a novel architecture that can efficiently compute quantized MobileNets.
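The utilization problem can be illustrated with a toy model. The sketch below is an assumption made for illustration, not the architecture proposed in the thesis: it imagines a hypothetical multiply-accumulate unit with a fixed number of parallel multiplier lanes that reduces one output value at a time, and compares the short reduction of a 3 x 3 depthwise filter with the long reduction of a pointwise filter. The function name and lane counts are hypothetical.

```python
import math


def lane_utilization(reduction_len, lanes):
    """Fraction of multiplier lanes doing useful work for one output value.

    Hypothetical model: the unit processes `lanes` products per pass, and the
    last pass is padded with idle lanes when reduction_len is not a multiple.
    """
    passes = math.ceil(reduction_len / lanes)
    return reduction_len / (passes * lanes)


if __name__ == "__main__":
    depthwise_len = 3 * 3   # one 3 x 3 window in a single channel
    pointwise_len = 512     # one 1 x 1 window across 512 input channels (assumed)
    for lanes in (8, 16, 32):
        dw = lane_utilization(depthwise_len, lanes)
        pw = lane_utilization(pointwise_len, lanes)
        print(f"lanes={lanes:2d}  depthwise={dw:.2%}  pointwise={pw:.2%}")
```

In this model the pointwise reduction keeps every lane busy, while the nine-product depthwise reduction leaves a large share of lanes idle at every width. A unit sized for one kind of computation is a poor fit for the other, which is the mismatch the abstract describes.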

Keywords (translated from Chinese)

★ AI accelerator
★ Reconfigurable architecture
★ Lightweight network

Keywords (English)

★ AI accelerator
★ Reconfigurable Structure
★ MobileNets

Table of contents

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
Table of Figures
Table of Tables
Chapter 1 Introduction
Chapter 2 Background
2.1 Depthwise Separable Convolution
2.2 CNN Processor for MobileNets
2.3 Precision-Scalable Processing Element
Chapter 3 Reconfigurable Accelerator Design
3.1 Architecture Overview
3.2 Sum of Eight Multiplications (S8)
3.3 Extra Multipliers and Accumulate Adders
3.4 Merge Adders
3.5 Dataflow
Chapter 4 Experiment Results
4.1 Experimental Setup
4.2 Comparisons
Chapter 5 Conclusions
References


Advisor

Jing-Yang Jou (周景揚)

Approval date

2021-10-26
