Abstract (English)
In the deep learning field, Convolutional Neural Networks (CNNs) have achieved significant success in many applications, such as visual image analysis and self-driving cars. Data size and accuracy are the major metrics for evaluating how efficient and effective a system's computations are. Conventional CNN models frequently use 32-bit data to maintain high accuracy. However, performing a large number of 32-bit multiply-and-accumulate (MAC) operations incurs significant computational effort as well as power consumption. Researchers have therefore developed various methods to reduce data size and speed up calculations. Quantization is one such technique: it reduces the bit width of the data, and with it the computational complexity, at the cost of some accuracy loss. To provide a better trade-off between computation effort and accuracy, different bit widths may be applied to different layers within a CNN model. A flexible processing element (PE) that supports operations of different bit widths is therefore in demand. In this work, we propose a hierarchy-based reconfigurable PE structure that supports 8-bit × 8-bit, 8-bit × 4-bit, 4-bit × 4-bit, and 2-bit × 2-bit operations. The proposed structure applies the concept of a hierarchical structure to avoid redundant hardware in the design. To improve calculation speed, our 8-bit × 8-bit PE adopts a two-stage pipeline. Experimental results with 90nm technology show that for the 2-bit × 2-bit PE, we save 57.5% to 60% of the area compared to a Precision-Scalable accelerator. For the 8-bit × 8-bit PE, the two-stage pipeline maintains almost the same calculation speed as the 4-bit × 4-bit PE.
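The hierarchical idea behind such a reconfigurable PE can be illustrated in software: a wide multiplication is composed from narrower partial products that are shifted and summed, so the same narrow multipliers can serve either one wide operation or several narrow ones. The sketch below is only an illustration of this decomposition for unsigned operands, not the thesis's actual PE circuit; the function name and nibble-splitting scheme are assumptions for the example.

```python
def mul8x8_from_4x4(a: int, b: int) -> int:
    """Compose an unsigned 8-bit x 8-bit product from four
    4-bit x 4-bit partial products (illustrative sketch only)."""
    # Split each 8-bit operand into a high and a low 4-bit nibble.
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    # Four 4x4 partial products, each shifted to its weight and summed.
    return ((a_hi * b_hi) << 8) \
         + ((a_hi * b_lo) << 4) \
         + ((a_lo * b_hi) << 4) \
         + (a_lo * b_lo)

# The same four 4x4 multipliers could instead serve four independent
# 4-bit x 4-bit operations, which is the reuse a hierarchical PE exploits.
print(mul8x8_from_4x4(200, 123))  # 24600, i.e. 200 * 123
```

In hardware, the hierarchy continues downward in the same way (a 4×4 multiplier built from 2×2 blocks), which is how the design avoids duplicating multiplier hardware across precision modes.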
References
[1] J. Albericio et al., "Cnvlutin: Ineffectual-neuron-free deep neural network computing," ACM SIGARCH Computer Architecture News, 2016.
[2] J. Choi et al., "Accurate and efficient 2-bit quantized neural networks," in Proc. of the 2nd SysML Conference, Mar. 2019.
[3] T. Chen et al., "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," ACM SIGARCH Computer Architecture News, 2014.
[4] Y.-J. Chen et al., "CT Image Denoising With Encoder-Decoder Based Graph Convolutional Networks," in Proc. of IEEE 18th International Symposium on Biomedical Imaging (ISBI), Apr. 2021.
[5] Z. Du et al., "ShiDianNao: Shifting vision processing closer to the sensor," in Proc. of the 42nd Annual International Symposium on Computer Architecture (ISCA), Jun. 2015.
[6] I. Hubara et al., "Quantized neural networks: Training neural networks with low precision weights and activations," Journal of Machine Learning Research, 2017.
[7] K. He et al., "Deep residual learning for image recognition," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[8] D. Kim et al., "Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory," ACM SIGARCH Computer Architecture News, 2016.
[9] A. Krizhevsky et al., "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, Jun. 2017.
[10] Y. LeCun et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, Nov. 1998.
[11] F. Li et al., "Ternary weight networks," arXiv preprint, 2016.
[12] W. Liu et al., "A Precision-Scalable Energy-Efficient Convolutional Neural Network Accelerator," IEEE Transactions on Circuits and Systems I: Regular Papers, Oct. 2020.
[13] D. Liu et al., "PuDianNao: A polyvalent machine learning accelerator," ACM SIGARCH Computer Architecture News, 2015.
[14] S. Liu et al., "Cambricon: An instruction set architecture for neural networks," in Proc. of ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Jun. 2016.
[15] D. Lin et al., "Fixed point quantization of deep convolutional networks," in Proc. of International Conference on Machine Learning (ICML), PMLR, Jun. 2016.
[16] A. Mishra et al., "Apprentice: Using knowledge distillation techniques to improve low-precision network accuracy," arXiv preprint, 2017.
[17] A. Mishra et al., "WRPN: Wide reduced-precision networks," arXiv preprint, 2017.
[18] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, 2015.
[19] S. Ren et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," arXiv preprint, 2015.
[20] J. Redmon et al., "You only look once: Unified, real-time object detection," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016.
[21] B. Reagen et al., "Minerva: Enabling low-power, highly-accurate deep neural network accelerators," in Proc. of ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Jun. 2016.
[22] H. Sharma et al., "Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network," in Proc. of ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), Jun. 2018.
[23] C. Szegedy et al., "Going deeper with convolutions," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015.
[24] P. Wang et al., "Two-Step Quantization for Low-bit Neural Networks," in Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2018.
[25] Y. Wang et al., "FPAP: A Folded Architecture for Energy-Quality Scalable Convolutional Neural Networks," IEEE Transactions on Circuits and Systems I: Regular Papers, Jan. 2019.
[26] Z. Wang et al., "Lightweight Run-Time Working Memory Compression for Deployment of Deep Neural Networks on Resource-Constrained MCUs," in Proc. of the 26th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2021.
[27] X. Xu et al., "DAC-SDC Low Power Object Detection Challenge for UAV Applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, Feb. 2021.
[28] Z. Yao et al., "A machine learning-based pulmonary venous obstruction prediction model using clinical data and CT image," International Journal of Computer Assisted Radiology and Surgery, 2021.
[29] S.-J. Zhang et al., "Cambricon-X: An accelerator for sparse neural networks," in Proc. of 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2016.