References
[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[3] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 815–823.
[4] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[5] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. International Conference on Learning Representations (ICLR), May 2015.
[6] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 1–9.
[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[8] R. Hameed et al., “Understanding sources of inefficiency in general-purpose chips,” in Proc. 37th Annual International Symposium on Computer Architecture (ISCA), 2010, pp. 37–47.
[9] M. Horowitz, “Computing’s energy problem (and what we can do about it),” in IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), Feb. 2014, pp. 10–14.
[10] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan. 2017.
[11] K. Guo, L. Sui, J. Qiu, J. Yu, J. Wang, S. Yao, S. Han, Y. Wang, and H. Yang, “Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2017.
[12] L. Du, Y. Du, Y. Li, J. Su, Y.-C. Kuan, C.-C. Liu, and M.-C. F. Chang, “A reconfigurable streaming deep convolutional neural network accelerator for Internet of Things,” IEEE Transactions on Circuits and Systems I: Regular Papers, 2018.
[13] N. P. Jouppi et al., “In-datacenter performance analysis of a tensor processing unit,” in Proc. 44th Annual International Symposium on Computer Architecture (ISCA), 2017, pp. 1–12.
[14] A. Aimar, H. Mostafa, E. Calabrese, A. Rios-Navarro, R. Tapiador-Morales, I.-A. Lungu, M. B. Milde, F. Corradi, A. Linares-Barranco, S.-C. Liu et al., “NullHop: A flexible convolutional neural network accelerator based on sparse representations of feature maps,” IEEE Transactions on Neural Networks and Learning Systems, 2018.
[15] M. Soltaniyeh, R. P. Martin, and S. Nagarakatte, “An accelerator for sparse convolutional neural networks leveraging systolic general matrix-matrix multiplication,” ACM Transactions on Architecture and Code Optimization (TACO), 2022.
[16] Y.-C. Tseng, P.-H. Hsu, and T.-S. Chang, “A 124 Mpixels/s VLSI design for histogram-based joint bilateral filtering,” IEEE Transactions on Image Processing, 2011.
[17] T. Chen et al., “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” in Proc. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2014, pp. 269–284.
[18] Y. Chen et al., “DaDianNao: A machine-learning supercomputer,” in Proc. 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2014, pp. 609–622.
[19] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, “EIE: Efficient inference engine on compressed deep neural network,” in Proc. 43rd Annual International Symposium on Computer Architecture (ISCA), 2016, pp. 243–254.
[20] W. Lu, G. Yan, J. Li, S. Gong, Y. Han, and X. Li, “FlexFlow: A flexible dataflow accelerator architecture for convolutional neural networks,” in Proc. IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2017, pp. 553–564.
[21] M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal, “Memory-centric accelerator design for convolutional neural networks,” in Proc. IEEE 31st International Conference on Computer Design (ICCD), 2013.
[22] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proc. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2015.
[23] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, “Optimizing the convolution operation to accelerate deep neural networks on FPGA,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2018.
[24] X. Wei, C. H. Yu, P. Zhang, Y. Chen, Y. Wang, H. Hu, Y. Liang, and J. Cong, “Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs,” in Proc. 54th Annual Design Automation Conference (DAC), 2017.
[25] W. Xu, Z. Zhang, X. You, and C. Zhang, “Efficient deep convolutional neural networks accelerator without multiplication and retraining,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 1100–1104.
[26] Y. Huan, J. Xu, L. Zheng, H. Tenhunen, and Z. Zou, “A 3D tiled low power accelerator for convolutional neural network,” in Proc. IEEE International Symposium on Circuits and Systems (ISCAS), 2018, pp. 1–5.
[27] S. Das, A. Roy, K. K. Chandrasekharan, A. Deshwal, and S. Lee, “A systolic dataflow based accelerator for CNNs,” in Proc. IEEE International Symposium on Circuits and Systems (ISCAS), 2020.
[28] T. Adiono, A. Putra, N. Sutisna, I. Syafalni, and R. Mulyawan, “Low latency YOLOv3-Tiny accelerator for low-cost FPGA using general matrix multiplication principle,” IEEE Access, 2021.
[29] H. Kim, S. Lee, J. Choi, and J. H. Ahn, “Row-streaming dataflow using a chaining buffer and Systolic Array+ structure,” IEEE Computer Architecture Letters, 2021.
[30] Y.-H. Lee, N.-A. Yu, and C.-Y. Tsai, “An image upscaling engine for 1080p to 4K using gradient-based interpolation,” International Journal of Electronics, 2020.
[31] L. Lu and Y. Liang, “SpWA: An efficient sparse Winograd convolutional neural networks accelerator on FPGAs,” in Proc. ACM/ESDA/IEEE Design Automation Conference (DAC), 2018.
[32] B. Khabbazan and S. Mirzakuchaki, “Design and implementation of a low-power, embedded CNN accelerator on a low-end FPGA,” in Proc. Euromicro Conference on Digital System Design (DSD), 2019.
[33] Y. Li, S. Lu, J. Luo, W. Pang, and H. Liu, “High-performance convolutional neural network accelerator based on systolic arrays and quantization,” in Proc. International Conference on Signal and Image Processing, 2019.
[34] J. Qiu, J. Wang, S. Yao, K. Guo, B. Li, E. Zhou, J. Yu, T. Tang, N. Xu, S. Song, Y. Wang, and H. Yang, “Going deeper with embedded FPGA platform for convolutional neural network,” in Proc. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2016.
[35] M. T. Hailesellasie and S. R. Hasan, “MulNet: A flexible CNN processor with higher resource utilization efficiency for constrained devices,” IEEE Access, 2019.