References
[1] S. Williams, A. Waterman, and D. Patterson, “Roofline: an insightful visual performance model for multicore architectures,” Communications of the ACM, vol. 52, no. 4, pp. 65–76, 2009.
[2] Micron, 4Gb: x4, x8, x16 DDR3 SDRAM Features.
[3] APMemory, 1Gb DDR3 SDRAM Specification.
[4] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[5] Xilinx Inc., LogiCORE IP AXI Master Burst v2.0, Product Guide.
[6] Xilinx Inc., UltraScale Architecture DSP Slice, User Guide.
[7] C. Zhang, G. Sun, Z. Fang, P. Zhou, P. Pan, and J. Cong, “Caffeine: toward uniformed representation and acceleration for deep convolutional neural networks,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 11, pp. 2072–2085, 2018.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[10] M. Naphade, D. C. Anastasiu, A. Sharma, V. Jagrlamudi, H. Jeon, K. Liu, M.-C. Chang, S. Lyu, and Z. Gao, “The NVIDIA AI city challenge,” in 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), 2017, pp. 1–6.
[11] A. Conneau, H. Schwenk, L. Barrault, and Y. LeCun, “Very deep convolutional networks for natural language processing,” arXiv preprint arXiv:1606.01781, vol. 2, 2016.
[12] X. Zhang, J. Zou, K. He, and J. Sun, “Accelerating very deep convolutional networks for classification and detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 1943–1955, 2015.
[13] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[14] M. Motamedi, P. Gysel, V. Akella, and S. Ghiasi, “Design space exploration of FPGA-based deep convolutional neural networks,” in 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), 2016, pp. 575–580.
[15] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, “DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning,” ACM SIGARCH Computer Architecture News, vol. 42, no. 1, pp. 269–284, 2014.
[16] Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, et al., “DaDianNao: a machine-learning supercomputer,” in 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014, pp. 609–622.
[17] A. Carbon, J.-M. Philippe, O. Bichler, R. Schmit, B. Tain, D. Briand, N. Ventroux, M. Paindavoine, and O. Brousse, “PNeuro: a scalable energy-efficient programmable hardware accelerator for neural networks,” in 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, pp. 1039–1044.
[18] Y. Wang, H. Li, and X. Li, “A case of on-chip memory subsystem design for low-power CNN accelerators,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 10, pp. 1971–1984, 2017.
[19] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos, “Cnvlutin: ineffectual-neuron-free deep neural network computing,” ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 1–13, 2016.
[20] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, 2016.
[21] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, et al., “In-datacenter performance analysis of a tensor processing unit,” in Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017, pp. 1–12.
[22] F. Tu, S. Yin, P. Ouyang, S. Tang, L. Liu, and S. Wei, “Deep convolutional neural network architecture with reconfigurable computation patterns,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 8, pp. 2220–2233, 2017.
[23] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015, pp. 161–170.
[24] M. Shahshahani, P. Goswami, and D. Bhatia, “Memory optimization techniques for FPGA based CNN implementations,” in 2018 IEEE 13th Dallas Circuits and Systems Conference (DCAS), 2018, pp. 1–6.
[25] Y.-J. Lin and T. S. Chang, “Data and hardware efficient design for convolutional neural network,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 5, pp. 1642–1651, 2017.
[26] Xilinx Inc., Zynq-7000 All Programmable SoC, User Guide.
[27] “ImageNet,” http://image-net.org/, accessed: 2020-06-09.
[28] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
[29] Xilinx Inc., MPSoC Technical Reference Manual, User Guide.
[30] Xilinx Inc., Zynq-7000 All Programmable SoC Overview, Advance Product Specification.
[31] ——, Zynq UltraScale+ MPSoC Data Sheet.
[32] ARM, AMBA AXI and ACE Protocol Specification.
[33] Xilinx Inc., LogiCORE IP Block Memory Generator v8.2, Product Guide.
[34] ——, Integrated Logic Analyzer v6.2.
[35] K. Guo, L. Sui, J. Qiu, J. Yu, J. Wang, S. Yao, S. Han, Y. Wang, and H. Yang, “Angel-Eye: a complete design flow for mapping CNN onto embedded FPGA,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 1, pp. 35–47, 2017.
[36] Y. Ma, Y. Cao, S. Vrudhula, and J.-s. Seo, “Automatic compilation of diverse CNNs onto high-performance FPGA accelerators,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018.
[37] X. Qu, Z. Huang, N. Mao, Y. Xu, G. Cai, and Z. Fang, “A grain-adaptive computing structure for FPGA CNN acceleration,” in 2019 IEEE 13th International Conference on ASIC (ASICON), 2019, pp. 1–4.