References
[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, pp. 2278–2324, 1998.
[2] Y. S. Shao, J. Clemons, R. Venkatesan, B. Zimmer, M. Fojtik, N. Jiang, B. Keller, A. Klinefelter, N. Pinckney, P. Raina, S. G. Tell, Y. Zhang, W. J. Dally, J. Emer, C. T. Gray, B. Khailany, and S. W. Keckler, “Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture,” in Proceedings of IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019, pp. 14–27.
[3] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[4] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” CoRR, 2017.
[5] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826.
[6] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” in International Conference on Learning Representations (ICLR), 2015, pp. 1–14.
[7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going Deeper with Convolutions,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
[8] A. van den Oord, S. Dieleman, and B. Schrauwen, “Deep Content-Based Music Recommendation,” in Proceedings of International Conference on Neural Information Processing Systems (NIPS), 2013, pp. 2643–2651.
[9] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw Audio,” in Proceedings of the ISCA Speech Synthesis Workshop (SSW), 2016, p. 125.
[10] A. Tsantekidis, N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, and A. Iosifidis, “Forecasting Stock Prices from the Limit Order Book Using Convolutional Neural Networks,” in IEEE Conference on Business Informatics (CBI), 2017, pp. 7–12.
[11] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), pp. 211–252, 2015.
[12] K. Yamaguchi, K. Sakamoto, T. Akabane, and Y. Fujimoto, “A Neural Network for Speaker-Independent Isolated Word Recognition,” in Proceedings of International Conference on Spoken Language Processing (ICSLP), 1990, pp. 1077–1080.
[13] D. Ciresan, U. Meier, J. Masci, and J. Schmidhuber, “Multi-Column Deep Neural Network for Traffic Sign Classification,” Neural Networks, pp. 333–338, 2012.
[14] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos, “Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing,” in ACM/IEEE International Symposium on Computer Architecture (ISCA), 2016, pp. 1–13.
[15] Y. H. Chen, J. Emer, and V. Sze, “Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks,” in ACM/IEEE International Symposium on Computer Architecture (ISCA), 2016, pp. 367–379.
[16] Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam, “ShiDianNao: Shifting Vision Processing Closer to the Sensor,” in ACM/IEEE International Symposium on Computer Architecture (ISCA), 2015, pp. 92–104.
[17] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, “EIE: Efficient Inference Engine on Compressed Deep Neural Network,” in Proceedings of International Symposium on Computer Architecture (ISCA), 2016, pp. 243–254.
[18] A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S. W. Keckler, and W. J. Dally, “SCNN: An Accelerator for Compressed-Sparse Convolutional Neural Networks,” in ACM/IEEE International Symposium on Computer Architecture (ISCA), 2017, pp. 27–40.
[19] S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen, “Cambricon-X: An Accelerator for Sparse Neural Networks,” in Proceedings of IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016, pp. 1–12.
[20] N. Beck, S. White, M. Paraschou, and S. Naffziger, “Zeppelin: An SoC for Multichip Architectures,” in IEEE International Solid-State Circuits Conference (ISSCC), 2018, pp. 40–42.
[21] R. Hwang, T. Kim, Y. Kwon, and M. Rhu, “Centaur: A Chiplet-Based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations,” in ACM/IEEE International Symposium on Computer Architecture (ISCA), 2020, pp. 968–981.
[22] M. Gao, J. Pu, X. Yang, M. Horowitz, and C. Kozyrakis, “TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory,” in Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2017, pp. 751–764.
[23] A. Parashar, P. Raina, Y. S. Shao, Y.-H. Chen, V. A. Ying, A. Mukkara, R. Venkatesan, B. Khailany, S. W. Keckler, and J. Emer, “Timeloop: A Systematic Approach to DNN Accelerator Evaluation,” in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2019, pp. 304–315.
[24] H. Kwon, A. Samajdar, and T. Krishna, “MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects,” in Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2018, pp. 461–475.
[25] M. Shahshahani, P. Goswami, and D. Bhatia, “Memory Optimization Techniques for FPGA-Based CNN Implementations,” in IEEE Dallas Circuits and Systems Conference (DCAS), 2018, pp. 1–6.
[26] Y. J. Lin and T. S. Chang, “Data and Hardware Efficient Design for Convolutional Neural Network,” IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I), pp. 1642–1651, 2018.
[27] D. A. Patterson, “Latency Lags Bandwidth,” Communications of the ACM (Commun. ACM), pp. 71–75, 2004.
[28] C. Stapper, “Defect Density Distribution for LSI Yield Calculations,” IEEE Transactions on Electron Devices (T-ED), pp. 655–657, 1973.
[29] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-Based Accelerator Design for Deep Convolutional Neural Networks,” in Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2015, pp. 161–170.
[30] F. Tu, S. Yin, P. Ouyang, S. Tang, L. Liu, and S. Wei, “Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns,” IEEE Transactions on Very Large Scale Integration Systems (T-VLSIS), pp. 2220–2233, 2017.
[31] L.-N. Pouchet, P. Zhang, P. Sadayappan, and J. Cong, “Polyhedral-Based Data Reuse Optimization for Configurable Computing,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), 2013, pp. 29–38.
[32] D. Stow, Y. Xie, T. Siddiqua, and G. H. Loh, “Cost-Effective Design of Scalable High-Performance Systems Using Active and Passive Interposers,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2017, pp. 728–735.
[33] J. A. Cunningham, “The Use and Evaluation of Yield Models in Integrated Circuit Manufacturing,” IEEE Transactions on Semiconductor Manufacturing (T-SM), pp. 60–71, 1990.
[34] M.-S. Lin, C.-C. Tsai, C.-H. Hsieh, W.-H. Huang, Y.-C. Chen, S.-C. Yang, C.-M. Fu, H.-J. Zhan, J.-Y. Chien, S.-Y. Li, Y.-H. Chen, C.-C. Kuo, S.-P. Tai, and K. Yamada, “A 16nm 256-bit Wide 89.6GByte/s Total Bandwidth In-Package Interconnect with 0.3V Swing and 0.062pJ/bit Power in InFO Package,” in IEEE Hot Chips Symposium (HCS), 2016, pp. 1–32.
[35] A. Shokrollahi, D. Carnelli, J. Fox, K. Hofstra, B. Holden, A. Hormati, P. Hunt, M. Johnston, J. Keay, S. Pesenti, R. Simpson, D. Stauffer, A. Stewart, G. Surace, A. Tajalli, O. T. Amiri, A. Tschank, R. Ulrich, C. Walter, F. Licciardello, Y. Mogentale, and A. Singh, “A Pin-Efficient 20.83Gb/s/wire 0.94pJ/bit Forwarded Clock CNRZ-5-Coded SerDes up to 12mm for MCM Packages in 28nm CMOS,” in IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 182–183.
[36] J. M. Wilson, W. J. Turner, J. W. Poulton, B. Zimmer, X. Chen, S. S. Kudva, S. Song, S. G. Tell, N. Nedovic, W. Zhao, S. R. Sudhakaran, C. T. Gray, and W. J. Dally, “A 1.17pJ/b 25Gb/s/pin Ground-Referenced Single-Ended Serial Link for Off- and On-Package Communication in 16nm CMOS Using a Process- and Temperature-Adaptive Voltage Regulator,” in IEEE International Solid-State Circuits Conference (ISSCC), 2018, pp. 276–278.
[37] “ImageNet,” https://www.imagenet.org/challenges/LSVRC/index.php.
[38] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems (NeurIPS), 2012, pp. 1097–1105.
[39] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” in Proceedings of International Conference on International Conference on Machine Learning (ICML), 2015, pp. 448–456.
[40] “Zynq-7000 SoC Technical Reference Manual,” https://docs.xilinx.com/v/u/en-US/ug585-Zynq-7000-TRM.
[41] “Zynq UltraScale+ Device Technical Reference Manual,” https://docs.xilinx.com/r/en-US/ug1085-zynq-ultrascale-trm.
[42] “Zynq-7000 SoC Data Sheet: Overview,” https://docs.xilinx.com/v/u/en-US/ds190-Zynq-7000-Overview.
[43] “Zynq UltraScale+ MPSoC Data Sheet: Overview,” https://docs.xilinx.com/v/u/en-US/ds891-zynq-ultrascale-plus-overview.