References
[1] H. Sharma et al., "Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network," 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 764-775, June 2018.
[2] H. T. Kung, "Why systolic architectures?," Computer, vol. 15, no. 1, pp. 37-46, January 1982.
[3] N. P. Jouppi et al., "In-datacenter performance analysis of a tensor processing unit," Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), pp. 1-12, June 2017.
[4] Y. Chen et al., "DaDianNao: A Machine-Learning Supercomputer," 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 609-622, December 2014.
[5] S. Zhang et al., "Cambricon-X: An accelerator for sparse neural networks," 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2016.
[6] H. Xiao, H. Xu, X. Chen, Y. Wang, and Y. Han, "Fast and High-Accuracy Approximate MAC Unit Design for CNN Computing," IEEE Embedded Systems Letters, vol. 14, no. 3, pp. 155-158, September 2022.
[7] T. T. Hoang, M. Sjalander and P. Larsson-Edefors, "A High-Speed, Energy-Efficient Two-Cycle Multiply-Accumulate (MAC) Architecture and Its Application to a Double-Throughput MAC Unit," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 12, pp. 3073-3081, December 2010.
[8] S. Rakesh and K. S. V. Grace, "A survey on the design and performance of various MAC unit architectures," 2017 IEEE International Conference on Circuits and Systems (ICCAS), pp. 312-315, December 2017.
[9] Z. Du et al., "ShiDianNao: Shifting vision processing closer to the sensor," 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 92-104, June 2015.
[10] C. S. Wallace, "A suggestion for a fast multiplier," IEEE Transactions on Electronic Computers, no. 1, pp. 14-17, February 1964.
[11] L. Dadda, "Some schemes for parallel multipliers," IEEE Computer Society Press, 1990.
[12] C. P. Narendra and K. R. Kumar, "Low power compressor based MAC architecture for DSP applications," 2015 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), pp. 1-5, February 2015.
[13] A. Abdelgawad and M. Bayoumi, "High speed and area-efficient multiply accumulate (MAC) unit for digital signal processing applications," 2007 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 3199-3202, May 2007.
[14] T. U. Anusree and P. L. Bonifus, "Design and analysis of modified fast compressors for MAC unit," International Journal of Computer Trends and Technology, vol. 36, pp. 231-218, June 2016.
[15] A. Vaswani et al., "Attention is all you need," Proc. 31st Int. Conf. Neural Inf. Process. Syst., pp. 6000-6010, June 2017.
[16] A. Riaz and V. K. Sharma, "A novel low power 4:2 compressor using FinFET devices," Analog Integrated Circuits and Signal Processing, vol. 112, no. 1, pp. 127-139, January 2022.
[17] A. G. M. Strollo, E. Napoli, D. De Caro, N. Petra, and G. Di Meo, "Comparison and extension of approximate 4-2 compressors for low-power approximate multipliers," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 9, pp. 3021-3034, May 2020.
[18] S. D. Pezaris, "A 40-ns 17-bit by 17-bit array multiplier," IEEE Transactions on Computers, vol. 20, pp. 442-447, April 1971.
[19] K. Z. Pekmestzi, "Multiplexer-based array multipliers," IEEE Transactions on Computers, vol. 48, no. 1, pp. 15-23, January 1999.
[20] A. Booth, "A signed binary multiplication technique," Quarterly Journal of Mechanics and Applied Mathematics, vol. 4, pp. 236-240, 1951.
[21] O. L. MacSorley, "High-speed arithmetic in binary computers," Proc. IRE, vol. 49, January 1961.
[22] V. G. Oklobdzija and D. Villeger, "Improving multiplier design by using improved column compression tree and optimized final adder in CMOS technology," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 3, no. 2, pp. 292-301, June 1995.
[23] V. G. Oklobdzija, D. Villeger, and S. S. Liu, "A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach," IEEE Transactions on Computers, vol. 45, no. 3, pp. 294-306, March 1996.
[24] A. A. Fayed and M. A. Bayoumi, "A merged multiplier-accumulator for high speed signal processing applications," 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA, pp. III-3212-III-3215, May 2002.