Abstract (English)
In recent years, with the development of artificial intelligence and the growing popularity of mobile computing, accelerators that support neural-network computation on edge devices have become an increasingly common design choice. This thesis presents a hardware design for deep reinforcement learning with prioritized experience replay, targeting a map-exploration problem in a 12×12 grid environment. We build a deep Q-network as a multilayer perceptron with 144 nodes in the input layer, 72 nodes in the hidden layer, and 4 nodes in the output layer.

The hardware architecture consists of an evaluation-network calculation unit, a target-network calculation unit, a Q-value comparator, an action selector, and a temporal-difference (TD) error operation unit. Each network calculation unit is composed of 18 network calculation slices, and each processing element (PE) of a slice contains 4 adders and 4 multipliers that support several calculation modes. The architecture performs forward propagation, backpropagation, and network-parameter updates. To reduce the hardware area, we use a customized block floating-point format consisting of 1 sign bit, a 7-bit exponent, and a 24-bit mantissa; an appropriate exponent is set and adjusted for each operation to preserve the required precision. Forward propagation takes 155 clock cycles, and one backpropagation pass takes 322 clock cycles. The design operates with a clock period of 11 ns and generates the correct Q values and priorities. According to the resource-utilization report, the design uses 150,987 LUTs, 72 BRAMs, and 290 DSPs. Measurement results demonstrate that the reinforcement-learning hardware fully realizes both inference and training.

In addition, we synthesize the multiply-accumulate (MAC) unit with Design Compiler. The synthesis results show that the customized block floating-point MAC saves up to 23.3% in area compared with the customized floating-point MAC at an operating frequency of 400 MHz.
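To make the number format concrete, here is a minimal Python sketch of a block floating-point MAC in the spirit of the customized format described above (1 sign bit, 7-bit exponent, 24-bit mantissa, with the exponent chosen per operation). The exponent selection, truncation rounding, and helper names are illustrative assumptions, not the thesis's actual RTL behavior.

```python
import math

# Bit widths from the abstract's customized block floating-point format.
EXP_BITS = 7    # shared (block) exponent width
MAN_BITS = 24   # signed mantissa width, so magnitudes use 23 bits

def quantize_block(values):
    """Share one exponent across a block of values: pick it from the
    largest magnitude so every truncated mantissa fits in MAN_BITS bits."""
    max_mag = max(abs(v) for v in values)
    if max_mag == 0.0:
        return 0, [0] * len(values)
    # Shift by MAN_BITS - 2 to leave headroom for the sign bit.
    exp = math.floor(math.log2(max_mag)) - (MAN_BITS - 2)
    assert -(1 << (EXP_BITS - 1)) <= exp < (1 << (EXP_BITS - 1))  # fits 7 bits
    mantissas = [int(v / 2.0 ** exp) for v in values]  # truncate toward zero
    return exp, mantissas

def bfp_mac(weights, activations):
    """Integer multiply-accumulate over two blocks; all products share the
    exponent w_exp + a_exp, so the accumulator is a plain integer."""
    w_exp, w_man = quantize_block(weights)
    a_exp, a_man = quantize_block(activations)
    acc = sum(w * a for w, a in zip(w_man, a_man))  # integer MAC, as in a PE
    return acc * 2.0 ** (w_exp + a_exp)             # rescale the result

if __name__ == "__main__":
    w = [0.5, -1.25, 0.031, 2.0]
    a = [1.0, 0.75, -0.5, 0.125]
    exact = sum(x * y for x, y in zip(w, a))
    print(f"BFP MAC: {bfp_mac(w, a):.6f}  exact: {exact:.6f}")
```

Sharing one exponent per block is what lets a PE's multipliers and adders remain integer datapaths; this is also the usual reason a block floating-point MAC is smaller than a full floating-point MAC, consistent with the 23.3% area saving reported above.

Similarly, here is a short sketch of the quantities the TD-error operation unit produces, following DQN [3] and prioritized experience replay [4]; gamma and eps are illustrative hyperparameters, and terminal-state handling is omitted for brevity.

```python
def td_error(q_eval, q_target_next, reward, action, gamma=0.99):
    """TD error: delta = r + gamma * max_a' Q_target(s', a') - Q_eval(s, a)."""
    return reward + gamma * max(q_target_next) - q_eval[action]

def priority(delta, eps=1e-6):
    """Proportional prioritization [4]: p = |delta| + eps."""
    return abs(delta) + eps
```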
References
[1] Y. Kim, D. Shin, J. Lee, Y. Lee and H.-J. Yoo, "A 0.55 V 1.1 mW Artificial Intelligence Processor With On-Chip PVT Compensation for Autonomous Mobile Robots," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 2, pp. 567-580, Feb. 2018.
[2] A. Amravati, S. B. Nasir, S. Thangadurai, I. Yoon and A. Raychowdhury, "A 55nm time-domain mixed-signal neuromorphic accelerator with stochastic synapses and embedded reinforcement learning for autonomous micro-robots," 2018 IEEE International Solid-State Circuits Conference (ISSCC), pp. 124-126.
[3] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra and M. Riedmiller, "Playing Atari with Deep Reinforcement Learning," in NIPS Deep Learning Workshop, 2013.
[4] T. Schaul, J. Quan, I. Antonoglou and D. Silver, “Prioritized Experience Replay,” in International Conference on Learning Representations, 2016.
[5] 蘇俊達, "Deep Reinforcement Learning Algorithm and Hardware Architecture Design for Map Exploration," unpublished master's thesis, Department of Electrical Engineering, National Central University, Taoyuan, Taiwan, 2021.
[6] D. Elam and C. Iovescu, "A Block Floating Point Implementation for an N-Point FFT on the TMS320C55x DSP," Texas Instruments Application Report SPRA948, Sep. 2003.
[7] S. Shao, J. Tsai, M. Mysior, W. Luk, T. Chau, A. Warren and B. Jeppesen, "Towards hardware accelerated reinforcement learning for application-specific robotic control," in International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 1-8, IEEE, 2018.
[8] J. Su, J. Liu, D. B. Thomas, and P. Y. Cheung, “Neural network based reinforcement learning acceleration on FPGA platforms,” ACM SIGARCH Computer Architecture News, vol. 44, no. 4, pp. 68–73, 2017.
[9] S. Shao and W. Luk, “Customised pearlmutter propagation: A hardware architecture for trust region policy optimisation,” in International Conference on Field Programmable Logic and Applications, pp. 1–6, IEEE, 2017.
[10] C. Guo, W. Luk, S. Q. S. Loh, A. Warren and J. Levine, "Customisable control policy learning for robotics," Proc. IEEE 30th Int. Conf. Appl.-Specific Syst. Archit. Processors (ASAP), vol. 2160, pp. 91-98, Jul. 2019.
[11] H. Cho, P. Oh, J. Park, W. Jung and J. Lee, "FA3C: FPGA-accelerated deep reinforcement learning", Proc. 24th Int. Conf. Architectural Support Program. Lang. Operating Syst., pp. 499-513, 2019.
[12] G. Dinelli, G. Meoni, E. Rapuano and L. Fanucci, "Advantages and Limitations of Fully on-Chip CNN FPGA-Based Hardware Accelerator," 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020, pp. 1-5, doi: 10.1109/ISCAS45731.2020.9180867.
[13] C.-B. Wu, C.-S. Wang and Y.-K. Hsiao, "Reconfigurable Hardware Architecture Design and Implementation for AI Deep Learning Accelerator," 2020 IEEE 9th Global Conference on Consumer Electronics (GCCE), 2020, pp. 154-155, doi: 10.1109/GCCE50665.2020.9291854.