Abstract (English) |
Human pose estimation applies to many scenarios, including autonomous driving, traffic monitoring, patient monitoring, action recognition, and fall detection, where it can support preventive measures. Compared with traditional sensors, mmWave radar suits a wider range of environments: users do not need to wear a device, and its performance is less affected by low lighting or adverse weather. In addition, mmWave radar does not capture users' facial features, which helps protect privacy and security. It is also more cost-effective and accessible than lidar sensors.
In this paper, we use millimeter-wave radar to generate three-dimensional point clouds (x, y, z) and estimate human poses with a sequence-to-sequence model. The point cloud data is first preprocessed through voxelization, and a sliding time window accumulates 10 frames of voxelized data as the model input. The model predicts voxel indices for 25 skeletal joints. These indices are then converted back to real-world 3D coordinates using the voxel dictionary built during voxelization. The predictions are compared with the ground truth using the Mean Absolute Error (MAE), which the model aims to minimize. In our experiments, introducing self-attention in the encoder improves accuracy by 5% over the baseline while reducing the parameter count by 10M. |
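The preprocessing and post-processing steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the voxel size, grid origin, grid dimensions, and all function names are hypothetical, and the sequence-to-sequence model that predicts the 25 joint indices is omitted. Only the 10-frame window and the MAE metric come from the abstract.

```python
import math
from collections import deque

VOXEL_SIZE = 0.25            # metres per voxel edge (assumed value)
ORIGIN = (-2.0, 0.0, -1.0)   # minimum corner of the voxel grid (assumed)
GRID = (16, 16, 12)          # voxels per axis (assumed)
WINDOW = 10                  # frames per input window, as stated in the abstract


def point_to_index(point):
    """Return the flat voxel index containing a 3D point, clamped to the grid."""
    cell = []
    for value, origin, size in zip(point, ORIGIN, GRID):
        i = math.floor((value - origin) / VOXEL_SIZE)
        cell.append(min(max(i, 0), size - 1))
    ix, iy, iz = cell
    return (ix * GRID[1] + iy) * GRID[2] + iz


def build_voxel_dict():
    """Map every flat voxel index to the centre coordinates of its voxel."""
    voxel_dict = {}
    for ix in range(GRID[0]):
        for iy in range(GRID[1]):
            for iz in range(GRID[2]):
                index = (ix * GRID[1] + iy) * GRID[2] + iz
                voxel_dict[index] = tuple(
                    o + (i + 0.5) * VOXEL_SIZE
                    for o, i in zip(ORIGIN, (ix, iy, iz))
                )
    return voxel_dict


def sliding_windows(frames, window=WINDOW):
    """Yield lists of `window` consecutive voxelized frames, one step at a time."""
    buffer = deque(maxlen=window)
    for points in frames:
        # Each frame becomes the sorted set of occupied voxel indices.
        buffer.append(sorted({point_to_index(p) for p in points}))
        if len(buffer) == window:
            yield list(buffer)


def mean_absolute_error(predicted, truth):
    """Average absolute per-coordinate error over all joints."""
    total = sum(
        abs(p - t)
        for pred_joint, true_joint in zip(predicted, truth)
        for p, t in zip(pred_joint, true_joint)
    )
    return total / (3 * len(truth))


VOXEL_DICT = build_voxel_dict()
```

In this sketch, a predicted joint index is mapped back to coordinates with `VOXEL_DICT[index]`, so the recovered position is the voxel centre and the quantization error per axis is at most half of `VOXEL_SIZE`.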