dc.description.abstract | Human pose estimation is applicable in many scenarios, including autonomous driving, traffic monitoring, patient monitoring, action recognition, and fall detection, where it can support preventive measures. Compared to traditional sensors, mmWave radar suits a wider range of environments: users do not need to wear devices, and its performance is less affected by low lighting or adverse weather. In addition, mmWave radar does not capture users' facial features, which helps protect privacy and security. It is also more cost-effective and accessible than lidar sensors.
In this paper, we use millimeter-wave radar to generate three-dimensional point clouds (x, y, z) and estimate human poses with a sequence-to-sequence model. The point cloud data is first preprocessed through voxelization, and a sliding time window accumulates 10 frames of voxelized data as input to the model. The model predicts voxel indices for 25 skeletal joints. The voxel indices are then converted back to real-world 3D coordinates using the voxel dictionary employed during voxelization. The predictions are compared with the ground truth using the Mean Absolute Error (MAE) metric, with the goal of minimizing this error. In our experiments, introducing self-attention in the encoder yields a 5% improvement in accuracy over the baseline while reducing the parameter count by 10M. | en_US |
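The preprocessing and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid origin `GRID_MIN`, the voxel size, the grid shape, and all function names are hypothetical, and the index-to-coordinate step uses simple voxel-centre arithmetic as a stand-in for the voxel dictionary mentioned above. Only the 10-frame window and the MAE metric come from the abstract.

```python
import numpy as np

# Hypothetical grid parameters; the abstract does not state them.
GRID_MIN = np.array([-2.0, 0.0, -2.0])  # grid origin in metres (assumed)
VOXEL_SIZE = 0.1                        # voxel edge length in metres (assumed)
GRID_SHAPE = (40, 40, 40)               # voxels per axis (assumed)
WINDOW = 10                             # frames per input window (from the abstract)


def voxelize(points):
    """Map an (N, 3) point cloud to a grid of per-voxel point counts."""
    grid = np.zeros(GRID_SHAPE, dtype=np.float32)
    idx = np.floor((np.asarray(points) - GRID_MIN) / VOXEL_SIZE).astype(int)
    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.array(GRID_SHAPE)), axis=1)
    np.add.at(grid, tuple(idx[inside].T), 1.0)
    return grid


def sliding_windows(frames, window=WINDOW):
    """Stack consecutive voxel grids into overlapping windows of `window` frames."""
    frames = np.asarray(frames)
    if len(frames) < window:
        raise ValueError("need at least `window` frames")
    return np.stack([frames[i:i + window] for i in range(len(frames) - window + 1)])


def voxel_to_xyz(indices):
    """Convert integer voxel indices to voxel-centre coordinates in metres."""
    return GRID_MIN + (np.asarray(indices) + 0.5) * VOXEL_SIZE


def mean_absolute_error(pred, truth):
    """Mean absolute error over all joints and coordinates."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(truth))))
```

In this sketch, a model would consume one window from `sliding_windows` and output 25 voxel indices, which `voxel_to_xyz` maps to coordinates before `mean_absolute_error` compares them with the ground-truth joints. Mapping back to voxel centres limits the achievable precision to roughly half a voxel edge per axis.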