Abstract: | Human pose estimation is used in many scenarios, including autonomous driving, traffic monitoring, patient monitoring, action recognition, and fall detection, where it supports preventive measures. Millimeter-wave (mmWave) radar suits a wider range of environments than conventional sensors: users do not need to wear a device, and whereas conventional sensors degrade in low light or bad weather, mmWave radar is less affected by these conditions. Because mmWave radar does not capture the user's face, it also offers better privacy protection and security. Compared with lidar, it is cheaper and easier to obtain. This paper uses mmWave radar to generate three-dimensional point clouds (x, y, z) and estimates human pose with a sequence-to-sequence model. The point cloud data is first preprocessed by voxelization, and a sliding time window accumulates 10 frames of voxelized data as input to the model, which predicts the voxel indices of 25 skeletal joints. The voxel indices are then converted back to real-world 3D coordinates using the voxel dictionary used during voxelization. The predictions are compared with the ground truth using the mean absolute error (MAE), with the goal of minimizing this error. In our experiments, adding self-attention to the encoder improved accuracy by 5% over the baseline and reduced the parameter count by 10M. |
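
The preprocessing and evaluation steps named in the abstract (voxelization, 10-frame accumulation, converting voxel indices back to coordinates, and MAE) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the voxel size, the grid origin, the choice of the voxel center as the recovered coordinate, and all function names are assumptions made for illustration, and the sequence-to-sequence model itself is omitted.

```python
import math
from collections import Counter, deque

# Hypothetical grid parameters; the abstract does not state them.
VOXEL_SIZE = 0.25            # edge length of one voxel, in metres
ORIGIN = (-2.0, -2.0, 0.0)   # lower corner of the voxel grid, in metres


def voxel_index(point):
    """Map a 3D point (x, y, z) to its integer voxel index (i, j, k)."""
    return tuple(math.floor((c - o) / VOXEL_SIZE) for c, o in zip(point, ORIGIN))


def voxel_center(index):
    """Map a voxel index back to a 3D coordinate (assumed: the voxel center)."""
    return tuple(o + (i + 0.5) * VOXEL_SIZE for i, o in zip(index, ORIGIN))


def voxelize(frame):
    """Count how many points of one radar frame fall into each voxel."""
    return Counter(voxel_index(p) for p in frame)


def accumulate(frames, window=10):
    """Yield the summed voxel counts of each run of `window` consecutive frames."""
    recent = deque(maxlen=window)  # the oldest frame drops out automatically
    for frame in frames:
        recent.append(voxelize(frame))
        if len(recent) == window:
            yield sum(recent, Counter())


def mae(predicted, truth):
    """Mean absolute error over all joints and all three axes."""
    diffs = [abs(p - t)
             for pj, tj in zip(predicted, truth)
             for p, t in zip(pj, tj)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    point = (0.1, 0.3, 1.0)
    idx = voxel_index(point)
    print("voxel index:", idx)
    print("recovered coordinate:", voxel_center(idx))
    print("MAE vs. original point:", mae([voxel_center(idx)], [point]))
```

Mapping an index back to the voxel center means the recovered coordinate can differ from the true position by up to half a voxel per axis, so under these assumptions the voxel size sets a floor on the achievable MAE.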