This paper presents VKFus, a flexible framework that unifies learned absolute and relative pose fusion through variational inference for positioning under challenging conditions. Our approach establishes a mathematically rigorous foundation by demonstrating the equivalence between Extended Kalman Filter (EKF) optimization and Evidence Lower Bound (ELBO) maximization under Gaussian assumptions. This formulation provides a theoretical basis for sensor fusion, distinguishing the approach from empirical methods. The VKFus framework demonstrates flexibility by accommodating diverse sensor configurations. The system employs an Absolute Pose Regression (APR) branch for drift-free global positioning and a Relative Pose Regression (RPR) branch that processes multi-modal sensor data, including IMU measurements, for motion estimates. Both branches incorporate attention-based uncertainty estimation modules that simultaneously predict camera poses and uncertainties, enabling adaptive weighting between modalities based on environmental conditions. Experimental results demonstrate state-of-the-art performance. On the challenging EuRoC dataset, VKFus achieves an average Root Mean Square Error (RMSE) of 0.57 m, an improvement of up to 41% over recent EKF-based methods. On the KITTI dataset, our approach demonstrates a 40% improvement in positioning accuracy. These consistent performance improvements validate VKFus's effectiveness as a reliable solution for navigation systems.
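The stated EKF–ELBO equivalence can be sketched as follows. This is the standard variational-inference argument under Gaussian assumptions, with illustrative symbols not taken from the paper: for latent state $x$ and measurement $y$, the ELBO is

$$
\log p(y) \;\ge\; \mathbb{E}_{q(x)}\!\left[\log p(y \mid x)\right] \;-\; \mathrm{KL}\!\left(q(x)\,\|\,p(x)\right).
$$

With a Gaussian prior $p(x) = \mathcal{N}(\hat{x}^-, P^-)$ and Gaussian likelihood $p(y \mid x) = \mathcal{N}(h(x), R)$, maximizing the ELBO over a Gaussian $q$ reduces to minimizing the MAP cost

$$
J(x) \;=\; \tfrac{1}{2}\,(x - \hat{x}^-)^\top (P^-)^{-1} (x - \hat{x}^-) \;+\; \tfrac{1}{2}\,\big(y - h(x)\big)^\top R^{-1} \big(y - h(x)\big),
$$

whose minimizer, after linearizing $h$ about $\hat{x}^-$, coincides with the EKF measurement update.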
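The adaptive weighting between the APR and RPR branches can be illustrated with a minimal precision-weighted fusion sketch. The function below is a hypothetical illustration, not the paper's implementation: it assumes each branch outputs a pose estimate together with a predicted per-axis variance (the role played by the attention-based uncertainty modules), and fuses them by inverse-variance weighting.

```python
import numpy as np

def fuse_poses(p_abs, var_abs, p_rel, var_rel):
    """Precision-weighted fusion of two pose estimates.

    p_abs, var_abs : position estimate and predicted per-axis variance
                     from an absolute (APR-style) branch.
    p_rel, var_rel : estimate and variance from a relative (RPR-style)
                     branch, integrated from the previous fused pose.
    Returns the fused estimate and its variance. The branch that reports
    lower uncertainty automatically receives more weight.
    """
    w_abs = 1.0 / var_abs   # precision of the absolute estimate
    w_rel = 1.0 / var_rel   # precision of the relative estimate
    fused = (w_abs * p_abs + w_rel * p_rel) / (w_abs + w_rel)
    fused_var = 1.0 / (w_abs + w_rel)
    return fused, fused_var

# Example: a confident relative estimate pulls the fused pose toward it.
p, v = fuse_poses(np.array([1.0, 2.0, 0.0]), np.array([4.0, 4.0, 4.0]),
                  np.array([1.2, 2.2, 0.1]), np.array([0.25, 0.25, 0.25]))
```

This is the scalar special case of the Gaussian product underlying the EKF update; per-axis variances stand in for a full covariance for brevity.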