The proliferation of the Internet of Things (IoT) drives applications requiring multi-agent collaboration in dynamic environments, where ensuring performance, security, and reliability is a significant challenge. Multi-Agent Reinforcement Learning (MARL) is essential for optimizing decision-making in such settings, but it is vulnerable to strategic or malicious behaviors that can undermine trust and degrade performance.
This work proposes a Blockchain-enabled Centralized Training and Decentralized Execution (BE-CTDE) framework tailored for IoT. The framework uses blockchain as a trusted training management layer to enhance decision transparency and collaborative efficiency. We introduce on-chain, residual-based malicious behavior detection that filters out malicious agents during training, improving MARL stability and system fault tolerance. Furthermore, we design a token-based incentive mechanism that combines punishment and compensation to promote honest participation and improve agent learning performance. To support real-world deployment, we set up a private Ethereum environment that implements data submission, malicious behavior detection, and reward allocation.
In a multi-agent pathfinding (MAPF) task, our framework reduces the collision rate by 46.22% compared to the baseline SCRIMP, which lacks security mechanisms, under conditions with 32 agents, a 37.5% malicious agent ratio, and 15% static obstacle density. At the same time, our method maintains a 93% success rate, showcasing its effectiveness and reliability in challenging IoT environments.