Multi-Agent Reinforcement Learning (MARL) has been widely applied to decision-making problems in dynamic and uncertain environments. However, because multiple agents interact, the learning process is more complex than in single-agent reinforcement learning, particularly when solving for equilibrium strategies. Targeting the equilibrium problem in MARL, this study proposes an innovative learning method that incorporates geometric structure information and updates strategies using Particle Swarm Optimization (PSO) together with the Simplex method, with the aim of improving the computational efficiency and stability of equilibrium-strategy learning. We first analyze the polytope structure of correlated equilibria in stochastic games and provide a geometric interpretation through vertex identification. We then design a new Q-update mechanism to improve the learning stability and convergence of MARL. In the experiments, we evaluate the method on three simulated two-player games with differing numbers of states and actions, progressively increasing the complexity of the environment to assess the method's robustness and the accuracy of its strategy estimates. The results show that the proposed method retains good learning performance in complex settings. In addition, we include a simulation closer to practical applications as a preliminary examination of the method's applicability and of directions for future improvement.
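To make the geometric object behind "vertex identification" concrete, the following display recalls the standard linear-inequality description of the correlated-equilibrium polytope for a two-player stage game; the notation here (action sets $A_1, A_2$ and stage payoffs $Q_1, Q_2$, chosen to match a Q-learning setting) is ours, and the thesis's formulation may differ in detail.

\[
\mathcal{C} \;=\; \Bigl\{\, \sigma \in \Delta(A_1 \times A_2) \;:\; \sum_{a_2 \in A_2} \sigma(a_1, a_2)\,\bigl[\,Q_1(a_1, a_2) - Q_1(a_1', a_2)\,\bigr] \;\ge\; 0 \quad \forall\, a_1, a_1' \in A_1 \,\Bigr\},
\]

together with the symmetric constraints for player 2. Because every constraint is linear in $\sigma$, the set $\mathcal{C}$ is a convex polytope, and its vertices are the extreme correlated equilibria that vertex identification seeks out.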
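As one concrete reading of "updating strategies with the Simplex method", the sketch below computes a correlated equilibrium of a stage game given by two Q-matrices by solving a linear program with a dual-simplex solver. It is a minimal sketch, not the thesis's implementation: the utilitarian objective (maximizing the players' total payoff), the function name correlated_equilibrium, and the use of SciPy's linprog are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def correlated_equilibrium(Q1, Q2):
    """Utilitarian correlated equilibrium of the stage game (Q1, Q2).

    Q1, Q2 are (n1, n2) payoff matrices for players 1 and 2; the return
    value is a distribution sigma over joint actions, shaped (n1, n2).
    (Illustrative sketch; the thesis's objective and updates may differ.)
    """
    n1, n2 = Q1.shape
    # Objective: maximize total expected payoff, i.e. minimize its negation.
    c = -(Q1 + Q2).ravel()
    # Incentive constraints: conditional on any recommended action, no
    # unilateral deviation may increase a player's expected payoff.
    rows = []
    for a in range(n1):                # player 1, recommended action a
        for a_dev in range(n1):        # candidate deviation a_dev
            if a_dev != a:
                row = np.zeros((n1, n2))
                row[a, :] = Q1[a_dev, :] - Q1[a, :]   # gain from deviating
                rows.append(row.ravel())
    for b in range(n2):                # player 2, symmetric constraints
        for b_dev in range(n2):
            if b_dev != b:
                row = np.zeros((n1, n2))
                row[:, b] = Q2[:, b_dev] - Q2[:, b]
                rows.append(row.ravel())
    A_ub, b_ub = np.array(rows), np.zeros(len(rows))
    # Probabilities sum to one; the bounds keep sigma componentwise >= 0.
    A_eq, b_eq = np.ones((1, n1 * n2)), np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n1 * n2),
                  method="highs-ds")   # HiGHS dual-simplex solver
    return res.x.reshape(n1, n2)

# Example: the game of Chicken, whose correlated equilibria include
# distributions that no mixed Nash equilibrium can reproduce.
Q1 = np.array([[6.0, 2.0], [7.0, 0.0]])
Q2 = np.array([[6.0, 7.0], [2.0, 0.0]])
print(correlated_equilibrium(Q1, Q2))
```

In a CE-Q-style learning loop, an LP of this kind would be solved at each visited state, with the resulting distribution supplying the expected continuation value in the Q-update; where the thesis substitutes PSO, the search would presumably range over the same polytope, trading exact vertex solutions for faster approximate updates.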