NCU Institutional Repository (中大機構典藏): theses and dissertations, past exam papers, journal articles, and research projects for download: Item 987654321/98028


    Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98028


    Title: Polytope-Based Correlated Q-learning with Iterative Strategy Updates (基於多面體結構與迭代更新機制的Q學習演算法)
    Author: Kao, Yung-Han (高永瀚)
    Contributor: Graduate Institute of Statistics
    Keywords: Correlated equilibrium; Multi-agent Reinforcement Learning; Nash equilibrium; Particle Swarm Optimization
    Date: 2025-07-10
    Uploaded: 2025-10-17 12:16:02 (UTC+8)
    Publisher: National Central University
    Abstract: Multi-Agent Reinforcement Learning (MARL) is widely applied to decision-making problems in dynamic and uncertain environments. However, because the agents interact, learning is more complex than in the single-agent setting, particularly when solving for equilibrium strategies. This study proposes a learning method for the equilibrium problem in MARL that incorporates geometric structure information and updates strategies using both Particle Swarm Optimization (PSO) and the Simplex method, with the aim of improving the computational efficiency and learning stability of equilibrium strategies. We first analyze the polytope structure of correlated equilibria in stochastic games and give a geometric interpretation through vertex identification. We then design a new Q-update mechanism to improve the learning stability and convergence of MARL. In the experiments, three sets of simulations covering two-player games with different numbers of states and actions progressively increase the complexity of the environment to evaluate the method's robustness and the accuracy of its strategy estimates. The results show that the proposed method still learns well in complex settings. Additionally, a simulation closer to practice is included to give a preliminary look at the method's practical feasibility and directions for future improvement.
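The polytope mentioned in the abstract can be illustrated on a single one-shot game. In a two-player game, a joint distribution p over action pairs is a correlated equilibrium when, for every recommended action, following it is at least as good in expectation as any deviation; these conditions are linear in p, so the feasible set is a polytope. The sketch below only tests membership in that set for the textbook game of Chicken. It is not the thesis's algorithm: the payoff matrices, the function names `ce_slacks` and `is_correlated_equilibrium`, and the test distributions are all assumptions chosen for illustration.

```python
from fractions import Fraction

def ce_slacks(u1, u2, p):
    """Slack of each correlated-equilibrium incentive constraint (>= 0 means satisfied).

    u1[a][b], u2[a][b]: payoffs to players 1 and 2 (hypothetical example inputs).
    p[a][b]: joint probability of recommending action a to player 1 and b to player 2.
    """
    n_rows, n_cols = len(p), len(p[0])
    slacks = []
    # Player 1: obeying recommendation a versus deviating to a_dev.
    for a in range(n_rows):
        for a_dev in range(n_rows):
            if a_dev != a:
                slacks.append(sum(p[a][b] * (u1[a][b] - u1[a_dev][b])
                                  for b in range(n_cols)))
    # Player 2: obeying recommendation b versus deviating to b_dev.
    for b in range(n_cols):
        for b_dev in range(n_cols):
            if b_dev != b:
                slacks.append(sum(p[a][b] * (u2[a][b] - u2[a][b_dev])
                                  for a in range(n_rows)))
    return slacks

def is_correlated_equilibrium(u1, u2, p):
    """True if p is a probability distribution satisfying every incentive constraint."""
    is_distribution = (sum(sum(row) for row in p) == 1
                       and all(x >= 0 for row in p for x in row))
    return is_distribution and all(s >= 0 for s in ce_slacks(u1, u2, p))

# Game of Chicken (assumed example); action 0 = Dare, action 1 = Chicken.
U1 = [[0, 7], [2, 6]]
U2 = [[0, 2], [7, 6]]

third, quarter = Fraction(1, 3), Fraction(1, 4)
P_CE = [[0, third], [third, third]]                    # never recommends (Dare, Dare)
P_UNIFORM = [[quarter, quarter], [quarter, quarter]]   # uniform over all four pairs

print(is_correlated_equilibrium(U1, U2, P_CE))       # True
print(is_correlated_equilibrium(U1, U2, P_UNIFORM))  # False
```

Exact `Fraction` arithmetic avoids floating-point tolerance in the inequality checks. Finding a vertex of the polytope, rather than testing a given point, would typically be posed as a linear program over the same constraints.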
    Appears in collections: [Graduate Institute of Statistics] Theses and Dissertations

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    6


    All items in NCUIR are protected by original copyright.
