References
[1] David Silver, Thomas Hubert, Julian Schrittwieser, et al., "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm," in Science, 2018.
[2] Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, et al., "StarCraft II: A New Challenge for Reinforcement Learning," in arXiv, 2017.
[3] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller, "Playing Atari with Deep Reinforcement Learning," in Neural Information Processing Systems, 2013.
[4] Hado van Hasselt, Arthur Guez, David Silver, "Deep Reinforcement Learning with Double Q-learning," in the Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence, 2016.
[5] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas, "Dueling Network Architectures for Deep Reinforcement Learning," in the 33rd International Conference on Machine Learning, 2016.
[6] Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, "Prioritized Experience Replay," in the International Conference on Learning Representations, 2016.
[7] Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton, "Multi-Step Reinforcement Learning: A Unifying Algorithm," in the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[8] Matteo Hessel, Joseph Modayil, Hado van Hasselt, et al., "Rainbow: Combining Improvements in Deep Reinforcement Learning," in the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[9] Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour, "Policy Gradient Methods for Reinforcement Learning with Function Approximation," in the 12th International Conference on Neural Information Processing Systems, 1999.
[10] John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel, "Trust Region Policy Optimization," in the International Conference on Machine Learning, 2015.
[11] Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver, "Emergence of Locomotion Behaviours in Rich Environments," in arXiv, 2017.
[12] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, "Proximal Policy Optimization Algorithms," in arXiv, 2017.
[13] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, "Asynchronous Methods for Deep Reinforcement Learning," in the International Conference on Machine Learning, 2016.
[14] Warwick Masson, Pravesh Ranchod, George Konidaris, "Reinforcement Learning with Parameterized Actions," in the Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[15] Matthew Hausknecht, Prannoy Mupparaju, Sandeep Subramanian, Shivaram Kalyanakrishnan, and Peter Stone, "Half Field Offense: An Environment for Multiagent Learning and Ad Hoc Teamwork," in the AAMAS Adaptive Learning Agents (ALA) Workshop, 2016.
[16] Matthew Hausknecht, Peter Stone, "Deep Reinforcement Learning in Parameterized Action Space," in the International Conference on Learning Representations, 2016.
[17] Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, Tong Zhang, Ji Liu, Han Liu, "Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space," in CoRR, abs/1810.06394, 2018.
[18] Ermo Wei, Drew Wicke, Sean Luke, "Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space," in the AAAI Fall Symposium on Data Efficient Reinforcement Learning, 2018.
[19] Zhou Fan, Rui Su, Weinan Zhang, Yong Yu, "Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space," in the International Joint Conference on Artificial Intelligence, 2019.
[20] Yiming Zhang, Quan Ho Vuong, Kenny Song, Xiao-Yue Gong, Keith W. Ross, "Efficient Entropy for Policy Gradient with Multidimensional Action Space," in the International Conference on Learning Representations, 2018.
[21] M. Tan, "Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents," in Machine Learning Proceedings, 1993, p. 330.
[22] Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch, "Emergent Complexity via Multi-Agent Competition," in the International Conference on Learning Representations, 2018.
[23] Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch, "Emergent Tool Use from Multi-Agent Autocurricula," in the International Conference on Learning Representations, 2020.
[24] Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson, "Learning to Communicate with Deep Multi-Agent Reinforcement Learning," in Advances in Neural Information Processing Systems, 2016.
[25] Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, et al., "Grandmaster Level in StarCraft II using Multi-Agent Reinforcement Learning," in Nature, 2019.
[26] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel, "Value-Decomposition Networks for Cooperative Multi-Agent Learning Based on Team Reward," in the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018.
[27] Nicolas Usunier, Gabriel Synnaeve, Zeming Lin, Soumith Chintala, "Episodic Exploration for Deep Deterministic Policies for StarCraft Micro-Management," in the International Conference on Learning Representations, 2017.
[28] Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson, "Counterfactual Multi-Agent Policy Gradients," in the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[29] Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson, "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning," in the 35th International Conference on Machine Learning, 2018.
[30] David Ha, Andrew Dai, Quoc V. Le, "HyperNetworks," in the International Conference on Learning Representations, 2017.
[31] M. L. Littman, "Markov Games as a Framework for Multi-Agent Reinforcement Learning," in the Eleventh International Conference on Machine Learning, 1994.
[32] Sham Kakade, John Langford, "Approximately Optimal Approximate Reinforcement Learning," in the Nineteenth International Conference on Machine Learning, 2002.
[33] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems, 2012.