References
[1] David Silver, Thomas Hubert, Julian Schrittwieser, et al., "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm," in Science, 2018.
[2] Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, et al., "StarCraft II: A New Challenge for Reinforcement Learning," in arXiv, 2017.
[3] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller, "Playing Atari with Deep Reinforcement Learning," in Neural Information Processing Systems, 2013.
[4] Hado van Hasselt, Arthur Guez, David Silver, "Deep Reinforcement Learning with Double Q-learning," in the Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence, 2016.
[5] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas, "Dueling Network Architectures for Deep Reinforcement Learning," in the 33rd International Conference on Machine Learning, 2016.
[6] Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, "Prioritized Experience Replay," in the International Conference on Learning Representations, 2016.
[7] Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton, "Multi-Step Reinforcement Learning: A Unifying Algorithm," in the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[8] Matteo Hessel, Joseph Modayil, Hado van Hasselt, et al., "Rainbow: Combining Improvements in Deep Reinforcement Learning," in the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[9] Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour, "Policy Gradient Methods for Reinforcement Learning with Function Approximation," in the 12th International Conference on Neural Information Processing Systems, 1999.
[10] John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel, "Trust Region Policy Optimization," in the International Conference on Machine Learning, 2015.
[11] Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver, "Emergence of Locomotion Behaviours in Rich Environments," in arXiv, 2017.
[12] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, "Proximal Policy Optimization Algorithms," in arXiv, 2017.
[13] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, "Asynchronous Methods for Deep Reinforcement Learning," in the International Conference on Machine Learning, 2016.
[14] Warwick Masson, Pravesh Ranchod, George Konidaris, "Reinforcement Learning with Parameterized Actions," in the Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[15] Matthew Hausknecht, Prannoy Mupparaju, Sandeep Subramanian, Shivaram Kalyanakrishnan, and Peter Stone, "Half Field Offense: An Environment for Multiagent Learning and Ad Hoc Teamwork," in the AAMAS Adaptive Learning Agents (ALA) Workshop, 2016.
[16] Matthew Hausknecht, Peter Stone, "Deep Reinforcement Learning in Parameterized Action Space," in the International Conference on Learning Representations, 2016.
[17] Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, Tong Zhang, Ji Liu, Han Liu, "Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space," in CoRR, abs/1810.06394, 2018.
[18] Ermo Wei, Drew Wicke, Sean Luke, "Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space," in the AAAI Fall Symposium on Data Efficient Reinforcement Learning, 2018.
[19] Zhou Fan, Rui Su, Weinan Zhang, Yong Yu, "Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space," in the International Joint Conference on Artificial Intelligence, 2019.
[20] Yiming Zhang, Quan Ho Vuong, Kenny Song, Xiao-Yue Gong, Keith W. Ross, "Efficient Entropy for Policy Gradient with Multidimensional Action Space," in the International Conference on Learning Representations, 2018.
[21] M. Tan, "Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents," in Machine Learning Proceedings, 1993, p. 330.
[22] Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch, "Emergent Complexity via Multi-Agent Competition," in the International Conference on Learning Representations, 2018.
[23] Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch, "Emergent Tool Use from Multi-Agent Autocurricula," in the International Conference on Learning Representations, 2020.
[24] Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson, "Learning to Communicate with Deep Multi-Agent Reinforcement Learning," in Advances in Neural Information Processing Systems, 2016.
[25] Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, et al., "Grandmaster Level in StarCraft II using Multi-Agent Reinforcement Learning," in Nature, 2019.
[26] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel, "Value-Decomposition Networks for Cooperative Multi-Agent Learning Based on Team Reward," in the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018.
[27] Nicolas Usunier, Gabriel Synnaeve, Zeming Lin, Soumith Chintala, "Episodic Exploration for Deep Deterministic Policies for StarCraft Micro-Management," in the International Conference on Learning Representations, 2017.
[28] Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson, "Counterfactual Multi-Agent Policy Gradients," in the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[29] Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson, "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning," in the 35th International Conference on Machine Learning, 2018.
[30] David Ha, Andrew Dai, Quoc V. Le, "HyperNetworks," in the International Conference on Learning Representations, 2017.
[31] M. L. Littman, "Markov Games as a Framework for Multi-Agent Reinforcement Learning," in the Eleventh International Conference on Machine Learning, 1994.
[32] Sham Kakade, John Langford, "Approximately Optimal Approximate Reinforcement Learning," in the Nineteenth International Conference on Machine Learning, 2002.
[33] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems, 2012.