Abstract (English) |
This study employs the Double Deep Q-Network (Double DQN) and Q-Learning algorithms to train self-driving car agents in driving and parking modes. The input features come from the car's data (e.g., radar, car position, and speed), and the output is the estimated Q value of each action. Because the state spaces differ considerably between modes, this study investigates two specific situations: the driving mode and the parking mode.
Trained with Double DQN, the self-driving mode achieved its best result at about 9000 episodes. In the parking situation, Double DQN was first applied to train the car to drive from the entrance of the parking lot into the parking space, but the performance was poor. The car agents therefore used multi-mode training for the self-parking situation: first, the self-driving mode (with Double DQN) takes the car from the entrance of the parking lot to a position near the parking space; then a self-parking mode trained with Q-Learning parks the car in the parking space. For the parking-space search, the best result was achieved at about 9800 episodes with Double DQN. For the final parking maneuver, Q-Learning gave its best result after 3500 episodes of training. |
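As a rough illustration of the two algorithms named in the abstract, the sketch below shows a single tabular Q-Learning update and the computation of a Double DQN training target. It is not the implementation used in the study: the function names, the learning rate `alpha`, the discount factor `gamma`, and the toy Q values are all assumptions chosen for illustration, and the network that would produce the Q estimates from radar, position, and speed inputs is replaced by plain lists.

```python
from typing import List, Sequence


def q_learning_update(
    q_table: List[List[float]],
    state: int,
    action: int,
    reward: float,
    next_state: int,
    done: bool,
    alpha: float = 0.1,   # assumed learning rate, not from the study
    gamma: float = 0.99,  # assumed discount factor, not from the study
) -> float:
    """One tabular Q-Learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    # Bootstrap from the greedy value of the next state unless the episode ended.
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


def double_dqn_target(
    reward: float,
    done: bool,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not from the study
) -> float:
    """Double DQN target: the online estimates select the next action,
    and the target estimates evaluate that action."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Toy values, purely hypothetical.
q_table = [[0.0, 0.0], [0.0, 1.0]]
updated = q_learning_update(q_table, state=0, action=1, reward=1.0,
                            next_state=1, done=False, alpha=0.5, gamma=0.9)

ddqn = double_dqn_target(reward=1.0, done=False,
                         next_q_online=[0.2, 0.8, 0.5],
                         next_q_target=[1.0, 0.3, 2.0],
                         gamma=0.9)
```

In the toy call, the online estimates pick action 1, so the target uses the target estimate 0.3 rather than the maximum target estimate 2.0. Decoupling action selection from action evaluation in this way is what distinguishes Double DQN from the standard DQN target.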