Multi-Mode Agent for Q-Learning Algorithms in Self-Driving Car Application



Name	林彥誠 (Yen-Cheng Lin)	Department	Department of Mathematics
Title	Multi-Mode Agent for Q-Learning Algorithms in Self-Driving Car Application (多重模式Q-Learning演算法代理人於無人自駕車之應用)
File	Full text viewable in the online system from 2028-02-03.
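Abstract (Chinese, translated)	This study uses the Double Deep Q Network (Double DQN) and Q-Learning algorithms to train the autonomous driving and automatic parking modes of a self-driving car. Several measurements from the car serve as the input features of the algorithms, including radar readings, car position, and car speed; the output is the estimated Q value of each action. Because a self-driving car needs a different number of states in different situations, this study separates road driving and head-in parking into two modes: an autonomous driving mode and an automatic parking mode. In training the autonomous driving mode, Double DQN gave the best result at about 9000 episodes, with the car driving faster and more smoothly. For the automatic parking mode, a Double DQN agent was first trained in an environment running from the parking-lot entrance to a completed head-in parking maneuver, but the results were poor. The agent was therefore trained in multi-mode fashion: the autonomous driving mode takes the car from the parking-lot entrance to the vicinity of the parking space, and the agent switches to the automatic parking mode once the car arrives near the space. The entrance-to-vicinity stage, trained with Double DQN, reached its best result at about 9800 episodes; the final parking stage, trained with Q-Learning, reached its best result at only about 3500 episodes.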
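The abstract says only that the network maps car features (radar, position, speed) to one Q value per action. A minimal sketch of the target that sets Double DQN apart from DQN follows, assuming the standard formulation of van Hasselt, Guez, and Silver (2016) rather than any detail of the author's code; the function names `double_dqn_target` and `dqn_target` and the default discount factor are hypothetical. Each function receives the next-state Q-value vectors that the online and target networks would output.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,  # hypothetical default discount factor
    done: bool = False,
) -> float:
    """Double DQN bootstrap target for a single transition.

    The online network *selects* the greedy next action and the target
    network *evaluates* it, which reduces the overestimation bias of the
    plain DQN target r + gamma * max_a Q_target(s', a).
    """
    if done:
        # No bootstrapping past a terminal state (e.g. a collision).
        return reward
    # Action selection with the online network.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation with the target network.
    return reward + gamma * next_q_target[best_action]


def dqn_target(
    reward: float,
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Plain DQN target, for comparison: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)
```

With `reward=1.0`, `gamma=0.5`, online values `[1.0, 3.0, 2.0]` and target values `[5.0, 0.5, 4.0]`, the online network picks action 1, so `double_dqn_target` returns 1.25, while `dqn_target` takes the target network's own maximum and returns 3.5. Separating selection from evaluation is what curbs DQN's tendency to overestimate Q values.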
Abstract (English)	The present study employs the Double Deep Q Network (Double DQN) and Q-Learning algorithms to train self-driving car agents in driving and parking modes. The input features come from data of the car (e.g., radar, car position, and speed), and the output is the estimated Q value of each action. Because the state spaces of the two modes differ considerably, the present study investigates two situations separately: a driving mode and a parking mode. Trained with Double DQN, the self-driving mode achieved its best result at about 9000 episodes. In the parking situation, Double DQN was first applied to train the car to drive from the parking-lot entrance into the parking space, but the performance was poor. The car agent was therefore trained in multi-mode fashion: the self-driving mode (Double DQN) takes the car from the parking-lot entrance to a position near the parking space, and a self-parking mode trained with Q-Learning then parks the car in the space. The stage of reaching the parking space achieved its best result at about 9800 episodes with Double DQN, and the Q-Learning parking stage achieved its best result at about 3500 episodes.
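The multi-mode scheme can be sketched as a tabular Q-Learning update for the parking stage plus a rule that hands control from the driving mode to the parking mode. This is an illustration under stated assumptions, not the author's implementation: the abstract does not say how states are discretized or what triggers the switch, so `select_mode`, its distance test, and `switch_radius` are hypothetical, as are the default `alpha` and `gamma`.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-Learning step in place and return the new Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # A terminal transition (e.g. parked or collided) has no future value.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


def select_mode(distance_to_space, switch_radius=3.0):
    """Hypothetical switching rule for the multi-mode agent: stay in the
    driving mode (Double DQN) until the car is within switch_radius of the
    target parking space, then hand control to the parking mode (Q-Learning)."""
    return "parking" if distance_to_space <= switch_radius else "driving"


# Unseen (state, action) pairs default to a Q value of 0.0.
q_table = defaultdict(float)
```

Starting from the empty table, `q_learning_update(q_table, "s0", 1, 1.0, "s1", 2, alpha=0.5)` moves Q(s0, 1) from 0.0 to 0.5. Splitting the task this way lets each mode use a method matched to the size of its state space, which is the motivation the abstract gives for separating the two modes.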
Keywords	★ Reinforcement Learning ★ Q-Learning ★ DQN ★ Double DQN ★ Self-Driving Car
Table of contents	Abstract (Chinese)
Abstract
Acknowledgements
Contents
1. Introduction
2. Research Methods
  2.1 Reinforcement Learning
    2.1.1 Markov Decision Process (MDP)
  2.2 Value Iteration Algorithm
    2.2.1 State Value
    2.2.2 Bellman Optimality Equation
    2.2.3 Action Value
    2.2.4 Value Iteration Algorithm
  2.3 Exploration and Exploitation
  2.4 Q-Learning
  2.5 Deep Q Network (DQN)
  2.6 Double DQN
3. Experiments
  3.1 Autonomous Driving Mode
    3.1.1 Environment
    3.1.2 Parameters
    3.1.3 Procedure
  3.2 Automatic Parking Mode
    3.2.1 Environment
    3.2.2 Parameters
    3.2.3 Procedure
4. Conclusion
References
Advisors	洪盟凱, 胡中興	Approval date	2023-01-18

Thesis and dissertation record 108221018: details