Master's/Doctoral Thesis 109322070: Detailed Record




Author: Ping-Yuan Lee (李秉原)    Department: Civil Engineering
Thesis Title: 應用價值基礎之元強化學習方法於交通號誌控制之研究
(Application of Value-based Reinforcement Learning with Meta-Learning to Traffic Signal Control)
Full text: available for viewing in the system after 2024-12-31.
Abstract (Chinese)  With the spread of intersection surveillance cameras and advances in image recognition technology, traditional adaptive signal control has gained the opportunity for wide deployment; however, its control strategies usually require parameter calibration, which limits practical application. In recent years, reinforcement learning has excelled in many fields. Its trial-and-error nature allows it to learn and capture traffic flow characteristics and adapt to different traffic conditions, making it well suited to signal control systems.
Therefore, building on recent domestic and international research, this study uses passenger car equivalents (PCE) to reflect the characteristics of Taiwan's mixed traffic flow, incorporates the state of upstream intersections, and adopts intensity-based pressure as the main basis of both the state and the reward. Training uses the value-based reinforcement learning method Rainbow DQN combined with the meta-learning method MAML (Model-Agnostic Meta-Learning), and compares four models: (1) without PCE, (2) with fixed PCE, (3) with self-learned PCE, and (4) meta-learning combined with fixed PCE. The first three models examine whether PCE affects the training results, while the fourth investigates whether meta-learning is effective in the domestic setting. Vissim is used as the simulation software, training on a small road network in Taipei City with peak-hour traffic volume survey data as the input flow.
This study finds that PCE improves training performance, with no difference between fixed and learned PCE, and that the designed reward effectively reflects travel time. However, the metrics indicate that most models have not yet converged, possibly because (1) the model architecture is relatively complex and the training time was insufficient, (2) the training flows differed too much between scenarios, and (3) the replay buffer was cleared multiple times. In the meta-learning part, the models did not converge as quickly as expected; it is recommended that meta-learning be conducted only after the effectiveness of the inner reinforcement learning has been confirmed.
Abstract (English)  With the spread of intersection surveillance cameras and improvements in image recognition technology, adaptive signal control has the opportunity to be widely used, but its control strategies usually require parameter calibration, which limits practical application. In recent years, reinforcement learning has gained great popularity in various fields; its trial-and-error nature allows it to learn and capture traffic flow characteristics and adapt to different traffic conditions, making it well suited to signal control systems.
Therefore, building on recent domestic and international research, this study uses passenger car equivalents (PCE) to account for the characteristics of Taiwan's mixed traffic flow, incorporates the state of upstream intersections, and adopts intensity-based pressure as the main basis of both the state and the reward. Four models are designed: (1) without PCE, (2) with fixed PCE, (3) with self-learned PCE, and (4) meta-learning combined with fixed PCE. The learning method is the value-based reinforcement learning algorithm Rainbow DQN, combined in the fourth model with the meta-learning method Model-Agnostic Meta-Learning (MAML). The first three models examine whether PCE affects the training results; the fourth investigates whether meta-learning is effective in domestic scenarios. Vissim is used as the simulation software, training on a small road network in Taipei City with peak-hour traffic volume survey data as the input flow.
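The record does not reproduce the thesis's formulas or code. As a rough illustration only, the following Python sketch shows how a PCE-weighted, intensity-based pressure term might be computed per intersection and turned into a reward; all names (VEHICLE_PCE, Movement, pressure_reward) and the exact weights are assumptions for this sketch, not taken from the thesis.

```python
# Hypothetical sketch of a PCE-weighted, intensity-based pressure reward.
# Names, weights, and structure are illustrative assumptions, not the thesis code.
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed fixed PCE weights for mixed traffic (cf. the fixed-PCE model);
# a self-learning variant would treat these weights as trainable parameters.
VEHICLE_PCE: Dict[str, float] = {"car": 1.0, "motorcycle": 0.3, "bus": 2.0, "truck": 1.75}

@dataclass
class Movement:
    """One signal movement: vehicles queued on the incoming and outgoing links."""
    incoming: List[str] = field(default_factory=list)  # vehicle types on the approach
    outgoing: List[str] = field(default_factory=list)  # vehicle types on the exit link

def weighted_count(vehicle_types: List[str]) -> float:
    """Sum of PCE weights; unknown types default to 1.0 passenger-car unit."""
    return sum(VEHICLE_PCE.get(v, 1.0) for v in vehicle_types)

def movement_pressure(m: Movement) -> float:
    """Intensity-based pressure: PCE-weighted incoming minus outgoing demand."""
    return weighted_count(m.incoming) - weighted_count(m.outgoing)

def pressure_reward(movements: List[Movement]) -> float:
    """Negative total absolute pressure, so a more balanced intersection scores higher."""
    return -sum(abs(movement_pressure(m)) for m in movements)

# Example: two movements at one intersection.
ms = [Movement(incoming=["car", "motorcycle", "bus"], outgoing=["car"]),
      Movement(incoming=["car", "car"], outgoing=["motorcycle"])]
print(pressure_reward(ms))  # more negative when queues are more unbalanced
```

Under this sketch, the fixed-PCE model would keep VEHICLE_PCE constant, while the self-learned variant would update those weights during training.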
This study finds that considering PCE improves the training results, with no difference between Model 2 (fixed PCE) and Model 3 (self-learned PCE), and that the designed reward effectively reflects travel time. With the meta-learning method, the models did not converge as quickly as expected; it is therefore recommended that meta-learning be applied only after the effectiveness of the inner reinforcement learning has been confirmed.
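To make the meta-learning recommendation concrete, here is a minimal, first-order sketch of the MAML pattern (inner adaptation per traffic scenario, then an outer update of the shared initialization), with a plain DQN-style TD loss standing in for the full Rainbow DQN. The function names, the sample_batch callback, and the hyperparameters are illustrative assumptions; Rainbow's distributional head, prioritized replay, and the Vissim environment are omitted.

```python
# First-order MAML-style meta-update sketch (illustrative; not the thesis code).
# A plain DQN-style TD loss stands in for Rainbow DQN; scenario sampling,
# replay buffers, and the simulator are abstracted behind sample_batch().
import copy
import torch
import torch.nn as nn

def td_loss(q_net: nn.Module, batch) -> torch.Tensor:
    """One-step TD loss on a batch of (state, action, reward, next_state, done)."""
    s, a, r, s2, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + 0.99 * (1.0 - done) * q_net(s2).max(dim=1).values
    return nn.functional.mse_loss(q, target)

def maml_outer_step(meta_net, meta_opt, scenarios, sample_batch,
                    inner_lr=1e-3, inner_steps=1):
    """One outer update: adapt a copy per scenario, evaluate it, update the shared init."""
    meta_opt.zero_grad()
    for task in scenarios:
        adapted = copy.deepcopy(meta_net)              # start from the shared initialization
        opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                   # inner-loop adaptation on this scenario
            opt.zero_grad()
            td_loss(adapted, sample_batch(task, "support")).backward()
            opt.step()
        opt.zero_grad()                                # keep only post-adaptation gradients
        td_loss(adapted, sample_batch(task, "query")).backward()
        # First-order approximation: accumulate the adapted gradients onto the shared weights.
        for meta_p, p in zip(meta_net.parameters(), adapted.parameters()):
            meta_p.grad = meta_p.grad + p.grad if meta_p.grad is not None else p.grad.clone()
    meta_opt.step()
```

In this sketch, each "scenario" would correspond to one traffic-flow pattern or intersection; the recommendation in the abstract amounts to verifying that the inner (td_loss-style) training converges on a single scenario before layering the outer loop on top.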
Keywords (Chinese) ★ adaptive traffic signal control
★ deep reinforcement learning
★ meta learning
★ Rainbow DQN
★ Model-Agnostic Meta-Learning
Keywords (English) ★ adaptive traffic signal control
★ deep reinforcement learning
★ meta learning
★ Rainbow DQN
★ Model-Agnostic Meta-Learning
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iv
Table of Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
Chapter 2 Literature Review 5
2.1 Limitations of Fixed-Time Signals and the Development of Dynamic Signal Control 5
2.2 Reinforcement Learning Methods 6
2.3 Design of States, Actions, and Rewards 7
2.4 Neural Network Architecture Design 9
2.5 Learning Mechanisms 9
2.6 Simulation Software 9
2.7 Overall Review 12
Chapter 3 Methodology 13
3.1 Deep Reinforcement Learning 13
3.2 Meta-Learning (MAML) 19
3.3 Summary 21
Chapter 4 Model and Experimental Design 22
4.1 Definitions of Intersection Traffic Variables 22
4.2 Model Design 23
4.3 Training Procedure 31
4.4 Study Area 35
4.5 Experimental Design 37
Chapter 5 Model Training and Results 40
5.1 Basic Analysis 40
5.1.1 Reward Analysis 40
5.1.2 Loss Analysis 41
5.1.3 Travel Time 41
5.1.4 Effect of Clearing the Replay Buffer 42
5.1.5 PCE Learning Results 43
5.2 Discussion 43
5.2.1 Relationship between Reward and Travel Time 43
5.2.2 Travel Time Comparison with the Current Fixed-Time Signals 44
5.2.3 Effect of Time-of-Day Traffic Volumes on Training Results 45
5.3 Summary 47
Chapter 6 Conclusions and Recommendations 48
6.1 Conclusions 48
6.2 Recommendations 48
References 50

Advisor: Huey-Kuo Chen (陳惠國)    Date of Approval: 2023-01-19
