NCU Institutional Repository - Item 987654321/91696


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/91696


    Title: 應用價值基礎之元強化學習方法於交通號誌控制之研究; Applying Value-based Reinforcement Learning with Meta-Learning to Traffic Signal Control
    Author: Lee, Ping-Yuan (李秉原)
    Contributors: Department of Civil Engineering
    Keywords: adaptive traffic signal control; deep reinforcement learning; meta-learning; Rainbow DQN; Model-Agnostic Meta-Learning (MAML)
    Date: 2023-01-19
    Upload date: 2024-09-19 14:10:45 (UTC+8)
    Publisher: National Central University
    Abstract: With the growing deployment of intersection cameras and advances in image-recognition technology, traditional adaptive signal control has the opportunity to be widely used. However, its control strategies usually require parameter calibration, which limits practical application. In recent years, reinforcement learning has excelled in many fields; its trial-and-error nature allows it to learn and capture traffic-flow characteristics and to adapt to different traffic conditions, making it well suited to signal control systems.
    Building on recent domestic and international research, this study uses passenger car equivalents (PCE) to account for the characteristics of domestic mixed traffic flow, incorporates the state of the upstream intersection, and takes intensity-based pressure as the main basis for both state and reward. Training uses the value-based reinforcement learning method Rainbow DQN, combined with the meta-learning method Model-Agnostic Meta-Learning (MAML). Four models are compared: (1) no PCE, (2) fixed PCE, (3) self-learned PCE, and (4) meta-learning with fixed PCE. The first three examine whether PCE affects training results; the fourth examines whether meta-learning is effective in the domestic setting. Vissim is used as the simulation software, training on a small road network in Taipei City with peak-hour traffic-count survey data as the input volumes.
    The study finds that considering PCE improves training performance, with no difference between fixed and self-learned PCE, and that the designed reward effectively reflects travel time. The results also indicate that most models had not yet converged, presumably because (1) the model architecture is relatively complex and training time was insufficient, (2) the training volumes differed too widely, and (3) the experience replay buffer was cleared several times. In the meta-learning experiments, the models did not converge as quickly as expected; it is recommended that meta-learning be applied only after the effectiveness of the inner reinforcement learning has been confirmed.
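    The abstract describes an intensity-based "pressure" signal, weighted by passenger car equivalents (PCE), as the basis for both state and reward, but does not give the exact formulation. The sketch below is a minimal illustration of one common pressure-style reward under assumed vehicle classes and PCE weights; the class names and weight values are illustrative assumptions, not taken from the thesis.

```python
from dataclasses import dataclass

# Assumed PCE weights for a mixed car/motorcycle/bus stream (illustrative only).
PCE = {"car": 1.0, "motorcycle": 0.3, "bus": 2.0}

@dataclass
class Lane:
    counts: dict  # queued-vehicle counts per class on this lane, e.g. {"car": 5, "bus": 1}

def weighted_volume(lane: Lane) -> float:
    """Convert raw per-class counts into a PCE-weighted intensity."""
    return sum(PCE.get(cls, 1.0) * n for cls, n in lane.counts.items())

def pressure(incoming: list, outgoing: list) -> float:
    """Intensity-based pressure of one intersection: PCE-weighted demand on
    incoming lanes minus PCE-weighted occupancy of outgoing lanes."""
    return (sum(weighted_volume(l) for l in incoming)
            - sum(weighted_volume(l) for l in outgoing))

def reward(incoming: list, outgoing: list) -> float:
    # Negative pressure: the agent is rewarded for keeping upstream queues small
    # relative to downstream space, which the study links to lower travel time.
    return -pressure(incoming, outgoing)
```

    The thesis pairs Rainbow DQN with MAML. The sketch below shows only the general first-order MAML structure around a plain DQN-style TD loss, as a simplification: Rainbow's distributional, noisy-network, and prioritized-replay components are omitted, and the task structure (`support`/`query` batches) and hyperparameters are hypothetical placeholders rather than the thesis implementation.

```python
import copy
import torch
import torch.nn as nn

def td_loss(q_net: nn.Module, batch, gamma: float = 0.99) -> torch.Tensor:
    """One-step TD loss for a DQN-style Q-network (no target network, for brevity)."""
    s, a, r, s2, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_net(s2).max(dim=1).values
    return nn.functional.mse_loss(q, target)

def maml_outer_step(meta_net: nn.Module, meta_opt, tasks, inner_lr=1e-3, inner_steps=1):
    """One first-order MAML meta-update over a set of traffic-scenario tasks."""
    meta_opt.zero_grad()
    for task in tasks:                                  # e.g. different demand patterns
        fast = copy.deepcopy(meta_net)                  # task-specific ("inner") copy
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                    # inner adaptation on support data
            inner_opt.zero_grad()
            td_loss(fast, task["support"]).backward()
            inner_opt.step()
        # First-order approximation: gradients of the query loss w.r.t. the adapted
        # weights are accumulated directly onto the meta-parameters.
        query_loss = td_loss(fast, task["query"])
        grads = torch.autograd.grad(query_loss, fast.parameters())
        for p, g in zip(meta_net.parameters(), grads):
            p.grad = g.detach().clone() if p.grad is None else p.grad + g.detach().clone()
    meta_opt.step()
```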
    Appears in Collections: [Graduate Institute of Civil Engineering] Theses & Dissertations

    Files in This Item:

    File         Description    Size    Format    Views
    index.html                  0Kb     HTML      10


    All items in NCUIR are protected by copyright, with all rights reserved.

