Thesis Title 應用價值基礎之元強化學習方法於交通號誌控制之研究
(Application Value-based Reinforcement Learning with Meta Learning on Traffic Signal Control)
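Abstract (Chinese) With the spread of intersection cameras and advances in image recognition, traditional adaptive signal control now has an opportunity for wide deployment. However, its control strategies usually require parameter calibration, which limits practical application. In recent years, reinforcement learning has excelled in many fields; its trial-and-error nature lets it learn traffic flow characteristics and adapt to different traffic conditions, making it well suited to signal control systems.
Therefore, building on recent domestic and international research, this study uses passenger car equivalents (PCE) to account for the characteristics of domestic mixed traffic flow, also considers the state of upstream intersections, and uses intensity-based pressure as the main component of the state and reward. Training uses the value-based reinforcement learning method Rainbow DQN together with the meta-learning method MAML (Model-Agnostic Meta-Learning). Four models are compared: (1) no PCE, (2) fixed PCE, (3) self-learned PCE, and (4) meta-learning with fixed PCE. The first three are used to determine whether PCE affects training results, and the fourth examines whether meta-learning is effective in domestic scenarios. Vissim is used as the simulation software, with a small road network in Taipei City as the training environment and peak-hour traffic volume survey data as the input flow.
This study finds that PCE improves the performance of the trained models, with no difference between fixed PCE and learned PCE, and that the designed reward effectively reflects travel time. All metrics indicate that most models have not yet converged, possibly because (1) the model architecture is relatively complex and training time was insufficient, (2) the training flows differed too widely, and (3) the experience replay buffer was cleared multiple times. For meta-learning, the models did not converge as quickly as expected; it is recommended that meta-learning be applied only after the inner-loop reinforcement learning has been confirmed to work.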
Abstract (English) With the spread of intersection cameras and improvements in image recognition technology, adaptive signal control has the opportunity to be widely used, but its control strategy usually requires parameter calibration, which limits practical application. In recent years, reinforcement learning has gained great popularity in various fields. Its trial-and-error nature allows it to learn the characteristics of traffic flow and adapt to different traffic conditions, which makes it suitable for signal control systems.
Therefore, based on recent domestic and foreign research, this study accounts for the characteristics of domestic mixed traffic flow using passenger car equivalents (PCE), also considers the status of upstream intersections, and uses intensity-based pressure as the main component of the state and reward. Four models were designed: (1) without PCE, (2) with fixed PCE, (3) with self-learned PCE, and (4) meta-learning with fixed PCE. The learning method is the value-based reinforcement learning method Rainbow DQN, combined in the fourth model with the meta-learning method Model-Agnostic Meta-Learning (MAML). The first three models were used to find out whether PCE affects the training results. The fourth model was used to investigate whether meta-learning is effective in domestic scenarios. Vissim was used as the simulation software, and training was conducted on a small road network in Taipei City, using peak-hour traffic volume survey data as the input flow.
We found that considering PCE improves the training results, but there was no difference between model 2 (fixed PCE) and model 3 (self-learned PCE). The designed reward effectively reflected travel time. With meta-learning, the models did not converge as quickly as expected, and it is recommended that meta-learning be conducted after the effect of the inner-loop reinforcement learning has been confirmed.
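The abstract describes weighting mixed traffic by PCE and using a pressure-type quantity in the state and reward. The following is a minimal Python sketch of that idea, not the thesis's implementation: it uses the plain max-pressure form (upstream minus downstream PCE-weighted volume) rather than the intensity-based pressure, whose definition is not given here. The PCE factors, the function names, and the reward form are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical PCE factors (illustrative only; not taken from the thesis).
PCE: Dict[str, float] = {"car": 1.0, "motorcycle": 0.5, "bus": 2.0}

Counts = Dict[str, int]  # vehicle class -> number of vehicles on a lane


def pce_volume(counts: Counts, pce: Dict[str, float] = PCE) -> float:
    """Convert per-class vehicle counts into passenger car units."""
    return sum(pce[vehicle] * n for vehicle, n in counts.items())


def movement_pressure(upstream: Counts, downstream: Counts,
                      pce: Dict[str, float] = PCE) -> float:
    """Plain max-pressure form: upstream minus downstream PCE-weighted volume."""
    return pce_volume(upstream, pce) - pce_volume(downstream, pce)


def reward(movements: Iterable[Tuple[Counts, Counts]],
           pce: Dict[str, float] = PCE) -> float:
    """Assumed reward: negative absolute value of the summed movement pressures."""
    total = sum(movement_pressure(up, down, pce) for up, down in movements)
    return -abs(total)


if __name__ == "__main__":
    movements = [
        ({"car": 4}, {"motorcycle": 2}),  # pressure = 4.0 - 1.0
        ({"bus": 1}, {"car": 3}),         # pressure = 2.0 - 3.0
    ]
    print(reward(movements))
```

Setting every factor in `PCE` to 1.0 gives the "no PCE" case, and making the factors trainable parameters would correspond loosely to the "self-learned PCE" case.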
Keywords (Chinese) ★ Adaptive signal control
★ Deep reinforcement learning
★ Meta-learning
★ Rainbow DQN
★ Model-Agnostic Meta-Learning
Keywords (English) ★ adaptive traffic signal control
★ deep reinforcement learning
★ meta learning
★ Rainbow DQN
★ Model-Agnostic Meta-Learning
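Table of Contents
Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Literature Review
2.1 Limitations of Fixed-Time Signals and the Development of Dynamic Signals
2.1 Reinforcement Learning Methods
2.2 Design of States, Actions, and Rewards
2.3 Neural Network Architecture Design
2.4 Learning Mechanisms
2.5 Simulation Software
2.6 Overall Review
Chapter 3 Methodology
3.1 Deep Reinforcement Learning
3.2 Meta-Learning (MAML)
3.3 Summary
Chapter 4 Model and Experimental Design
4.1 Definitions of Intersection-Related Traffic Variables
4.2 Model Design
4.3 Training Procedure
4.4 Study Scope
4.3 Experimental Design
Chapter 5 Model Training and Results
5.1 Basic Analysis
5.1.1 Reward Analysis
5.1.2 Loss Analysis
5.1.3 Travel Time
5.1.4 Effect of Clearing the Experience Replay Buffer
5.1.5 PCE Learning Results
5.2 Discussion
5.2.1 Relationship Between Reward and Travel Time
5.2.2 Travel Time Comparison with the Existing Fixed-Time Signals
5.2.4 Effect of Time-of-Day Flow on Training Results
5.3 Summary
Chapter 6 Conclusions and Recommendations
6.1 Conclusions
6.2 Recommendations
References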

Advisor 陳惠國 (Huey-Kuo Chen) Approval Date 2023-1-19
