

    Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/90628


    Title: 應用價值基礎之元強化學習方法於交通號誌控制之研究;Application Value-based Reinforcement Learning with Meta Leaning on Traffic Signal Control
    Author: 李秉原;Lee, Ping-Yuan
    Contributor: Department of Civil Engineering
    Keywords: adaptive traffic signal control; deep reinforcement learning; meta learning; Rainbow DQN; Model-Agnostic Meta-Learning
    Date: 2023-01-19
    Upload time: 2023-05-09 17:15:11 (UTC+8)
    Publisher: National Central University
    Abstract: With the spread of intersection surveillance cameras and advances in image recognition, traditional adaptive signal control has an opportunity for wide deployment. Its control strategies usually require parameter calibration, however, which limits practical application. In recent years reinforcement learning has excelled in many fields. Its trial-and-error nature lets it learn the characteristics of traffic flow and adapt to different traffic conditions, which makes it well suited to signal control systems.
    Building on recent domestic and international research, this study uses passenger car equivalents (PCE) to account for the characteristics of domestic mixed traffic flow, considers the state of upstream intersections, and uses intensity-based pressure as the main component of both state and reward. Training uses the value-based reinforcement learning method Rainbow DQN, combined with the meta-learning method Model-Agnostic Meta-Learning (MAML). Four models are compared: (1) no PCE, (2) fixed PCE, (3) self-learned PCE, and (4) meta-learning with fixed PCE. The first three models test whether PCE affects the training results. The fourth examines whether meta-learning is effective in domestic scenarios. Vissim serves as the simulation software, the training environment is a small road network in Taipei City, and peak-hour traffic volume survey data are used as the input flows.
    The study finds that considering PCE improves training performance, with no difference between fixed PCE (model 2) and self-learned PCE (model 3), and that the designed reward effectively reflects travel time. All metrics indicate that most models had not yet converged, possibly because (1) the model architecture is relatively complex and training time was insufficient, (2) the training flows varied too widely, or (3) the experience replay buffer was cleared several times. With meta-learning, the model did not converge as quickly as expected. It is recommended that meta-learning be conducted only after the effect of the inner-loop reinforcement learning has been confirmed.
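The abstract describes weighting mixed traffic by PCE and using pressure as the basis for state and reward. The following is a minimal sketch of that idea, not the thesis's implementation: the PCE factors are hypothetical placeholder values, the pressure is the plain "incoming minus outgoing" form rather than the intensity-based pressure the thesis uses (whose definition is not given in this record), and the reward as negative absolute pressure is an assumed convention common in pressure-based signal control.

```python
from typing import Dict, Optional

# Hypothetical PCE factors, for illustration only (not taken from the thesis).
EXAMPLE_PCE: Dict[str, float] = {"car": 1.0, "motorcycle": 0.5, "bus": 2.0}


def pce_volume(counts: Dict[str, int], pce: Optional[Dict[str, float]] = None) -> float:
    """Collapse per-class vehicle counts into one volume.

    With pce=None every vehicle counts as 1.0 (the "no PCE" case);
    otherwise each class is weighted by its PCE factor.
    """
    if pce is None:
        return float(sum(counts.values()))
    return sum(pce[vehicle_class] * n for vehicle_class, n in counts.items())


def pressure(incoming: Dict[str, int], outgoing: Dict[str, int],
             pce: Optional[Dict[str, float]] = None) -> float:
    """Simplified pressure: weighted incoming volume minus weighted outgoing volume."""
    return pce_volume(incoming, pce) - pce_volume(outgoing, pce)


def reward(incoming: Dict[str, int], outgoing: Dict[str, int],
           pce: Optional[Dict[str, float]] = None) -> float:
    """Assumed reward: negative absolute pressure, so balanced flow scores highest."""
    return -abs(pressure(incoming, outgoing, pce))


if __name__ == "__main__":
    incoming = {"car": 4, "motorcycle": 10, "bus": 1}
    outgoing = {"car": 2, "motorcycle": 0, "bus": 0}
    print("pressure without PCE:", pressure(incoming, outgoing))
    print("pressure with fixed PCE:", pressure(incoming, outgoing, EXAMPLE_PCE))
    print("reward with fixed PCE:", reward(incoming, outgoing, EXAMPLE_PCE))
```

Passing `pce=None` versus a fixed table mirrors the contrast between models (1) and (2) in the abstract. A self-learned PCE, as in model (3), would replace the fixed table with trainable weights.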
    Appears in collections: [Graduate Institute of Civil Engineering] Master's and doctoral theses

    Files in this item:

    File          Description    Size    Format    Views
    index.html                   0Kb     HTML      78


    All items in NCUIR are protected by the original copyright.
