NCU Institutional Repository - theses and dissertations, past exam papers, journal articles, research projects, and more for download: Item 987654321/94490


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/94490


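    Title: Optimal Design and Analysis of a Skyhook Active Seismic Isolation System Using DDPG Reinforcement Learning and Direct Output Feedback (天鉤主動隔震系統應用強化學習DDPG與直接輸出回饋之最佳化設計與分析)
    Author: Hong, Xian-Zheng (洪憲証)
    Contributor: Department of Civil Engineering
    Keywords: Active seismic isolation; Skyhook damping; Direct Output Feedback; Parameter updating iteration; Reinforcement learning; DDPG; LSTM
    Date: 2024-07-29
    Upload time: 2024-10-09 14:47:36 (UTC+8)
    Publisher: National Central University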
    Abstract: This study applies Deep Deterministic Policy Gradient (DDPG), a reinforcement learning method from the field of machine learning, to the optimal control of a single-degree-of-freedom active seismic isolation system, and compares the results with those obtained from Direct Output Feedback, an optimization approach from traditional control theory. The equation of motion and the state-space representation of the system are derived first, and the active control force is then designed according to the skyhook active isolation principle. Skyhook active isolation conventionally computes the control force from an absolute velocity feedback signal. In this study, the control force is instead computed from relative velocity and ground velocity feedback signals, which makes the signals easier to measure and more stable. As a result, the gain coefficients of the two feedback signals must be optimized.
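    The record does not reproduce the thesis equations. As a reading aid, the following is a minimal sketch of a standard single-degree-of-freedom formulation that matches the description above. All symbols (m, c, k, x, x_g, u, c_sky, G_r, G_g) are assumed notation and may differ from the thesis. The block assumes the amsmath package.

```latex
% Assumed notation (not taken from the thesis): m, c, k are the mass, damping, and
% stiffness of the isolated structure; x is its displacement relative to the ground;
% x_g is the ground displacement; u is the active control force.
\begin{align}
  m\,(\ddot{x} + \ddot{x}_g) + c\,\dot{x} + k\,x &= u
    && \text{(equation of motion)} \\
  u_{\text{skyhook}} &= -c_{\text{sky}}\,(\dot{x} + \dot{x}_g)
    && \text{(absolute velocity feedback)} \\
  u &= -\left(G_r\,\dot{x} + G_g\,\dot{x}_g\right)
    && \text{(relative and ground velocity feedback)}
\end{align}
% State-space form with z = [x, \dot{x}]^T:
\begin{equation}
  \dot{z} =
  \begin{bmatrix} 0 & 1 \\ -k/m & -c/m \end{bmatrix} z
  + \begin{bmatrix} 0 \\ 1/m \end{bmatrix} u
  + \begin{bmatrix} 0 \\ -1 \end{bmatrix} \ddot{x}_g
\end{equation}
```

    In this sketch, setting G_r = G_g = c_sky recovers the classical skyhook law, so the two-gain form can be read as a generalization whose gains are then tuned separately.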
    This optimization problem can be solved with Direct Output Feedback and the parameter updating iteration method. With the initial conditions produced by an impulse ground acceleration and the minimization of absolute acceleration as the objective function, time-invariant optimal control force gain coefficients are designed. Numerical simulations, comprising frequency response function and time history analyses, are then carried out to evaluate the isolation performance of the traditional optimization method. The study also uses reinforcement learning: an environment is built for the skyhook active isolation problem and DDPG serves as the agent, replacing the time-invariant gain coefficients of traditional control with a neural network that can vary over time and is trained over the time history. For a fair comparison with traditional control theory, the training environment likewise uses the initial conditions produced by an impulse ground acceleration, the reward function is set to minimize the absolute acceleration response, and the same feedback signals serve as observations. Although the DDPG agent's neural network can in principle vary over time, the actions output by the trained agent are time-invariant control force gain coefficients. An LSTM is also incorporated into the DDPG agent and trained, and the resulting agent likewise outputs time-invariant gain coefficients. Finally, frequency response function and time history simulations are carried out with the trained DDPG agents. The isolation performance is similar to that of the Direct Output Feedback design and in some cases better, demonstrating that DDPG reinforcement learning can be applied to active control problems in civil engineering.
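    To make the objective concrete, the sketch below simulates a single-degree-of-freedom isolated mass under a pulse of ground acceleration and evaluates the time integral of squared absolute acceleration for a given pair of feedback gains. It is an illustration under stated assumptions, not the thesis implementation: the model form, the parameter values (unit mass, 2 s natural period, 5% damping), the gain values, the one-step pulse standing in for an impulse, and the names `simulate_abs_acc` and `objective` are all hypothetical. It does not implement Direct Output Feedback or DDPG; it only evaluates the cost that such methods would minimize.

```python
import numpy as np


def simulate_abs_acc(g_rel, g_gnd, ground_acc, dt, m=1.0, omega=np.pi, zeta=0.05):
    """Absolute acceleration history of an SDOF isolated mass.

    Hypothetical model: m*(x'' + xg'') + c*x' + k*x = u, with
    u = -(g_rel * x' + g_gnd * xg'), where x is displacement relative to ground.
    """
    k = m * omega**2            # stiffness from the assumed natural frequency
    c = 2.0 * zeta * m * omega  # passive damping from the assumed damping ratio
    x = v = v_gnd = 0.0         # relative displacement/velocity, ground velocity
    abs_acc = np.empty(len(ground_acc))
    for i, a_gnd in enumerate(ground_acc):
        v_gnd += a_gnd * dt               # integrate ground acceleration
        u = -(g_rel * v + g_gnd * v_gnd)  # two-gain velocity feedback
        a_abs = (-c * v - k * x + u) / m  # absolute acceleration of the mass
        v += (a_abs - a_gnd) * dt         # semi-implicit Euler update
        x += v * dt
        abs_acc[i] = a_abs
    return abs_acc


def objective(abs_acc, dt):
    """Time integral of squared absolute acceleration (lower is better)."""
    return float(np.sum(abs_acc**2) * dt)


dt, t_end = 0.001, 20.0
ground_acc = np.zeros(round(t_end / dt))
ground_acc[0] = 1.0 / dt  # one-step pulse approximating a unit impulse

j_passive = objective(simulate_abs_acc(0.0, 0.0, ground_acc, dt), dt)
j_skyhook = objective(simulate_abs_acc(2.0, 2.0, ground_acc, dt), dt)
print(f"passive J = {j_passive:.3f}, skyhook-like J = {j_skyhook:.3f}")
```

    In a reinforcement learning setting, a per-step reward such as `-abs_acc[i]**2 * dt` would accumulate to the negative of this objective. That mapping is an illustration only and is not the reward function defined in the thesis.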
    Appears in collections: [Graduate Institute of Civil Engineering] Theses and Dissertations

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    19


    All items in NCUIR are protected by their original copyright.
