NCU Institutional Repository (中大機構典藏): Item 987654321/94490


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/94490


    Title: 天鉤主動隔震系統應用強化學習DDPG與直接輸出回饋之最佳化設計與分析;Optimal Design and Analysis of a Skyhook Active Seismic Isolation System Using Reinforcement Learning DDPG and Direct Output Feedback
    Authors: 洪憲証;Hong, Xian-Zheng
    Contributors: 土木工程學系;Department of Civil Engineering
    Keywords: Active seismic isolation;Skyhook damping;Direct output feedback;Parameter updating iteration;Reinforcement learning;DDPG;LSTM
    Date: 2024-07-29
    Issue Date: 2024-10-09 14:47:36 (UTC+8)
    Publisher: 國立中央大學;National Central University
    Abstract: This study applies the Deep Deterministic Policy Gradient (DDPG) algorithm from reinforcement learning, a branch of machine learning, to the optimal control of a single-degree-of-freedom active seismic isolation system, and compares the results with those obtained from Direct Output Feedback optimization in traditional control theory. The equations of motion and the state-space representation of the single-degree-of-freedom active isolation system are first derived, and the active control force is then designed according to the skyhook active isolation principle. Classical skyhook isolation computes the control force from absolute velocity feedback; in this study the control force is reformulated to use relative velocity and ground velocity feedback signals instead, which makes the signals easier to measure and improves their stability. As a result, the gain coefficients of the two feedback signals must be optimized, as sketched below.
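    To make the control law concrete, the following is a minimal sketch of the relations described above, in assumed notation (the abstract itself gives no symbols): m, c, k are the isolated mass, damping, and stiffness; x is the displacement relative to the ground; xg is the ground displacement; and u is the active control force. The quadratic form of the performance index is likewise an assumption, chosen only to be consistent with the stated objective of minimizing absolute acceleration.

```latex
% Equation of motion of the SDOF isolated system (relative coordinates)
\begin{align}
  m\ddot{x} + c\dot{x} + kx &= -m\ddot{x}_g + u\\
  % Classical skyhook law: absolute-velocity feedback
  u_{\mathrm{sky}} &= -c_{\mathrm{sky}}\left(\dot{x} + \dot{x}_g\right)\\
  % Modified law in this study: separate gains g_1, g_2 on the relative
  % velocity and ground velocity signals, both to be optimized
  u &= -g_1\,\dot{x} - g_2\,\dot{x}_g\\
  % Assumed quadratic performance index for the impulse response
  J &= \int_0^{t_f} \ddot{x}_a^{\,2}\,\mathrm{d}t, \qquad
      \ddot{x}_a = \ddot{x} + \ddot{x}_g
\end{align}
```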
    This optimization problem can be solved with Direct Output Feedback from traditional control theory together with a parameter-updating iteration method. For an initial condition equivalent to an impulse ground acceleration, and with minimization of the absolute acceleration as the objective function, time-invariant optimal control-force gain coefficients are designed. Numerical simulations, including frequency response functions and time history analyses, are then carried out to assess the isolation performance of this traditional optimization approach. In addition, the study takes a reinforcement learning approach: an environment for the skyhook active isolation problem is built and DDPG is used as the agent. In place of the traditional time-invariant gain coefficients, DDPG provides a neural network that can vary over time and is trained and updated through the time history. To allow a fair comparison with traditional control theory, the reinforcement learning environment also uses the impulse ground acceleration initial condition; the reward function is set to minimize the absolute acceleration response, and the same feedback signals serve as the observations. Although the DDPG agent provides a time-varying neural network, the control-force gain coefficients output by the trained agent turn out to be time-invariant. An LSTM is further incorporated into the DDPG agent and trained, again yielding an agent that outputs time-invariant gain coefficients. Finally, numerical simulations with frequency response functions and time history analyses are performed with the trained DDPG agent. The resulting isolation performance is similar to, and in some cases better than, that of the Direct Output Feedback design, verifying that DDPG reinforcement learning can be applied to active control problems in civil engineering. A minimal sketch of the environment follows.
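    Below is a minimal Python sketch (not the author's code; the class name, all numerical values, and the example gains are illustrative assumptions) of an environment matching the description above: an impulse ground acceleration as the initial condition, the relative velocity and ground velocity as observations, the two gain coefficients as the action, and a reward that penalizes absolute acceleration. A DDPG agent, with or without LSTM layers, would be trained against such an environment; here one episode is simply evaluated with fixed gains, mirroring the time-invariant coefficients the trained agent is reported to produce.

```python
# Minimal sketch of the SDOF skyhook isolation environment (illustrative only).
import numpy as np


class SkyhookIsolationEnv:
    def __init__(self, m=1.0, c=0.1, k=4.0, dt=0.01, steps=2000):
        self.m, self.c, self.k = m, c, k      # assumed mass, damping, stiffness
        self.dt, self.steps = dt, steps

    def reset(self):
        # The impulse ground acceleration is modeled as the ground suddenly
        # acquiring a velocity while the mass is still at rest, so the
        # initial relative velocity is the negative of the ground velocity.
        self.vg = 1.0                          # ground velocity after the impulse
        self.x, self.v = 0.0, -self.vg         # relative displacement / velocity
        self.t = 0
        return np.array([self.v, self.vg])     # observation: [rel. vel., ground vel.]

    def step(self, action):
        g1, g2 = action                        # the two feedback gains (the "action")
        u = -g1 * self.v - g2 * self.vg        # modified skyhook control force
        # m x'' + c x' + k x = -m xg'' + u; after the impulse xg'' = 0,
        # so the absolute acceleration equals the relative acceleration.
        a_abs = (-self.c * self.v - self.k * self.x + u) / self.m
        self.x += self.v * self.dt             # explicit Euler integration
        self.v += a_abs * self.dt
        self.t += 1
        reward = -a_abs**2                     # penalize absolute acceleration
        done = self.t >= self.steps
        return np.array([self.v, self.vg]), reward, done


# Evaluate one episode with fixed gains, mirroring the time-invariant
# coefficients the trained DDPG agent is reported to produce.
env = SkyhookIsolationEnv()
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, r, done = env.step((0.8, 0.5))        # illustrative gain values
    total += r
print(f"cumulative reward: {total:.4f}")
```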
    Appears in Collections: [Graduate Institute of Civil Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0 KB, HTML, 19 views)


    All items in NCUIR are protected by copyright, with all rights reserved.
