dc.description.abstract | This study applies the Deep Deterministic Policy Gradient (DDPG) algorithm, a reinforcement learning method from the field of machine learning, to optimize the control of a single-degree-of-freedom active seismic isolation system. The results are analyzed and compared with the optimization results from traditional control theory's Direct Output Feedback method. First, the equations of motion and the state-space representation of the single-degree-of-freedom active seismic isolation system are derived. The active control force is then designed based on the skyhook active isolation principle. Traditionally, skyhook active isolation computes the control force from absolute velocity feedback to achieve isolation. In this study, the control force is instead computed from relative velocity and ground velocity feedback signals, which are more convenient to measure and yield more stable feedback. Consequently, the gain coefficients of the two feedback signals must be optimized.
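To make the control law concrete, the following is a minimal sketch of the governing relations in LaTeX notation, with symbols assumed here for illustration (m, c, k: mass, damping, stiffness; x: displacement relative to the ground; x_g: ground displacement; u: active control force; g, g_1, g_2: feedback gains):

m\ddot{x} + c\dot{x} + kx = -m\ddot{x}_g + u(t)

u_{\mathrm{skyhook}}(t) = -g\,(\dot{x} + \dot{x}_g) \quad \text{(traditional: absolute velocity feedback)}

u(t) = -g_1\,\dot{x} - g_2\,\dot{x}_g \quad \text{(this study: relative and ground velocity feedback)}

Since the absolute velocity equals \dot{x} + \dot{x}_g, the traditional skyhook law is recovered as the special case g_1 = g_2; relaxing that constraint leaves g_1 and g_2 as the two gain coefficients to be optimized.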
This optimization problem can be addressed with the Direct Output Feedback method from traditional control theory together with an iterative parameter-update procedure. The initial condition is an impulse ground acceleration, and the objective function is the minimization of the absolute acceleration response; this yields time-invariant optimal control force gain coefficients. Numerical simulations, including frequency response functions and time history analyses, are conducted to assess the isolation effectiveness of this traditional optimization method. In addition, this study adopts a reinforcement learning approach by constructing an environment for the skyhook active isolation problem and using DDPG as the agent. Unlike the traditional design, which yields time-invariant gain coefficients, DDPG employs a neural network whose output can vary over time and is trained over the response time history. To ensure a fair comparison with traditional control theory, the reinforcement learning environment also uses an impulse ground acceleration as the initial condition and aims to minimize the absolute acceleration response, which defines the reward function; the same feedback signals serve as observations. Although the DDPG agent is in principle time-varying, the control force gain coefficients output by the trained agent turn out to be time-invariant. Furthermore, an LSTM layer is incorporated into the DDPG agent and trained; the resulting agent also outputs time-invariant gain coefficients. Finally, numerical simulations, including frequency response functions and time history analyses, are conducted with the trained DDPG agents. The isolation effectiveness obtained is similar to, and in certain cases better than, that of the Direct Output Feedback design method, verifying that the DDPG reinforcement learning method can be applied to active control problems in civil engineering. | en_US |
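As an illustrative complement to the abstract, the following is a minimal, self-contained Python sketch of the kind of environment described; a DDPG implementation (actor/critic networks, optionally with an LSTM layer as in the study) would interact with it through reset()/step(). The class name, parameter values, and the impulse model below are assumptions for illustration, not the thesis's actual code.

import numpy as np

class SkyhookIsolationEnv:
    """SDOF active isolation environment for a DDPG-style agent.

    State z = [x, x_dot], with x the displacement relative to the ground.
    The agent's action is the active control force u. All numerical
    values are illustrative assumptions, not the thesis's parameters.
    """

    def __init__(self, m=1.0, c=0.1, k=10.0, dt=0.01, horizon=1000):
        self.m, self.c, self.k = m, c, k
        self.dt, self.horizon = dt, horizon

    def reset(self):
        # Impulse ground acceleration as the initial condition, modeled
        # as a step in ground velocity; the mass's absolute velocity is
        # initially unchanged, so its relative velocity is the negative
        # of the ground velocity.
        self.z = np.array([0.0, -1.0])   # [x, x_dot]
        self.xg_dot = 1.0                # ground velocity after the impulse
        self.t = 0
        return self._obs()

    def _obs(self):
        # Feedback signals used as observations: relative velocity and
        # ground velocity (the modified skyhook measurements).
        return np.array([self.z[1], self.xg_dot])

    def step(self, u):
        xg_ddot = 0.0                    # ground acceleration is zero after the impulse
        x, x_dot = self.z
        # Equation of motion: m*x_ddot + c*x_dot + k*x = -m*xg_ddot + u
        x_ddot = (-self.c * x_dot - self.k * x + u) / self.m - xg_ddot
        # Explicit Euler integration (a higher-order scheme could be substituted).
        self.z = self.z + self.dt * np.array([x_dot, x_ddot])
        self.xg_dot += self.dt * xg_ddot
        self.t += 1
        abs_acc = x_ddot + xg_ddot       # absolute acceleration of the mass
        reward = -abs_acc ** 2           # reward: minimize absolute acceleration
        done = self.t >= self.horizon
        return self._obs(), reward, done

A time-invariant controller of the kind compared in the study would set u = -g1 * obs[0] - g2 * obs[1] at each step; the trained DDPG agent replaces this fixed gain law with its actor network's output.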