Master's and Doctoral Thesis 112322040: Detailed Record




Name: Xian-Zheng Hong (洪憲証)    Department: Civil Engineering
Thesis Title: Optimal Design and Analysis of a Skyhook Active Seismic Isolation System Using Reinforcement Learning DDPG and Direct Output Feedback
Related Theses
★ Development and experimental verification of an active phase-control tuned mass damper
★ Phase-controlled active tuned mass damper applied to multi-degree-of-freedom frames: analysis and experimental verification
★ Development and optimal design of a cantilever-beam-type piezoelectric tuned mass damper
★ Skyhook active seismic isolation system applied to a single-degree-of-freedom mechanism: analysis and experimental verification
★ Skyhook active seismic isolation system applied to non-rigid equipment: analysis and experimental verification
★ Optimal design of passive tuned mass dampers and multiple tuned mass dampers by direct output feedback and parameter-updating iteration
★ Phase-controlled active tuned mass damper with real-time filtering and stroke limits applied to multi-degree-of-freedom frames: analysis and experimental verification
★ Multi-degree-of-freedom analysis and optimal design of cantilever-beam-type piezoelectric tuned mass dampers for vibration reduction and energy harvesting
★ Numerical simulation and experimental verification of a stroke-considered skyhook active seismic isolation system for equipment
★ Optimal design and parameter identification of variable-cross-section cantilever-beam-type multiple piezoelectric tuned mass dampers for structural vibration reduction and energy harvesting
★ Optimal design of isolation-layer damping coefficients by direct output feedback considering a Kanai-Tajimi filter
★ Analysis of phase-controlled active tuned mass dampers on nonlinear Bouc-Wen model structures
★ Development and experimental verification of a bidirectional eccentric rolling isolation system with convex guide rails
★ Numerical simulation and experimental verification of a bidirectional skyhook active seismic isolation system
★ Numerical simulation of phase-controlled multiple active tuned mass dampers for structural vibration-reduction performance assessment
★ Analysis and experimental verification of inverted-pendulum cantilever-beam-type multiple piezoelectric tuned mass dampers for structural vibration reduction and energy harvesting
Files: Full text viewable in the repository system after 2026-06-30.
Abstract (Chinese): This study applies the Deep Deterministic Policy Gradient (DDPG) algorithm from reinforcement learning, a branch of machine learning, to the optimal control of a single-degree-of-freedom active seismic isolation system, and analyzes and compares the results with the optimal design obtained by Direct Output Feedback from traditional control theory. The equations of motion and the state-space representation of the single-degree-of-freedom active isolation system are first derived, and the active control force is then designed according to the skyhook active isolation principle. Classical skyhook isolation computes the control force from an absolute-velocity feedback signal. In this study the control force is reformulated to use relative-velocity and ground-velocity feedback signals, which improves the convenience of signal measurement and the stability of the feedback signals; the gain coefficients of the two feedback signals therefore need to be optimized. This optimization problem is solved with Direct Output Feedback and the parameter-updating iteration method from traditional control: for an initial condition equivalent to an impulse ground acceleration, with minimization of the absolute acceleration as the objective function, time-invariant optimal control-force gain coefficients are designed, and numerical simulations of frequency response functions and time-history analyses are carried out to evaluate the isolation performance of this traditional optimization. In addition, this study applies reinforcement learning by building an environment for the skyhook active isolation problem and using DDPG as the agent, replacing the time-invariant gains of traditional control with a neural network that may vary in time, trained through the time history. For a fair comparison with the control-theory results, the training environment likewise uses the initial condition of an impulse ground acceleration, the reward is set to seek the minimum absolute acceleration response, and the same feedback signals serve as the observations. Although the DDPG agent provides a time-varying neural network, the trained agent outputs actions that are in fact time-invariant control-force gain coefficients. An LSTM is further added to the DDPG agent and trained, again yielding an agent that outputs time-invariant gain coefficients. Finally, numerical simulations of frequency response functions and time-history analyses with the trained DDPG agents show isolation performance similar to, and in some cases better than, the Direct Output Feedback design, verifying that DDPG reinforcement learning can be applied to active control problems in civil engineering.
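The modified skyhook law described in this abstract can be written compactly. The following is a minimal sketch in assumed notation (the record does not reproduce the thesis' own equations): since the absolute velocity is the sum of the two measured signals, the classical skyhook force generalizes to a two-gain feedback law,

\[
u(t) = -c_{\mathrm{sky}}\,\dot{x}_a(t), \qquad \dot{x}_a(t) = \dot{x}(t) + \dot{x}_g(t)
\quad\Longrightarrow\quad
u(t) = g_1\,\dot{x}(t) + g_2\,\dot{x}_g(t),
\]

where \(\dot{x}\) is the relative velocity, \(\dot{x}_g\) the ground velocity, and \(g_1, g_2\) are the two feedback gains to be optimized; the choice \(g_1 = g_2 = -c_{\mathrm{sky}}\) recovers the classical skyhook force.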
Abstract (English): This study proposes the application of the Deep Deterministic Policy Gradient (DDPG) algorithm from reinforcement learning, a branch of machine learning, to the optimal control of a single-degree-of-freedom active seismic isolation system. The results are analyzed and compared with the optimization results of Direct Output Feedback from traditional control theory. First, the equations of motion and the state-space representation of the single-degree-of-freedom active seismic isolation system are derived. The active control force is then designed based on the skyhook active isolation principle. Traditionally, skyhook active isolation calculates the control force from absolute-velocity feedback to achieve isolation. In this study, the control force is reformulated to use relative-velocity and ground-velocity feedback signals, which enhances the convenience of signal measurement and increases the stability of the feedback signals. Consequently, the gain coefficients of the two feedback signals need to be optimized.
This optimization problem is addressed using Direct Output Feedback from traditional control theory together with the parameter-updating iteration method. The initial condition corresponds to an impulse ground acceleration, with minimization of the absolute acceleration as the objective function, leading to time-invariant optimal control-force gain coefficients. Numerical simulations, including frequency response functions and time-history analyses, are conducted to assess the isolation effectiveness of this traditional optimization method. In addition, this study employs a reinforcement learning approach by establishing an environment for the skyhook active isolation problem and using DDPG as the agent. Unlike the time-invariant gain coefficients of traditional control, DDPG uses a neural network that can vary over time and is trained through the time history. To ensure a fair comparison with traditional control theory, the reinforcement learning environment also uses the initial condition of an impulse ground acceleration and aims to minimize the absolute acceleration response, which defines the reward function; the same feedback signals are used as observations. Although the DDPG agent provides a time-varying neural network, the control-force gain coefficients output by the trained agent are time-invariant. Furthermore, an LSTM is incorporated into the DDPG agent and trained, again resulting in an agent that outputs time-invariant gain coefficients. Finally, numerical simulations, including frequency response functions and time-history analyses, are conducted with the trained DDPG agents. The resulting isolation effectiveness is similar to, and in certain cases better than, that of the Direct Output Feedback design method, verifying that DDPG reinforcement learning can be applied to active control problems in civil engineering.
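The training setup described above (impulse ground acceleration as the initial condition, reward tied to minimizing absolute acceleration, relative and ground velocities as observations, feedback gains as actions) can be sketched as a minimal simulation environment. This is an illustrative Python sketch, not the thesis' implementation: the class name, the reset/step interface, and all numerical values (frequency, damping ratio, time step, impulse size) are assumptions.

import numpy as np

class SkyhookIsolationEnv:
    """SDOF active isolation following an impulse ground acceleration.

    observation: [relative velocity, ground velocity]
    action:      feedback gains [g1, g2] on those two signals
    reward:      negative magnitude of the absolute acceleration
    """

    def __init__(self, omega=2.0 * np.pi * 0.5, zeta=0.02, dt=0.001, n_steps=5000):
        # Isolation frequency (rad/s), damping ratio, integration step: assumed values.
        self.omega, self.zeta, self.dt, self.n_steps = omega, zeta, dt, n_steps

    def reset(self):
        # An impulse ground acceleration of size V is equivalent to the initial
        # conditions x = 0, v = -V in relative coordinates, after which the
        # ground moves at the constant velocity V (here V = 1).
        self.x, self.v, self.vg, self.k = 0.0, -1.0, 1.0, 0
        return np.array([self.v, self.vg])

    def step(self, action):
        g1, g2 = action
        u = g1 * self.v + g2 * self.vg  # active control force per unit mass
        # Relative equation of motion; the ground acceleration is zero after the
        # impulse, so the absolute acceleration equals the relative acceleration a.
        a = -2.0 * self.zeta * self.omega * self.v - self.omega ** 2 * self.x + u
        self.v += a * self.dt           # semi-implicit Euler integration
        self.x += self.v * self.dt
        self.k += 1
        return np.array([self.v, self.vg]), -abs(a), self.k >= self.n_steps

With this interface, a constant action [g1, g2] reproduces the direct-output-feedback controller exactly, which is what makes the trained agent's time-invariant gains directly comparable with the control-theory design.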
Keywords: ★ Active seismic isolation
★ Skyhook damping
★ Direct Output Feedback
★ Parameter updating iteration
★ Reinforcement learning
★ DDPG
★ LSTM
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Table of Contents iv
List of Tables vii
List of Figures x
List of Symbols xvi
Chapter 1  Introduction 1
1-1 Research Motivation 1
1-2 Literature Review 2
1-2-1 Seismic Isolation Systems 2
1-2-2 Development of Structural Control Theory 4
1-2-3 Reinforcement Learning 4
1-2-4 The DDPG Reinforcement Learning Algorithm 6
1-3 Research Scope 7
Chapter 2  Control Theory and Numerical Simulation of Skyhook Active Seismic Isolation 8
2-1 Skyhook Control Theory 8
2-2 Equations of the Skyhook Isolation System 9
2-3 Optimal Design of the Skyhook Isolation System 12
2-3-1 State Equations for Optimal Design of the Skyhook Isolation System 13
2-3-2 Direct Output Feedback Design Method for Skyhook Isolation 14
2-4 Numerical Simulation of the Skyhook Isolation System 18
2-4-1 Stability Analysis of the Control-Force Gain Coefficients 19
2-4-2 Frequency Response Functions 21
2-4-3 Time-History Simulation of the Controlled Isolation System under Given Initial Conditions 23
2-4-4 Time-History Simulation under Earthquake Excitation 25
2-4-5 Sensitivity Analysis of the Gain Coefficients 30
2-4-6 Hessian Matrix 32
Chapter 3  Reinforcement Learning Method and Numerical Simulation for Skyhook Active Seismic Isolation 64
3-1 Introduction to Reinforcement Learning 64
3-1-1 The Environment 64
3-1-2 The Agent 65
3-2 Reinforcement Learning Applied to Skyhook Active Isolation 69
3-2-1 Building the Environment 69
3-2-2 Building the Agent 71
3-3 Gain-Coefficient Design by Reinforcement Learning 75
3-3-1 Computer Configuration 76
3-3-2 Neural Network Designs of Different Widths 76
3-3-3 Neural Network Designs of Different Depths 79
3-3-4 LSTM Neural Network Design 82
3-4 Control-Force Design by Reinforcement Learning 85
3-5 Numerical Simulation of the Reinforcement Learning Design 86
3-5-1 Frequency Response Functions 87
3-5-2 Time-History Simulation under Earthquake Excitation 88
Chapter 4  Comparison of the Control-Theory and Reinforcement Learning Designs 141
4-1 Comparison of Frequency Response Function Results 141
4-2 Comparison of Time-History Simulation Results 142
4-2-1 Time-History Simulation under Given Initial Conditions 142
4-2-2 Time-History Simulation under Earthquake Excitation 143
Chapter 5  Conclusions and Suggestions 166
5-1 Conclusions 166
5-2 Suggestions 169
References 171
Appendix A 176
Appendix B 178
References
[1] 黃振興, 黃尹男, 黃仁傑, "Experimental Study on the Feasibility of Seismic Isolation Design for Microelectronics Factories", National Center for Research on Earthquake Engineering, Report No. NCREE 02-023, 2002. (in Chinese)
[2] 栗正暐, 黃宣諭, "Key Considerations in the Structural Design of High-Tech Semiconductor Fabs", Civil and Hydraulic Engineering, Vol. 46, No. 6, Chinese Institute of Civil and Hydraulic Engineering, pp. 30-37. (in Chinese)
[3] Zhang Y.A. and Zhu A., "Novel Model-free Optimal Active Vibration Control Strategy Based on Deep Reinforcement Learning", Structural Control and Health Monitoring, 1, 6770137, 2023.
[4] 李柏陞, "Path Planning in Unknown Environments for Unmanned Vehicles Based on Deep Reinforcement Learning", Master's thesis, Tamkang University, 2023. (in Chinese)
[5] 邱唯祐, "Coronary Artery Centerline Tracking in Computed Tomography Angiography Using Deep Reinforcement Learning", Master's thesis, National Yang Ming Chiao Tung University, 2023. (in Chinese)
[6] Kelly J.M., Earthquake-Resistant Design with Rubber, Springer-Verlag, 1993.
[7] Naeim F. and Kelly J.M., Design of Seismic Isolated Structures: From Theory to Practice, John Wiley & Sons, 1999.
[8] Tsai C.S., Chiang T.C., Chen B.J. and Lin S.B., "An advanced analytical model for high damping rubber bearings", Earthquake Engineering & Structural Dynamics, 32, pp. 1373-1387, 2003.
[9] Reggio A. and Angelis M.D., "Combined primary-secondary system approach to the design of an equipment isolation system with High-Damping Rubber Bearings", Journal of Sound and Vibration, 333, pp. 2386-2403, 2014.
[10] Yang J.N., Agrawal A.K. and Samali B., "A Benchmark Problem for Response Control of Wind-Excited Tall Buildings", Journal of Engineering Mechanics, 130, pp. 437-446, 2004.
[11] Rodellar J., Garcia G., Vidal Y., Acho L. and Pozo F., "Hysteresis based vibration control of base-isolated structures", Procedia Engineering, 199, pp. 1798-1803, 2017.
[12] Nagarajaiah S., Riley M.A. and Reinhorn A.M., "Control of sliding isolated bridge with absolute acceleration feedback", Journal of Engineering Mechanics, 119, pp. 2317-2332, 1993.
[13] Venanzi I., Ierimonti L. and Materazzi A.L., "Active Base Isolation of Museum Artifacts under Seismic Excitation", Journal of Earthquake Engineering, 24, pp. 506-527, 2020.
[14] Oh H.E., Ku J.M., Lee D.H., Hong C.S. and Jeong W.B., "Analysis for active isolation of the equipment on flexible beam structure", Journal of Physics: Conference Series, 1075, pp. 27-28, 2017.
[15] 陳佳恩, "Skyhook Active Seismic Isolation System Applied to a Single-Degree-of-Freedom Mechanism: Analysis and Experimental Verification", Master's thesis, National Central University, 2021. (in Chinese)
[16] 吳柏諺, "Skyhook Active Seismic Isolation System Applied to Non-Rigid Equipment: Analysis and Experimental Verification", Master's thesis, National Central University, 2022. (in Chinese)
[17] 蔡元峰, "Numerical Simulation and Experimental Verification of a Stroke-Considered Skyhook Active Seismic Isolation System for Equipment", Master's thesis, National Central University, 2022. (in Chinese)
[18] Basili M. and Angelis M.D., "Investigation on the optimal properties of semi active control devices with continuous control for equipment isolation", Scalable Computing: Practice and Experience, 15, pp. 331-343, 2015.
[19] Housner G.W., Bergman L.A., Caughey T.K., Chassiakos A.G., Claus R.O., Masri S.F. and Yao J.T.P., "Structural Control: Past, Present, and Future", Journal of Engineering Mechanics, 123, pp. 897-971, 1997.
[20] Yoshioka H., Ramallo J.C. and Spencer B.F., ""Smart" Base Isolation Strategies Employing Magnetorheological Dampers", Journal of Engineering Mechanics, 128, pp. 540-551, 2002.
[21] Smith O.J.M., Feedback Control Systems, McGraw-Hill, 1958.
[22] Åström K.J. and Hägglund T., Advanced PID Control, ISA, 2006.
[23] Bryson A.E. and Ho Y.C., Applied Optimal Control: Optimization, Estimation, and Control, Blaisdell Publishing Company, 1969.
[24] Glover K., Doyle J. and Packard A., "State-space formulae for all stabilizing controllers that satisfy an H-infinity-norm bound and relations to risk sensitivity", Systems & Control Letters, 14, pp. 165-172, 1989.
[25] Watkins C.J.C.H. and Dayan P., "Q-Learning", Machine Learning, 8, pp. 279-292, 1992.
[26] Sutton R.S. and Barto A.G., Reinforcement Learning: An Introduction, The MIT Press, 1998.
[27] Mnih V., et al., "Human-level control through deep reinforcement learning", Nature, 518, pp. 529-533, 2015.
[28] Lillicrap T.P., et al., "Continuous control with deep reinforcement learning", ICLR, arXiv:1509.02971, 2016.
[29] Mnih V., et al., "Asynchronous methods for deep reinforcement learning", International Conference on Machine Learning, 48, pp. 1928-1937, 2016.
[30] Silver D., et al., "Mastering the game of Go with deep neural networks and tree search", Nature, 529, pp. 484-489, 2016.
[31] Silver D., et al., "Mastering the game of Go without human knowledge", Nature, 550, pp. 354-359, 2017.
[32] Haarnoja T., et al., "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", International Conference on Machine Learning, 35, pp. 1861-1870, 2018.
[33] 張承熹, "Comparison of LQR and Reinforcement Learning for Inverted-Pendulum Control", Master's thesis, Chinese Culture University, 2022. (in Chinese)
[34] Gregurić M., Vujić M., Alexopoulos C. and Miletić M., "Application of Deep Reinforcement Learning in Traffic Signal Control: An Overview and Impact of Open Traffic Data", Applied Sciences, 10, 4011, 2020.
[35] 范淳皓, "Automated Crack Segmentation and Detection Based on Deep Reinforcement Learning Neural Networks", Master's thesis, National Taiwan University, 2024. (in Chinese)
[36] Xu H., Su X., Wang Y., Cai H., Cui K. and Chen X., "Automatic bridge crack detection using a convolutional neural network", Applied Sciences, 9, 2867, 2019.
[37] 陳克宜, "Development of an Earthquake-Characteristic Control Module and a Piezoelectric Smart Sliding Isolation System Using Deep Reinforcement Learning", Master's thesis, National Yang Ming Chiao Tung University, 2022. (in Chinese)
[38] Eshkevari S.S., Eshkevari S.S., Sen D. and Pakzad S.N., "Active structural control framework using policy-gradient reinforcement learning", Engineering Structures, 274, 115122, 2023.
[39] 莊竣凱, "Development and Experimental Verification of an Earthquake-Characteristic Prediction Module and an Intelligent Seismic Isolation Control System Using Long Short-Term Memory Neural Networks", Master's thesis, National Yang Ming Chiao Tung University, 2021. (in Chinese)
[40] Yao J. and Ge Z., "Path-Tracking Control Strategy of Unmanned Vehicle Based on DDPG Algorithm", Sensors, 22, 7881, 2022.
[41] Kang J.W. and Kim H.S., "Performance Evaluation of Reinforcement Learning Algorithm for Control of Smart TMD", Journal of Korean Association for Spatial Structures, 21, pp. 41-48, 2021.
[42] Liang G., Zhao T. and Wei Y., "DDPG based self-learning active and model-constrained semi-active suspension control", CVCI, pp. 1-6, 2021.
[43] Yang J., Peng W. and Sun C., "A Learning Control Method of Automated Vehicle Platoon at Straight Path with DDPG-Based PID", Electronics, 10, 2580, 2021.
[44] Yan R., Jiang R., Jia B., Huang J. and Yang D., "Hybrid Car-Following Strategy Based on Deep Deterministic Policy Gradient and Cooperative Adaptive Cruise Control", Automation Science and Engineering, arXiv:2103.03796, 2022.
[45] Hunt J.J., et al., "Continuous control with deep reinforcement learning", ICLR, arXiv:1509.02971, 2016.
[46] Howard A.G., Zhu M. and Chen B., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications", Computer Vision and Pattern Recognition, arXiv:1704.04861, 2017.
[47] He K., Zhang X., Ren S. and Sun J., "Deep Residual Learning for Image Recognition", Computer Vision and Pattern Recognition, arXiv:1512.03385, 2015.
[48] de Bruin T., Kober J., Tuyls K. and Babuška R., "Experience Selection in Deep Reinforcement Learning for Control", Journal of Machine Learning Research, 19, pp. 1-56, 2018.
[49] Kingma D.P. and Ba J.L., "Adam: A Method for Stochastic Optimization", ICLR, arXiv:1412.6980, 2015.
Advisor: Yong-An Lai (賴勇安)    Review Date: 2024-7-29
