Graduate Thesis 111521095: Detailed Record




Name: You-Cheng Yan (顏佑丞)    Department: Electrical Engineering
Thesis Title (Chinese): 通過強化學習與積分滑模動量觀測器實現機器手臂的強健近佳PD控制策略
Thesis Title (English): Robust Near-Optimal PD-like Control Strategy for Robot Manipulators via Reinforcement Learning and Integral Sliding-Mode Momentum Observer
Related Theses
★ Precise trajectory tracking control of a two-axis robot manipulator based on an adaptive radial basis function neural network and nonsingular fast terminal sliding-mode control with an online delay estimator
★ Design and control of a novel three-dimensional optical image measurement system
★ Novel lemniscate trajectory design and advanced control for fast and precise positioning of a piezoelectric stage
★ Accurate real-time prediction of pedestrian paths and destinations based on deep coordinate convolution and an autoencoder
★ Modified lemniscate trajectory combined with adaptive integral terminal sliding-mode control and inverse-model hysteresis compensation for precise tracking of a piezoelectric stage
★ A three-dimensional optical microscopy imaging system using a particle-swarm-optimized backpropagation neural network PID controller and image pyramid transform fusion
★ Accurate multi-image stitching using local entropy distribution and template matching combined with adaptive ORB feature extraction
★ Mechanism development of a low-torque robot manipulator and design of advanced pulse-width-modulation controllers
★ Scattering-parameter characterization of a four-port fixture using time-domain gating and Mason's gain formula
★ A joint entity-relation extraction method based on an agent-attention-like feature fusion model
★ Novel modified raster trajectory combined with extended-state-estimation sliding-mode feedback and a multi-degree-of-freedom Bouc-Wen hysteresis feedforward compensator for fast and precise tracking of a piezoelectric stage
★ Lightweight design and precise trajectory tracking control of a differential-drive wheeled mobile robot
★ Image-based three-dimensional point cloud modeling and rendering reconstruction in Unreal Engine combining a point cloud density entropy method and structure from motion
Full Text: available in the system after 2029-07-01
Abstract (Chinese): Robot manipulators are widely used in today's factory automation production lines because of their high precision and consistency. Their tasks often require the end-effector mounted on the arm to move along predefined position trajectories; however, uncertainties during the motion are unavoidable and degrade tracking accuracy. This thesis proposes a control strategy for trajectory tracking of robot manipulators, consisting of an uncertainty estimator and a reinforcement learning-based actor-critic optimal tracking controller. First, building on the momentum observer that has already been deployed in commercial applications, integral sliding-mode control is incorporated; the resulting observer inherits the advantages of the traditional momentum observer while gaining the robustness of sliding-mode control, which improves the uncertainty estimation, and the estimate is used for compensation. Second, within reinforcement-learning tracking control theory, a traditional PD-plus-feedforward controller is integrated and a neural network parameter selection procedure is designed. This procedure avoids time-consuming tuning of the neural network activation functions and initial weights and guarantees the admissibility of the initial controller, while during control the actor-critic architecture of reinforcement learning adaptively adjusts the control output. The closed-loop stability of the robot manipulator under this control strategy has been proven by the Lyapunov method, showing that all error signals are bounded. To verify the effectiveness and superiority of the proposed control strategy, it is compared with a traditional PD-plus-feedforward controller and an adaptive RBF neural network controller in numerical simulations of a two-link robot manipulator; the results show that the proposed strategy achieves faster convergence and smaller steady-state error than the other two. Experimental results on a real two-link robot manipulator also confirm its practical feasibility.
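For context, the PD-with-computed-feedforward scheme that serves as the initial admissible control policy follows the standard form of [38] and [46]; the sketch below is a generic rendering of that form (the matrices M, C, g, the hatted nominal-model terms, and the gains K_p, K_d are standard notation, not the thesis's specific parameter choices):

\[
\begin{aligned}
M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) &= \tau + \tau_d, \\
\tau_{\mathrm{PD+FF}} &= \hat{M}(q_d)\ddot{q}_d + \hat{C}(q_d,\dot{q}_d)\dot{q}_d + \hat{g}(q_d) + K_p e + K_d \dot{e}, \qquad e = q_d - q,
\end{aligned}
\]

where q and q_d are the measured and desired joint positions, \tau_d lumps the uncertainties, and the hatted terms denote the nominal dynamic model evaluated along the desired trajectory.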
Abstract (English): Robot manipulators are widely used in today's factory automation production lines due to their high precision and consistency, which in turn improve productivity and quality. Their tasks often require the end-effector mounted on the arm to move along predefined position trajectories; however, uncertainties during the motion inevitably degrade tracking accuracy. This thesis proposes a control strategy for trajectory tracking control of robot manipulators, consisting of an uncertainty estimator and a reinforcement learning-based actor-critic optimal tracking controller. First, building on the commercially applied momentum observer, we design a momentum observer combined with integral sliding-mode control. This observer not only inherits the advantages of the traditional momentum observer but also possesses the robustness of sliding-mode control, enhancing the uncertainty estimation capability, and the estimated values are used for compensation. Second, within the existing reinforcement-learning tracking control framework, we integrate a traditional PD-plus-feedforward controller and design a neural network parameter selection procedure. This procedure avoids time-consuming tuning of the neural network activation functions and initial weights, ensuring the admissibility of the initial control policy. During control, the actor-critic architecture of reinforcement learning adaptively adjusts the control output. Closed-loop stability has been proven by the Lyapunov method, showing that all error signals are bounded. To verify the effectiveness and superiority of the proposed control strategy, it was compared with a traditional PD-plus-feedforward controller and an adaptive RBF neural network controller in numerical simulations of a two-link robot manipulator. The results show that the proposed control strategy achieves faster convergence and smaller steady-state error than the other two. Its practical feasibility has also been confirmed through experiments on a real two-link robot manipulator.
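For reference, the generalized-momentum observer that the proposed integral sliding-mode observer builds on (the first-order residual form surveyed in [26] and used in commercial collision detection) can be sketched as follows; K_O is a positive-definite observer gain and r is the residual that serves as the uncertainty estimate. This is only the baseline form, not the thesis's integral sliding-mode variant:

\[
\begin{aligned}
p &= M(q)\dot{q}, \qquad \dot{p} = \tau + C^{\top}(q,\dot{q})\dot{q} - g(q) + \tau_d, \\
r(t) &= K_O\!\left( p(t) - p(0) - \int_0^t \bigl(\tau + C^{\top}(q,\dot{q})\dot{q} - g(q) + r\bigr)\,\mathrm{d}s \right)
\;\;\Longrightarrow\;\; \dot{r} = K_O\bigl(\tau_d - r\bigr),
\end{aligned}
\]

so each component of r is a first-order low-pass estimate of the corresponding lumped disturbance torque, with bandwidth set by K_O and no joint-acceleration measurement required.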
Keywords (Chinese) ★ 機器手臂 (robot manipulator)
★ 強化學習 (reinforcement learning)
★ 動量觀測器 (momentum observer)
★ 軌跡追蹤控制 (trajectory tracking control)
★ 最佳化控制 (optimal control)
Keywords (English) ★ Robot manipulator
★ Reinforcement learning
★ Momentum observer
★ Trajectory tracking control
★ Optimal control
Table of Contents
摘要 (Abstract in Chinese) i
ABSTRACT ii
誌謝 (Acknowledgments) iv
Table of Contents v
List of Figures vii
List of Tables viii
Explanation of Symbols ix
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Literature Survey 3
1.2.1 Reinforcement Learning-Based Optimal Control 4
1.2.2 Robot Control with Reinforcement Learning 7
1.2.3 Robot Control with Disturbance Observer 11
1.3 Contribution 15
1.4 Thesis Organization 17
Chapter 2 Preliminaries 19
2.1 Robot Dynamics and Control Objective 19
2.2 Integral Sliding Mode Control 22
2.3 RL Optimal Control Formulation 25
2.3.1 Optimal Tracking Control Problem 25
2.3.2 Policy Iteration and Value Function Approximation 27
Chapter 3 Observer Design 32
3.1 Generalized Momentum Formulation 32
3.2 Integral Sliding Mode MO Design 34
Chapter 4 Controller Design 40
4.1 Actor-Critic Control Design 40
4.2 Weight Update Laws 42
4.3 Stability Analysis 46
4.4 Design Procedure for NNs Parameters 51
Chapter 5 Simulation 57
5.1 Simulation Setup 57
5.2 Uncertainty Estimation Performance 59
5.3 Tracking Performance 67
5.3.1 Periodic Trajectory 68
5.3.2 Aperiodic Trajectory 73
5.4 NN Parameters Comparison 77
Chapter 6 Experiment 80
6.1 Configuration 81
6.2 Tracking Performance 82
Chapter 7 Conclusion 87
References 88
References
[1] R. S. Sutton, A. G. Barto, and R. J. Williams, ‘‘Reinforcement learning is direct adaptive optimal control,’’ IEEE Control Systems Magazine, vol. 12, no. 2, pp. 19-22, 1992.
[2] K. G. Vamvoudakis and F. L. Lewis, ‘‘Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem,’’ Automatica, vol. 46, no. 5, pp. 878-888, 2010.
[3] R. Kamalapurkar, H. Dinh, S. Bhasin, and W. E. Dixon, ‘‘Approximate optimal trajectory tracking for continuous-time nonlinear systems,’’ Automatica, vol. 51, pp. 40-48, 2015.
[4] F. L. Lewis and D. Vrabie, ‘‘Reinforcement learning and adaptive dynamic programming for feedback control,’’ IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32-50, 2009.
[5] H. Modares and F. L. Lewis, ‘‘Linear Quadratic Tracking Control of Partially-Unknown Continuous-Time Systems Using Reinforcement Learning,’’ IEEE Transactions on Automatic Control, vol. 59, no. 11, pp. 3051-3056, 2014.
[6] Y. Jiang and Z. P. Jiang, ‘‘Robust adaptive dynamic programming and feedback stabilization of nonlinear systems,’’ IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 5, pp. 882-893, 2014.
[7] H. Modares, F. L. Lewis, and Z. P. Jiang, ‘‘H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning,’’ IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 10, pp. 2550-2562, 2015.
[8] R. Song, F. L. Lewis, Q. Wei, and H. Zhang, ‘‘Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems With Disturbances,’’ IEEE Transactions on Cybernetics, vol. 46, no. 5, pp. 1041-1050, 2016.
[9] S. Bhasin, R. Kamalapurkar, M. Johnson, K. G. Vamvoudakis, F. L. Lewis, and W. E. Dixon, ‘‘A novel actor–critic–identifier architecture for approximate optimal control of uncertain nonlinear systems,’’ Automatica, vol. 49, no. 1, pp. 82-92, 2013.
[10] R. Kamalapurkar, L. Andrews, P. Walters, and W. E. Dixon, ‘‘Model-Based Reinforcement Learning for Infinite-Horizon Approximate Optimal Tracking,’’ IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 753-758, 2017.
[11] M. L. Greene, Z. I. Bell, S. Nivison, and W. E. Dixon, ‘‘Deep Neural Network-Based Approximate Optimal Tracking for Unknown Nonlinear Systems,’’ IEEE Transactions on Automatic Control, vol. 68, no. 5, pp. 3171-3177, 2023.
[12] G. Wen, C. L. P. Chen, S. S. Ge, H. Yang, and X. Liu, ‘‘Optimized Adaptive Nonlinear Tracking Control Using Actor–Critic Reinforcement Learning Strategy,’’ IEEE Transactions on Industrial Informatics, vol. 15, no. 9, pp. 4969-4977, 2019.
[13] X. Yang, H. He, and D. Liu, ‘‘Event-Triggered Optimal Neuro-Controller Design With Reinforcement Learning for Unknown Nonlinear Systems,’’ IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 9, pp. 1866-1878, 2019.
[14] G. Wen, C. L. P. Chen, and S. S. Ge, ‘‘Simplified Optimized Backstepping Control for a Class of Nonlinear Strict-Feedback Systems With Unknown Dynamic Functions,’’ IEEE Transactions on Cybernetics, vol. 51, no. 9, pp. 4567-4580, 2021.
[15] Z. Li, J. Liu, Z. Huang, Y. Peng, H. Pu, and L. Ding, ‘‘Adaptive Impedance Control of Human–Robot Cooperation Using Reinforcement Learning,’’ IEEE Transactions on Industrial Electronics, vol. 64, no. 10, pp. 8013-8022, 2017.
[16] X. Liu, S. S. Ge, F. Zhao, and X. Mei, ‘‘Optimized Impedance Adaptation of Robot Manipulator Interacting With Unknown Environment,’’ IEEE Transactions on Control Systems Technology, vol. 29, no. 1, pp. 411-419, 2021.
[17] G. Peng, C. L. P. Chen, and C. Yang, ‘‘Neural Networks Enhanced Optimal Admittance Control of Robot-Environment Interaction Using Reinforcement Learning,’’ IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 9, pp. 4551-4561, 2022.
[18] W. He, H. Gao, C. Zhou, C. Yang, and Z. Li, ‘‘Reinforcement Learning Control of a Flexible Two-Link Manipulator: An Experimental Investigation,’’ IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 12, pp. 7326-7336, 2021.
[19] S. Baek, J. Baek, J. Choi, and S. Han, ‘‘A Reinforcement Learning-based Adaptive Time-Delay Control and Its Application to Robot Manipulators,’’ American Control Conference (ACC), Atlanta, GA, USA, 2022.
[20] A. Liu et al., ‘‘Reinforcement Learning Based Control for Uncertain Robotic Manipulator Trajectory Tracking,’’ China Automation Congress (CAC), 2022.
[21] H. Dong, X. Zhao, and B. Luo, ‘‘Optimal Tracking Control for Uncertain Nonlinear Systems With Prescribed Performance via Critic-Only ADP,’’ IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 1, pp. 561-573, 2022.
[22] S. Cao, L. Sun, J. Jiang, and Z. Zuo, ‘‘Reinforcement Learning-Based Fixed-Time Trajectory Tracking Control for Uncertain Robotic Manipulators With Input Saturation,’’ IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 8, pp. 4584-4595, 2023.
[23] E. Sariyildiz, H. Sekiguchi, T. Nozaki, B. Ugurlu, and K. Ohnishi, ‘‘A Stability Analysis for the Acceleration-Based Robust Position Control of Robot Manipulators via Disturbance Observer,’’ IEEE/ASME Transactions on Mechatronics, vol. 23, no. 5, pp. 2369-2378, 2018.
[24] B. Xiao, X. Yang, H. R. Karimi, and J. Qiu, ‘‘Asymptotic Tracking Control for a More Representative Class of Uncertain Nonlinear Systems With Mismatched Uncertainties,’’ IEEE Transactions on Industrial Electronics, vol. 66, no. 12, pp. 9417-9427, 2019.
[25] Z. Zhang, M. Leibold, and D. Wollherr, ‘‘Integral Sliding-Mode Observer-Based Disturbance Estimation for Euler–Lagrangian Systems,’’ IEEE Transactions on Control Systems Technology, vol. 28, no. 6, pp. 2377-2389, 2020.
[26] S. Haddadin, A. De Luca, and A. Albu-Schaffer, ‘‘Robot Collisions: A Survey on Detection, Isolation, and Identification,’’ IEEE Transactions on Robotics, vol. 33, no. 6, pp. 1292-1312, 2017.
[27] G. Peng, C. Yang, W. He, and C. L. P. Chen, ‘‘Force Sensorless Admittance Control With Neural Learning for Robots With Actuator Saturation,’’ IEEE Transactions on Industrial Electronics, vol. 67, no. 4, pp. 3138-3148, 2020.
[28] A. Wahrburg, E. Morara, G. Cesari, B. Matthias, and H. Ding, ‘‘Cartesian contact force estimation for robotic manipulators using Kalman filters and the generalized momentum,’’ IEEE International Conference on Automation Science and Engineering (CASE), Gothenburg, Sweden, 24-28 August, 2015.
[29] C. Yang, G. Peng, L. Cheng, J. Na, and Z. Li, ‘‘Force Sensorless Admittance Control for Teleoperation of Uncertain Robot Manipulator Using Neural Networks,’’ IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 5, pp. 3282-3292, 2021.
[30] J. Na, B. Jing, Y. Huang, G. Gao, and C. Zhang, ‘‘Unknown System Dynamics Estimator for Motion Control of Nonlinear Robotic Systems,’’ IEEE Transactions on Industrial Electronics, vol. 67, no. 5, pp. 3850-3859, 2020.
[31] G. Garofalo, N. Mansfeld, J. Jankowski, and C. Ott, ‘‘Sliding Mode Momentum Observers for Estimation of External Torques and Joint Acceleration,’’ International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20-24 May, 2019.
[32] S. K. Kommuri, S. Han, and S. Lee, ‘‘External Torque Estimation Using Higher Order Sliding-Mode Observer for Robot Manipulators,’’ IEEE/ASME Transactions on Mechatronics, vol. 27, no. 1, pp. 513-523, 2022.
[33] F. L. Lewis, D. M. Dawson, and C. T. Abdallah, Robot Manipulator Control: Theory and Practice, 2nd ed. CRC Press, 2003.
[34] B. Xiao, L. Cao, S. Xu, and L. Liu, ‘‘Robust Tracking Control of Robot Manipulators With Actuator Faults and Joint Velocity Measurement Uncertainty,’’ IEEE/ASME Transactions on Mechatronics, vol. 25, no. 3, pp. 1354-1365, 2020.
[35] J. Swevers, W. Verdonck, and J. D. Schutter, ‘‘Dynamic Model Identification for Industrial Robots,’’ IEEE Control Systems Magazine, vol. 27, no. 5, pp. 58-71, 2007.
[36] Y. Han, J. Wu, C. Liu, and Z. Xiong, ‘‘An Iterative Approach for Accurate Dynamic Model Identification of Industrial Robots,’’ IEEE Transactions on Robotics, vol. 36, no. 5, pp. 1577-1594, 2020.
[37] M. W. Spong, S. Hutchinson, and M. Vidyasagar, Robot Modeling and Control, 2nd ed. John Wiley & Sons, 2020.
[38] R. Kelly and R. Salgado, ‘‘PD control with computed feedforward of robot manipulators: a design procedure,’’ IEEE Transactions on Robotics and Automation, vol. 10, no. 4, pp. 566-571, 1994.
[39] V. Utkin, J. Guldner, and J. Shi, Sliding Mode Control in Electro-Mechanical Systems, 2nd ed. Boca Raton, FL, USA: CRC Press, 2009.
[40] Y. Pan, C. Yang, L. Pan, and H. Yu, ‘‘Integral Sliding Mode Control: Performance, Modification, and Improvement,’’ IEEE Transactions on Industrial Informatics, vol. 14, no. 7, pp. 3087-3096, 2018.
[41] R. Kamalapurkar, P. Walters, J. Rosenfeld, and W. Dixon, Reinforcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach. Berlin, Germany: Springer, 2018.
[42] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control, 3rd ed. New York, NY, USA: Wiley, 2012.
[43] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press, 2018.
[44] P. A. Ioannou and J. Sun, Robust Adaptive Control. Upper Saddle River, NJ, USA: PTR Prentice-Hall, 1996.
[45] H. K. Khalil, Nonlinear Systems, 3rd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.
[46] V. Santibañez and R. Kelly, ‘‘PD control with feedforward compensation for robot manipulators: analysis and experimentation,’’ Robotica, vol. 19, no. 1, pp. 11-19, 2001.
Advisor: Jim-Wei Wu (吳俊緯)    Date of Approval: 2024-07-23
