Robot manipulators are widely used in today's factory automation production lines because of their high precision and consistency, which in turn improve productivity and quality. Typical tasks require the gripper mounted at the end of the arm to move along a predefined position trajectory. During motion, however, uncertainties inevitably act on the manipulator and degrade tracking accuracy. This thesis proposes a control strategy for trajectory tracking control of robot manipulators that consists of an uncertainty estimator and a reinforcement learning-based actor-critic optimal tracking controller. First, building on the momentum observer, which has already been deployed in commercial applications, we design a momentum observer combined with integral sliding mode control. This observer inherits the advantages of the traditional momentum observer and adds the robustness of sliding mode control, which improves uncertainty estimation; the estimate is then used for compensation. Second, within existing reinforcement learning tracking control theory, we integrate a traditional PD plus feedforward controller and design a neural network parameter selection procedure. This procedure avoids time-consuming tuning of the neural network activation functions and initial weights and ensures the admissibility of the initial control policy.
During control, the actor-critic architecture of reinforcement learning adaptively adjusts the control output. Closed-loop stability of the manipulator under the proposed strategy is established with the Lyapunov method, which shows that all error signals are bounded. To verify the effectiveness and superiority of the proposed strategy, it is compared with the traditional PD plus feedforward controller and an adaptive RBF neural network controller in a numerical simulation of a two-link robot manipulator. The results show that the proposed strategy converges faster and has a smaller steady-state error than the other two controllers. Experiments on a real two-link robot manipulator also confirm its practical feasibility.
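
For readers unfamiliar with the momentum observer that the estimator builds on, the following is a minimal sketch of the classical (non-sliding-mode) version on a single joint. It is not the observer proposed in this thesis: the integral sliding mode term is omitted, and the plant model, the numerical values, and the function name `simulate_momentum_observer` are all assumptions chosen for illustration. The joint is assumed to obey `inertia * qdd + grav * sin(q) = tau + disturbance`, and the residual `r = gain * (p - integral(tau - g(q) + r) dt - p0)` is used as the disturbance estimate, where `p = inertia * qd` is the generalized momentum.

```python
import math


def simulate_momentum_observer(disturbance=0.5, gain=50.0, dt=1e-3, steps=2000):
    """Estimate a constant disturbance torque on a 1-DOF joint.

    Classical momentum observer only; all parameters are illustrative
    assumptions, not values from the thesis.
    """
    inertia = 0.2   # assumed joint inertia [kg m^2]
    grav = 1.5      # assumed gravity coefficient m*g*l [N m]

    q, qd = 0.0, 0.0        # joint position and velocity
    p0 = inertia * qd       # initial generalized momentum
    integral = 0.0          # running integral of (tau - g(q) + r)
    r = 0.0                 # residual, i.e. the disturbance estimate

    for k in range(steps):
        t = k * dt
        tau = 0.3 * math.sin(2.0 * t)   # arbitrary commanded torque
        g = grav * math.sin(q)          # gravity torque at the current position

        # True plant (forward Euler); the disturbance is unknown to the observer.
        qdd = (tau - g + disturbance) / inertia

        # Observer integrates only known quantities plus its own residual.
        integral += (tau - g + r) * dt

        qd += qdd * dt
        q += qd * dt

        # Residual: measured momentum minus the momentum predicted by the model.
        p = inertia * qd
        r = gain * (p - integral - p0)

    return r


if __name__ == "__main__":
    print(simulate_momentum_observer())
```

In this discretization the residual follows `r[k+1] = r[k] + gain * dt * (disturbance - r[k])`, a first-order filter of the true disturbance, so `gain * dt` should stay well below 1. The observer needs only position, velocity, and commanded torque, with no acceleration measurement, which is one of the commonly cited advantages of the momentum observer.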