    Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/99270


    Title: Application of Reinforcement Learning to Robotic Arm Assembly for Shaft–Hole Clearance Fit Tasks
    Authors: Ho, Chia-Hsiu (何佳修)
    Contributors: Graduate Institute of Opto-Mechatronic Engineering (光機電工程研究所)
    Keywords: Reinforcement Learning; Impedance Control; Force Feedback; 7-DOF Robotic Arm; Peg-in-Hole Assembly; PPO; Sim2Real
    Date: 2026-01-27
    Uploaded: 2026-03-06 18:29:13 (UTC+8)
    Publisher: National Central University
    Abstract: This study investigates the application of Reinforcement Learning (RL) to assembly tasks, using a 7-DOF robotic arm as the experimental platform and the Peg-in-Hole assembly task as the test scenario. The system integrates visual and force sensing: the vision module identifies hole positions, while force feedback assists contact detection and impedance control, enabling compliant and stable manipulation during assembly.
    The control strategy is based on Cartesian Impedance Control and optimized through RL algorithms. Three widely used algorithms—Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), and Soft Actor-Critic (SAC)—were compared to evaluate their learning efficiency, task stability, and policy generalization in continuous control tasks. All training was performed in the NVIDIA Isaac Sim environment, where a Digital Twin fully simulates the dynamics and interactions between the robotic arm, the workpiece, and the operational environment. The trained models were then deployed on the physical robot for Sim2Real validation.
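    To make the control law concrete, below is a minimal sketch of one Cartesian impedance control step in Python. The abstract does not give the thesis's gains or state representation, so the stiffness/damping values, the 6-D pose-error vector, and all variable names here are illustrative assumptions, not the author's actual implementation.

        import numpy as np

        def impedance_wrench(x, x_des, v, v_des, K, D):
            # F = K (x_des - x) + D (v_des - v): spring-damper behavior
            # around the desired pose; orientation error is simplified to
            # a small-angle 3-vector for this sketch.
            return K @ (x_des - x) + D @ (v_des - v)

        # 6-D task space [x, y, z, rx, ry, rz]; softer stiffness along the
        # insertion axis (z) so the peg can comply with contact forces.
        K = np.diag([500.0, 500.0, 200.0, 30.0, 30.0, 30.0])  # N/m, N·m/rad (assumed)
        D = 2.0 * np.sqrt(K)                                  # near-critical damping (assumed)

        x     = np.zeros(6)                                 # current pose
        x_des = np.array([0.0, 0.0, -0.01, 0.0, 0.0, 0.0])  # push 10 mm along -z
        v     = np.zeros(6)                                 # current velocity
        v_des = np.zeros(6)                                 # desired velocity
        print(impedance_wrench(x, x_des, v, v_des, K, D))   # commanded wrench

    An RL policy layered on top of such a controller typically outputs desired pose offsets (and sometimes stiffness gains) rather than raw torques, which is what keeps the learned behavior compliant during contact.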
    Experimental results show that both PPO and DDPG successfully learned assembly strategies and transferred to the physical system: PPO achieved roughly 90% success in physical validation, while DDPG reached about 75%. In contrast, the SAC model failed to converge stably in simulation; although it occasionally completed an insertion, its learning remained unstable, and it failed every assembly attempt in real-world tests (0% success). Tests under different arm stiffness settings further showed that both the PPO and DDPG models generalize well, maintaining stable performance across varying hole positions and initial peg orientations.
    These results confirm that a Sim2Real framework combining Reinforcement Learning, Impedance Control, and Digital Twin simulation can effectively improve the accuracy, stability, and adaptability of a 7-DOF robotic arm in Peg-in-Hole assembly tasks. It also reduces the risks and costs of real-world trial-and-error learning, offering practical guidance for automated assembly and smart manufacturing applications.
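    As a purely hypothetical illustration of the train-in-simulation step described above: the abstract does not name the RL framework or environment API the author used, so the environment id "PegInHole-v0" and the choice of the stable-baselines3 library below are assumptions, not the thesis's actual setup.

        import gymnasium as gym
        from stable_baselines3 import PPO

        # "PegInHole-v0" is an assumed id for a digital-twin environment;
        # in the thesis, training ran inside NVIDIA Isaac Sim.
        env = gym.make("PegInHole-v0")

        model = PPO("MlpPolicy", env, verbose=1)  # on-policy PPO, default hyperparameters
        model.learn(total_timesteps=1_000_000)    # train entirely in simulation
        model.save("ppo_peg_in_hole")             # the saved policy is then
                                                  # deployed on the physical
                                                  # arm for Sim2Real validation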
    Appears in Collections: [Graduate Institute of Opto-Mechatronic Engineering] Theses & Dissertations

    Files in This Item:

    File: index.html | Size: 0 Kb | Format: HTML | Views: 7


    All items in NCUIR are protected by copyright, with all rights reserved.
