    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93255


    Title: Hindsight Proximal Policy Optimization based Deep Reinforcement Learning Manipulator Control
    Authors: Su, Sheng-Che
    Contributors: Department of Computer Science and Information Engineering
    Keywords: manipulator; deep reinforcement learning; hindsight proximal policy optimization; robot control
    Date: 2023-07-25
    Issue Date: 2024-09-19 16:50:48 (UTC+8)
    Publisher: National Central University
    Abstract: The demand for intelligent automation in factories has been increasing. Traditional manipulators perform simple, fixed automation tasks, whereas deep reinforcement learning enables manipulators to handle more complex work. However, robotics poses challenging learning problems: in three-dimensional, continuous environments it is difficult for a robot to obtain rewards, a setting known as a sparse reward environment. To overcome this issue, this study proposes Hindsight Proximal Policy Optimization (HPPO), a deep reinforcement learning method for intelligent manipulator control that combines the Proximal Policy Optimization (PPO) algorithm with the idea behind Hindsight Experience Replay (HER), improving PPO's adaptability and sample efficiency in sparse reward environments. Unlike conventional reinforcement learning architectures, we adopt a multi-goal formulation so that the agent has an explicit goal when interacting with the environment, and we draw on HER's fictitious-data generation so the agent can learn from failures and reach its goals faster. We conducted a series of experiments in a simulated manipulator control environment and compared HPPO against other deep reinforcement learning methods. The results show that HPPO, built on PPO as its core algorithm, achieves significant improvements: it adapts better to sparse reward environments and raises sample efficiency, which in turn improves training efficiency. This validates the practicality of HPPO for manipulator control and its potential as a basis for a variety of robot control applications.
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses
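The abstract describes combining PPO with HER-style hindsight goal relabeling: when an episode fails to reach its desired goal, transitions are relabeled with a goal the agent actually achieved, so the sparse reward still produces a learning signal. The sketch below illustrates that relabeling idea only; the function names, the scalar goal representation, and the `final`/`future` strategies are illustrative assumptions, not the thesis's actual implementation.

```python
import random

def sparse_reward(achieved_goal, desired_goal, tol=1e-6):
    """Sparse reward common in multi-goal RL: 0 on success, -1 otherwise."""
    return 0.0 if abs(achieved_goal - desired_goal) <= tol else -1.0

def relabel_episode(episode, strategy="final"):
    """HER-style relabeling: substitute each transition's desired goal with a
    goal actually achieved later in the episode, then recompute the reward.
    `episode` is a list of (obs, action, achieved_goal, desired_goal) tuples."""
    relabeled = []
    for i, (obs, action, achieved, desired) in enumerate(episode):
        if strategy == "final":
            new_goal = episode[-1][2]          # goal achieved at episode end
        else:                                  # "future": random later achieved goal
            j = random.randint(i, len(episode) - 1)
            new_goal = episode[j][2]
        relabeled.append((obs, action, achieved, new_goal,
                          sparse_reward(achieved, new_goal)))
    return relabeled

# A failed 3-step episode: the desired goal 1.0 is never reached,
# so every original transition would carry reward -1.
episode = [(0.0, +1, 0.2, 1.0), (0.2, +1, 0.5, 1.0), (0.5, +1, 0.7, 1.0)]
hindsight = relabel_episode(episode, strategy="final")
# After relabeling, all transitions target the achieved final state 0.7,
# and the last transition receives reward 0 (success).
```

In an on-policy setting such as PPO, this relabeling would be applied to freshly collected rollouts before the policy update rather than drawn from a replay buffer; how the thesis reconciles hindsight data with PPO's on-policy assumptions is not detailed in the abstract.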

    Files in This Item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    8

    All items in NCUIR are protected by copyright, with all rights reserved.

