Thesis Record 109521066 – Detailed Information




Name: 吳誌銘 (Jhih-Ming Wu)    Department: Department of Electrical Engineering
Thesis title: A reinforcement learning based motion tracking approach for remote humanoid robot manipulation
(Chinese title: 使用強化學習之動作同步遠端操控人型機器人)
Related theses
★ A phase-coded SSVEP-based brain-computer interface using comb filters
★ Application of electroluminescent devices to SSVEP-based brain-computer interface detection
★ Development of a real-time physiological display device for smartphones
★ A flash visual evoked potential brain-computer interface driven by multi-frequency phase coding
★ Analysis of SSVEP-based brain-computer interfaces using empirical mode decomposition
★ Extraction of auditory evoked magnetoencephalographic signals using empirical mode decomposition
★ Application of light/dark flicker visual evoked potentials to remote controllers
★ Real-time SSVEP-based EEG control of a remote-controlled car using ensemble empirical mode decomposition
★ SSVEP-based brain-computer interface detection using fuzzy theory
★ Forward-model-based spatial filter design for noise cancellation in visual evoked potential brain-computer interfaces
★ An intelligent remote ECG monitoring system
★ SSVEP-based brain-computer interface detection using hidden Markov models and its application to EEG-controlled remote cars
★ Human joint angle prediction from limb EMG signals using neural networks
★ Finger vein image segmentation using level set methods and image inhomogeneity correction
★ Application of wavelet coding to multi-channel physiological signal transmission
★ Target detection in phase-coded visual brain-computer interfaces combining Gaussian mixture models and expectation maximization
Files: Full text available in the system after 2025-08-01.
Abstract (Chinese) This study uses wearable inertial measurement units (IMUs) to acquire joint rotation information. The IMU data are fed into Unity to compute the human skeleton posture, and a motion capture system is designed to record time-series data of human movement. The system uses motion retargeting to control a Nao V6 robot from the human skeleton posture; the operator wears a VR head-mounted display to see the robot's view for an immersive experience, and a voice system enables audio communication between the human and robot sides. The Nao V6 carries several joint sensors, and stable walking is achieved using sensor feedback together with a Linear Inverted Pendulum dynamic model. For the robot's safety, foot control is designed so that forward, sideways, and turning motions are triggered by thresholds. Hand gestures serve many everyday needs, so hand control must be more precise: this study retargets human hand posture to robot hand control using two methods, inverse kinematics and reinforcement learning, and compares their strengths and weaknesses for this system. The inverse kinematics approach builds a Denavit-Hartenberg (D-H) parameter model for the robot arm, maps the Cartesian position of the current human posture into the robot's coordinate space, and back-calculates the robot joint angles from the current arm position via the inverse kinematics solution, driving the robot to perform the corresponding motion. The reinforcement learning approach adopts an actor-critic network; six target motions are pre-designed on the robot side (cheering, waving, pointing, palms together, saluting, and face wiping), and the model learns autonomously through a reward-and-penalty mechanism so that the subject's posture data automatically generate the robot's motions. Experiments verify that the system recognizes the operator's motions in real time, that the robot can be controlled smoothly and performs the motions correctly, and that the proposed model generalizes to some unlearned motions. Based on an average Fréchet distance analysis, the system's average trajectory error is about 1.9 cm, indicating high stability in retargeting control.
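The D-H based inverse kinematics mapping described in the abstract can be illustrated on a toy case. The sketch below solves a planar two-link arm analytically and verifies the solution with forward kinematics; the two-link geometry, the link lengths, and the elbow-down branch are simplifying assumptions for illustration, not the Nao arm's actual D-H chain.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Analytic inverse kinematics for a planar two-link arm (elbow-down branch)."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))          # clamp for numerical safety
    theta2 = math.acos(c2)                # elbow angle
    k1 = l1 + l2 * math.cos(theta2)
    k2 = l2 * math.sin(theta2)
    theta1 = math.atan2(y, x) - math.atan2(k2, k1)  # shoulder angle
    return theta1, theta2

def two_link_fk(theta1, theta2, l1, l2):
    """Forward kinematics, used here only to check the IK solution."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y
```

For the real robot, the same back-calculation would run over the full D-H chain of each arm; here the closed-form two-link solution stands in for that step.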
Abstract (English) This study uses inertial measurement unit (IMU) data to reconstruct the human skeleton posture in Unity. A self-designed motion capture system records time-series trajectory data of human motion. In this system, a human operator remotely controls a Nao V6 through his or her own posture using a motion retargeting method, and receives the robot's view through a VR headset for an immersive experience. An audio system is designed for voice communication between the operator side and the robot side.
Smooth default foot movement on the Nao V6 is achieved through feedback from the sensors mounted on the robot together with a Linear Inverted Pendulum model. For the robot's safety, foot controls such as moving forward, moving sideways, and turning are triggered by thresholds. People use different gestures to meet the varied demands of daily life, which is why gesture control must be more sophisticated. Two motion retargeting methods, inverse kinematics and reinforcement learning, are presented and compared in this research.
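The threshold-triggered foot control can be sketched as a small decision rule. The function below is a hypothetical illustration: the posture signals (`lean_fwd`, `lean_side`, `yaw`, in radians) and the threshold values are placeholders, not the thesis's tuned parameters.

```python
def foot_command(lean_fwd, lean_side, yaw,
                 fwd_th=0.25, side_th=0.25, yaw_th=0.35):
    """Map operator torso posture to a discrete locomotion command.

    Turning takes priority over sidestepping, which takes priority over
    walking forward; below every threshold the robot stands still.
    """
    if abs(yaw) > yaw_th:
        return "turn_left" if yaw > 0 else "turn_right"
    if abs(lean_side) > side_th:
        return "sidestep_left" if lean_side > 0 else "sidestep_right"
    if lean_fwd > fwd_th:
        return "walk_forward"
    return "stand"
```

Gating locomotion behind thresholds like this keeps small, unintentional posture drifts from moving the robot, which matches the safety rationale given above.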
The inverse kinematics method builds a Denavit-Hartenberg parameter model for each of the robot's arms and maps the Cartesian coordinates of the current human posture into the robot's coordinate space. The robot's joint angles are then back-calculated from the current human arm position through the inverse kinematics solution. The reinforcement learning method adopts an actor-critic network. For model learning, the robot performs six pre-designed motions; in the training phase, the human gesture generates the robot's gesture, and the model parameters are updated by reward and punishment rules. The proposed system has been demonstrated to recognize each of the subjects' motions from its initial onset. According to an average Fréchet distance analysis, the average trajectory error of the system is 1.9 cm, showing high stability in motion retargeting control.
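The trajectory error reported above is based on the Fréchet distance between human and robot end-effector paths. A minimal sketch of the discrete Fréchet distance (the Eiter–Mannila dynamic program), assuming trajectories are given as lists of 2-D points:

```python
import math

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two polylines P and Q.

    ca[i][j] holds the coupling distance for the prefixes P[:i+1], Q[:j+1];
    each cell takes the cheapest predecessor and the current point distance.
    """
    n, m = len(P), len(Q)
    d = lambda i, j: math.dist(P[i], Q[j])
    ca = [[0.0] * m for _ in range(n)]
    ca[0][0] = d(0, 0)
    for i in range(1, n):                  # first column: walk along P only
        ca[i][0] = max(ca[i - 1][0], d(i, 0))
    for j in range(1, m):                  # first row: walk along Q only
        ca[0][j] = max(ca[0][j - 1], d(0, j))
    for i in range(1, n):
        for j in range(1, m):
            ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]),
                           d(i, j))
    return ca[n - 1][m - 1]
```

Averaging this distance over recorded human and robot trajectory pairs gives a single-number retargeting accuracy score of the kind quoted above; 3-D points work unchanged since `math.dist` accepts any equal-length coordinate tuples.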
Keywords (Chinese) ★ Inertial measurement unit
★ Inverse kinematics
★ Reinforcement learning
★ Motion retargeting
Keywords (English) ★ IMU
★ Inverse kinematics
★ Reinforcement learning
★ Motion retargeting
Table of Contents
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Motivation and Objectives
1-2 Literature Review
1-2-1 Human Posture Research
1-2-2 Motion Retargeting
1-2-3 Robot Motion Retargeting
1-2-4 Reinforcement Learning
1-3 Thesis Organization
Chapter 2 Principles
2-1 Inertial Measurement Unit
2-2 Quaternions and Euler Angles
2-2-1 Quaternions
2-2-2 Euler Angles and Rotation Matrices
2-3 The Nao Robot
2-4 Robot Kinematics
2-5 Variational Autoencoder
2-6 Reinforcement Learning
2-6-1 Introduction to Reinforcement Learning
2-6-2 Actor-Critic Algorithm
2-6-3 Proximal Policy Optimization (PPO)
2-7 Fréchet Distance
Chapter 3 Research Design and Methods
3-1 System Architecture
3-1-1 Operator Remote Control System
3-1-2 Human Motion Capture System
3-1-3 Robot System
3-2 System Data Processing
3-2-1 IMU-to-Segment (I2S) Calibration
3-2-2 Data Preprocessing
3-2-3 Data Transmission
3-3 System Design
3-3-1 Hand Control (Reinforcement Learning)
3-3-2 Hand Control (Inverse Kinematics)
3-3-3 Audio Control
3-3-4 Head Control
3-3-5 Foot Control
Chapter 4 Results and Discussion
4-1 Coordinate Transformation
4-2 Variational Autoencoder
4-3 Actor-Critic Network
4-4 Retargeting Performance
4-5 Integrated System Application
Chapter 5 Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
Chapter 6 References
Advisor: 李柏磊 (Po-Lei Lee)    Approval date: 2022-08-25