Thesis 107221030: Detailed Record




Name: 吳季潔 (Ji-Jie Wu)   Department: 數學系 (Department of Mathematics)
Thesis Title: Learning Spatial Search and Map Exploration using Adaptive Submodular Inverse Reinforcement Learning
Related Theses
★ 3D Map Exploration and Search using Topological Fourier Sparse Set
★ 次模深度壓縮感知用於空間搜尋 (Submodular Deep Compressed Sensing for Spatial Search)
★ Evasion and Pursuit Games via Adaptive Submodularity
★ Maximal Coverage Problems with Routing Constraints using Cross-Entropy Monte Carlo Tree Search
★ Localizing Complex Terrain for Quadruped Robots using Adaptive Submodularity
★ 使用成本效益生成樹的資訊軌跡規劃 (Informative Trajectory Planning using Cost-Effective Spanning Trees)
★ Map Explorations via Dynamic Tree-Structured Graph
1. The electronic full text of this thesis is approved for immediate open access.
2. The open-access electronic full text is licensed only for personal, non-commercial retrieval, reading, and printing for academic research purposes.
3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese)
Finding optimal paths for spatial search and map exploration problems is NP-hard. Since spatial search and environmental exploration are among everyday human activities, learning human behavior from data is one way to solve these problems. Exploiting the adaptive submodularity of the two problems, this research proposes an adaptive submodular inverse reinforcement learning (ASIRL) algorithm to learn human behavior. ASIRL learns the reward functions in the Fourier domain and reconstructs them in the spatial domain; a near-optimal path can then be computed from the learned reward functions. Experiments show that ASIRL outperforms existing methods (e.g., REWARDAGG and QVALAGG).
Abstract (English)
Finding optimal paths for spatial search and map exploration problems is NP-hard. Since spatial search and environmental exploration are central human activities, learning human behavior from data is one way to solve these problems. Exploiting the adaptive submodularity of the two problems, this research proposes an adaptive submodular inverse reinforcement learning (ASIRL) algorithm to learn human behavior. The ASIRL approach learns the reward functions in the Fourier domain and then recovers them in the spatial domain; a near-optimal path can be computed from the learned reward functions. The experiments demonstrate that ASIRL outperforms state-of-the-art approaches (e.g., REWARDAGG and QVALAGG).
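To make that pipeline concrete, below is a minimal Python sketch of the idea the abstract describes; it is not the thesis's actual ASIRL implementation. A reward function assumed sparse in a cosine (Fourier) basis is recovered from a few observed samples via LASSO (the compressed-sensing step), reconstructed in the spatial domain, and a path is then chosen greedily over the learned rewards. The 1-D grid, basis, sample sizes, and all identifiers are illustrative assumptions.

    # A toy stand-in for ASIRL: learn a Fourier-sparse reward from few
    # samples, reconstruct it spatially, then plan greedily over it.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n = 64                                        # cells in a toy 1-D world
    x = np.arange(n)
    F = np.cos(2 * np.pi * np.outer(x, x) / n)    # cosine basis, n x n

    # Ground-truth reward: only a few active frequencies (sparse in Fourier).
    w_true = np.zeros(n)
    w_true[[1, 3, 7]] = [1.0, -0.5, 0.8]
    reward_true = F @ w_true

    # "Demonstrations": noisy rewards observed only at cells a human visited.
    visited = rng.choice(n, size=24, replace=False)
    y = reward_true[visited] + 0.01 * rng.standard_normal(visited.size)

    # Compressed-sensing step: LASSO recovers the sparse coefficients from
    # far fewer samples than grid cells; then reconstruct in the spatial domain.
    lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50000)
    lasso.fit(F[visited], y)
    reward_hat = F @ lasso.coef_

    # Greedy planner: repeatedly step to the neighboring cell with the
    # highest learned reward (a stand-in for the near-optimal planner).
    def greedy_path(reward, start=0, steps=10):
        path, pos = [start], start
        for _ in range(steps):
            nbrs = [p for p in (pos - 1, pos + 1) if 0 <= p < len(reward)]
            pos = max(nbrs, key=lambda p: reward[p])
            path.append(pos)
        return path

    print("greedy path over learned rewards:", greedy_path(reward_hat))

In the thesis, the same sparsity argument is applied to set functions over 2-D and 3-D maps with an adaptive submodular planner that carries near-optimality guarantees; the sketch mirrors only the learn-in-Fourier, recover-in-space, plan-greedily structure.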
Keywords (Chinese)
★ 空間搜尋 (spatial search)
★ 地圖探索 (map exploration)
★ 自適應次模 (adaptive submodularity)
★ 逆強化學習 (inverse reinforcement learning)
★ 壓縮感測 (compressed sensing)
Keywords (English) ★ Spatial search
★ Map exploration
★ Adaptive submodularity
★ Inverse reinforcement learning
★ Compressed sensing
Table of Contents
Abstract (in Chinese) i
Abstract (in English) ii
Acknowledgements iii
Contents iv
Figures vi
Tables xi
1 Introduction 1
1.1 Introduction 1
1.2 Publication Note 3
2 Related Works 4
2.1 Informative Path Planning (IPP) 4
2.2 Human Search and Control 4
2.3 Inverse Reinforcement Learning and Imitation Learning 5
2.4 Reinforcement Learning and Deep Reinforcement Learning 6
2.5 Submodularity 6
2.6 Adaptive Submodularity and Search via Submodularity 7
3 Background 9
3.1 Submodularity 9
3.2 Adaptive Submodularity 10
3.3 Spatial Fourier Sparse Set (SFSS) Learning 13
3.4 Submodular Functions for Spatial Search and Map Exploration Problems 16
4 Problem Reformulation of Search and Map Exploration Problems 18
4.1 POMDP 18
4.2 ASIRL 19
4.3 Theoretical Guarantees 20
5 Proposed Algorithms 25
5.1 Proposed Algorithms 25
6 Experiments 29
6.1 EX1: 2D Map Exploration Experiments 30
6.1.1 Experimental Setup 30
6.1.2 Experimental Results 31
6.2 EX2: 3D Spatial Search Experiments 33
6.2.1 Experimental Setup 33
6.2.2 Experimental Results 35
7 Conclusions 46
References 48
A Appendix 55
A.1 Search System Setup and Environments 55
A.2 Dataflow 55
A.3 Search Task and Human Subjects 57
A.4 Experimental Results 59
References
[1] Kuo-Shih Tseng and Bérénice Mettler. Near-optimal probabilistic search via submodularity and sparse regression. Autonomous Robots, 2015.
[2] Kuo-Shih Tseng and Bérénice Mettler. Near-optimal probabilistic search using spatial fourier sparse set. Autonomous Robots, 2017.
[3] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14:265-294, 1978.
[4] Haifeng Zhang and Yevgeniy Vorobeychik. Submodular optimization with routing constraints. AAAI Conference on Artificial Intelligence, 16:819-826, 2016.
[5] Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, and Anind K Dey. Maximum entropy inverse reinforcement learning. AAAI Conference on Artificial Intelligence, 2008.
[6] Markus Wulfmeier, Peter Ondruska, and Ingmar Posner. Maximum entropy deep inverse reinforcement learning. arXiv preprint arXiv:1507.04888, 2015.
[7] Samir Khuller, Anna Moss, and Joseph Seffi Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39-45, 1999.
[8] Sanjiban Choudhury, Mohak Bhardwaj, Sankalp Arora, Ashish Kapoor, Gireeja Ranade, Sebastian Scherer, and Debadeepta Dey. Data-driven planning via imitation learning. The International Journal of Robotics Research, 37(13-14):1632-1672, 2018.
[9] Maria-Florina Balcan and Nicholas JA Harvey. Learning submodular functions. Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 793-802, 2011.
[10] Bing-Xian Lu, Ji-Jie Wu, Yu-Chung Tsai, Wan-Ting Jiang, and Kuo-Shih Tseng. A novel telerobotic search system using an unmanned aerial vehicle. IEEE International Conference on Robotic Computing, 2020.
[11] Ji-Jie Wu and Kuo-Shih Tseng. Learning spatial search using submodular inverse reinforcement learning. IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), 2020.
[12] Ji-Jie Wu and Kuo-Shih Tseng. Adaptive submodular inverse reinforcement learning for spatial search and map exploration. Autonomous Robots (AURO), under review.
[13] Jonathan Binney, Andreas Krause, and Gaurav S Sukhatme. Informative path planning for an autonomous underwater vehicle. IEEE International Conference on Robotics and Automation, pages 4791-4796, 2010.
[14] Amarjeet Singh, Andreas Krause, Carlos Guestrin, William J Kaiser, and Maxim A Batalin. Efficient planning of informative paths for multiple robots. Proceedings of the 20th International Joint Conference on Artificial Intelligence, 7:2204-2211, 2007.
[15] Jonathan Binney and Gaurav S Sukhatme. Branch and bound for informative path planning. IEEE International conference on Robotics and Automation, pages 2147-2154, 2012.
[16] Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y Ng. ROS: an open-source Robot Operating System. In ICRA Workshop on Open Source Software, volume 3, page 5, Kobe, Japan, 2009.
[17] Kuo-Shih Tseng and Bérénice Mettler. Analysis of coordination patterns between gaze and control in human spatial search. 2nd IFAC Conference on Cyber-Physical and Human-Systems, 2018.
[18] Kuo-Shih Tseng and Bérénice Mettler. Analysis and augmentation of human performance on telerobotic search problems. IEEE Access, 8:56590-56606, 2020.
[19] Liting Sun, Wei Zhan, and Masayoshi Tomizuka. Probabilistic prediction of interactive driving behavior via hierarchical inverse reinforcement learning. IEEE 21st International Conference on Intelligent Transportation Systems (ITSC), pages 2111-2117, 2018.
[20] Justin Fu, Katie Luo, and Sergey Levine. Learning robust rewards with adversarial inverse reinforcement learning. International Conference on Learning Representations (ICLR), 2018.
[21] Brian D Ziebart. Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University, 2010.
[22] N Sukumar. Construction of polygonal interpolants: a maximum entropy approach. International Journal for Numerical Methods in Engineering, 61(12):2159-2181, 2004.
[23] Masamichi Shimosaka, Junichi Sato, Kazuhito Takenaka, and Kentarou Hitomi. Fast inverse reinforcement learning with interval consistent graph for driving behavior prediction. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[24] Henrik Kretzschmar, Markus Spies, Christoph Sprunk, and Wolfram Burgard. Socially compliant mobile robot navigation via inverse reinforcement learning. The International Journal of Robotics Research, 35(11):1289-1307, 2016.
[25] Julien Audiffren, Michal Valko, Alessandro Lazaric, and Mohammad Ghavamzadeh. Maximum entropy semi-supervised inverse reinforcement learning. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI), 2015.
[26] Kris M Kitani, Brian D Ziebart, James Andrew Bagnell, and Martial Hebert. Activity forecasting. In European Conference on Computer Vision (ECCV), pages 201-214, 2012.
[27] Markus Wulfmeier, Dominic Zeng Wang, and Ingmar Posner. Watch this: Scalable cost-function learning for path planning in urban environments. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2089-2095, 2016.
[28] Matthew Alger. Deep inverse reinforcement learning. Tech. rep., 2016.
[29] Markus Wulfmeier, Peter Ondruska, and Ingmar Posner. Deep inverse reinforcement learning. CoRR, abs/1507.04888, 2015.
[30] Ahmed Hussein, Mohamed Medhat Gaber, Eyad Elyan, and Chrisina Jayne. Imitation learning: A survey of learning methods. ACM Computing Surveys (CSUR), 50(2):1-35, 2017.
[31] Pieter Abbeel and Andrew Y Ng. Apprenticeship learning via inverse reinforcement learning. Proceedings of the twenty-first international conference on Machine learning, page 1, 2004.
[32] Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. arXiv:1606.03476, 2016.
[33] Sanjiban Choudhury, Ashish Kapoor, Gireeja Ranade, and Debadeepta Dey. Learning to gather information via imitation. IEEE International Conference on Robotics and Automation (ICRA), pages 908-915, 2017.
[34] Csaba Szepesvári. Algorithms for reinforcement learning. Morgan and Claypool, 2009.
[35] Ruohan Zhang, Shun Zhang, Matthew H Tong, Yuchen Cui, Constantin A Rothkopf, Dana H Ballard, and Mary M Hayhoe. Modeling sensory-motor decisions in natural behavior. PLoS computational biology, 14(10):e1006518, 2018.
[36] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431-3440, 2015.
[37] Fereshteh Sadeghi and Sergey Levine. Cad2rl: Real single-image flight without a single real image. arXiv preprint arXiv:1611.04201, 2016.
[38] Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions. AAAI Conference on Artificial Intelligence, 7:1650-1654, 2007.
[39] Andreas Krause, Ajit Singh, and Carlos Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. The Journal of Machine Learning Research, 9:235-284, 2008.
[40] Yu-Chung Tsai and Kuo-Shih Tseng. Deep compressed sensing for learning submodular functions. Sensors, 20(9):2591, 2020.
[41] Peter Stobbe and Andreas Krause. Learning fourier sparse set functions. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, pages 1125-1133, 2012.
[42] Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, and S Muthukrishnan. Large-scale optimistic adaptive submodularity. AAAI Conference on Artificial Intelligence, pages 1816-1823, 2014.
[43] Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, and S Muthukrishnan. Adaptive submodular maximization in bandit setting. Advances in Neural Information Processing Systems, 26:2697-2705, 2013.
[44] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. The Journal of Artificial Intelligence Research, 42(1):427-486, 2011.
[45] Geoffrey Hollinger and Sanjiv Singh. Proofs and experiments in scalable, near-optimal search by multiple robots. Proceedings of Robotics: Science and Systems IV, Zurich, Switzerland, 1, 2008.
[46] Yu-Chung Tsai, Bing-Xian Lu, and Kuo-Shih Tseng. Spatial search via adaptive submodularity and deep learning. IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pages 112-113, 2019.
[47] Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4):634-652, 1998.
[48] R. G. Baraniuk. Compressive sensing. IEEE Signal Processing Magazine, 24(4):118-121, 2007.
[49] Saad Qaisar, Rana Muhammad Bilal, Wafa Iqbal, Muqaddas Naureen, and Sungyoung Lee. Compressive sensing: From theory to applications, a survey. Journal of Communications and Networks, 15(5):443-456, 2013.
[50] Bing-Xian Lu and Kuo-Shih Tseng. 3d map exploration via learning submodular functions in the fourier domain. International Conference on Unmanned Aircraft Systems (ICUAS), 2020.
[51] Mark Schmidt. Least squares optimization with l1-norm regularization. CS542B Project Report, 504:195-221, 2005.
[52] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267-288, 1996.
[53] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
[54] Emmanuel J Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on information theory, 52(2):489-509, 2006.
[55] Kuo-Shih Tseng. Learning in human and robot search: Subgoal, submodularity, and sparsity. PhD thesis, University of Minnesota, 2016.
[56] Stefan Isler, Reza Sabzevari, Jeffrey Delmerico, and Davide Scaramuzza. An information gain formulation for active volumetric 3d reconstruction. IEEE International Conference on Robotics and Automation (ICRA), pages 3477-3484, 2016.
Advisor: 曾國師 (Kuo-Shih Tseng)   Approval Date: 2021-01-26
