Name Ji-Jie Wu (吳季潔) Department Mathematics
(Learning Spatial Search and Map Exploration using Adaptive Submodular Inverse Reinforcement Learning)
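Abstract (Chinese) Finding the optimal path for spatial search and map exploration problems is NP-hard. Since spatial
search and environment exploration are among everyday human activities, learning human [...]
[...] adaptive submodularity, this research proposes an adaptive submodular inverse reinforcement learning (ASIRL) [...]
[...] functions and reconstructs them in the spatial domain; a near-optimal path can be computed from the learned [...]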
Abstract (English) Finding optimal paths for spatial search and map exploration problems is NP-hard. Since spatial search and environmental exploration are central human activities, learning human behavior from data is one way to solve these problems. Utilizing the adaptive submodularity of the two problems, this research proposes an adaptive submodular inverse reinforcement learning (ASIRL) algorithm to learn human behavior.
The ASIRL approach learns the reward functions in the Fourier domain and then recovers them in the spatial domain. A near-optimal path can then be computed from the learned reward functions. The experiments demonstrate that ASIRL outperforms state-of-the-art approaches (e.g., REWARDAGG and QVALAGG).
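Keywords (Chinese) ★ 空間搜尋 (spatial search)
★ 地圖探索 (map exploration)
★ 自適應次模 (adaptive submodularity)
★ 逆強化學習 (inverse reinforcement learning)
★ 壓縮感測 (compressed sensing)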
Keywords (English) ★ Spatial search
★ Map exploration
★ Adaptive submodularity
★ Inverse reinforcement learning
★ Compressed sensing
Table of Contents Abstract (Chinese) i
Abstract ii
Acknowledgements iii
Contents iv
Figures vi
Tables xi
1 Introduction 1
1.1 Introduction 1
1.2 Publication Note 3
2 Related Works 4
2.1 Informative Path Planning (IPP) 4
2.2 Human Search and Control 4
2.3 Inverse Reinforcement Learning and Imitation Learning 5
2.4 Reinforcement Learning and Deep Reinforcement Learning 6
2.5 Submodularity 6
2.6 Adaptive Submodularity and Search via Submodularity 7
3 Background 9
3.1 Submodularity 9
3.2 Adaptive Submodularity 10
3.3 Spatial Fourier Sparse Set (SFSS) Learning 13
3.4 Submodular Functions for Spatial Search and Map Exploration Problems 16
4 Problem Reformulation of Search and Map Exploration Problems 18
4.1 POMDP 18
4.2 ASIRL 19
4.3 Theoretical Guarantees 20
5 Proposed Algorithms 25
5.1 Proposed Algorithms 25
6 Experiments 29
6.1 EX1: 2D Map Exploration Experiments 30
6.1.1 Experimental Setup 30
6.1.2 Experimental Results 31
6.2 EX2: 3D Spatial Search Experiments 33
6.2.1 Experimental Setup 33
6.2.2 Experimental Results 35
7 Conclusions 46
References 48
A Appendix 55
A.1 Search System Setup and Environments 55
A.2 Dataflow 55
A.3 Search Task and Human Subjects 57
A.4 Experimental Results 59
Advisor Kuo-Shih Tseng (曾國師) Approval date 2021-1-26
