Deep Inverse Reinforcement Learning for Informative Path Planning

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/82393

Title:	Deep Inverse Reinforcement Learning for Informative Path Planning (深度逆強化學習於資訊軌跡規劃)
Author:	曾國師
Contributor:	Department of Mathematics, National Central University
Keywords:	Informative path planning; deep inverse reinforcement learning; transfer learning; submodularity; compressed sensing
Date:	2020-01-13
Upload time:	2020-01-13 14:50:09 (UTC+8)
Publisher:	Ministry of Science and Technology
Abstract:	Informative path planning (IPP) has drawn increasing attention in the AI community in recent years. Unlike conventional robot path planning, whose goal is to reach a target while avoiding obstacles, IPP aims to maximize the information gathered along the path. Applications vary with how information is defined, e.g., detecting infected crops, inspecting building structures for cracks, mountain search and rescue, detecting poaching and illegal logging, monitoring urban pollution distribution, and automatic 3D mapping of environments. However, these problems have been proven NP-hard, so in practice only approximate solutions can be obtained. To advance the state of IPP research, this project proposes a deep inverse reinforcement learning approach that analyzes how humans solve IPP problems in daily life, in order to build a learning method that improves robots' IPP performance. The project will run for three years. The first year focuses on using deep inverse reinforcement learning to explore the reward functions humans use when solving IPP problems. The second year focuses on analyzing humans' transfer learning ability across different IPP problems. The third year focuses on human-robot cooperative IPP. The research addresses three questions: (1) Is IPP learnable? If so, how much data is needed? (2) How do humans transfer what they have learned when solving IPP problems in different environments? (3) What are the differences and complementary strengths of humans and robots in IPP?
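Relation:	Science and Technology Policy Research and Information Center, National Applied Research Laboratories
Appears in collections:	[Department of Mathematics] Research Projects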

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	204

All items in NCUIR are protected by original copyright.
