3D Trajectory Design in Aerial-Terrestrial Wireless Caching Networks Using Multi-Agent Reinforcement Learning

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86296

Title:	3D Trajectory Design in Aerial-Terrestrial Wireless Caching Networks Using Multi-Agent Reinforcement Learning
Authors:	廖鍇旻;Liao, Kai-Min
Contributors:	Department of Communication Engineering
Keywords:	Unmanned aerial vehicles (UAVs);trajectory design;wireless caching;multi-agent reinforcement learning
Date:	2021-03-04
Issue Date:	2021-12-07 12:28:57 (UTC+8)
Publisher:	National Central University
Abstract:	This thesis investigates dynamic 3D trajectory design for multiple cache-enabled unmanned aerial vehicles (UAVs) in a wireless device-to-device (D2D) caching network, with the goal of maximizing long-term network throughput. By storing popular content on nearby mobile user devices, D2D caching efficiently improves network throughput and alleviates the backhaul burden. With their high mobility and flexible deployment, UAVs have recently attracted significant attention as cache-enabled flying base stations. Cache-enabled UAVs can track the mobility patterns of their users and serve them under limited cache storage capacity. However, determining the optimal UAV trajectory is challenging because the network topology changes frequently in the dynamic environment and aerial and terrestrial caching nodes coexist. In response, we propose a novel multi-agent reinforcement learning framework that determines the optimal 3D trajectory of each UAV in a distributed manner, without a central coordinator. In the proposed method, UAVs within a certain proximity of each other cooperatively make flight decisions by sharing their gained experience. Simulation results show that our algorithm outperforms traditional single-agent and multi-agent Q-learning algorithms. This work confirms the feasibility and effectiveness of cache-enabled UAVs as an important complement to terrestrial D2D caching nodes.
Appears in Collections:	[Graduate Institute of Communication Engineering] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML	96	View/Open
