dc.description.abstract | In the era of the Internet of Things (IoT), a large number of low-power wireless communication nodes will be widely deployed. Nodes deployed in complex and dangerous areas, e.g., deserts, wilderness, disaster zones, and battlefields, mainly rely on batteries as their power source, and solar energy has been regarded as an effective way to achieve permanent wireless communications. Owing to their high mobility, easy deployment, and low cost, unmanned aerial vehicles (UAVs) can be flexibly used to collect data from widely distributed ground wireless nodes, thus improving the energy efficiency of wireless communications. However, UAV flight is limited by the capacity of the onboard batteries, so appropriately designing the resource allocation of UAV communications is an essential issue.
In this paper, we consider multiple solar-powered wireless nodes that use harvested solar energy to transmit collected data to multiple UAVs in the uplink. In this context, we jointly design the UAV flight trajectory, UAV-node communication association, and uplink power control strategy to use the harvested energy effectively and manage co-channel interference over a finite time horizon. To ensure fairness among wireless nodes, the design goal is to maximize the minimum sum rate among the nodes. The joint design problem is highly non-convex and requires non-causal (future) knowledge of the instantaneous energy harvesting information (EHI) and channel state information (CSI), which are difficult to predict in reality. To overcome these design challenges, we first propose an offline method based on convex optimization that uses only the average EHI and CSI. The method decomposes the problem into three convex sub-problems by applying successive convex approximation (SCA) and alternating optimization, yielding an offline strategy for UAV trajectory, UAV-node communication association, and uplink power control. Building on the offline strategy, we further design an online reinforcement learning (RL) method that improves system performance based on real-time environmental information. We propose regulated flight corridors for the multiple UAVs, derived from the offline optimized flight paths, to avoid unnecessary flight exploration. Compared with the conventional RL method, this improves both the learning efficiency and the system performance. | en_US |