User Selection and Power Allocation by Using Deep Reinforcement Learning in Millimeter Wave Cell Free Massive MIMO Systems


Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/89694

Title:	User Selection and Power Allocation by Using Deep Reinforcement Learning in Millimeter Wave Cell Free Massive MIMO Systems
Author:	吳玲萱; Wu, Lin-Hsuan
Contributor:	Department of Communication Engineering
Keywords:	Millimeter-wave; Cell Free Massive MIMO Systems; User Selection; Power Allocation; Deep Reinforcement Learning
Date:	2022-08-29
Upload time:	2022-10-04 11:52:57 (UTC+8)
Publisher:	National Central University
Abstract:	Cell-free massive MIMO is a promising technology that has been proposed as one of the key technologies for 5G and 6G. Unlike the traditional cellular architecture, a cell-free massive MIMO system has a central controller and a large number of wireless access points (APs) within the coverage area, each equipped with a large number of serving antennas, so that the APs can jointly transmit to all user equipments (UEs) in the coverage area at the same time. A key challenge is how to select the UEs to serve and how to allocate power so that all UEs obtain the best possible data rate when the APs are limited by traffic load. In this thesis, deep reinforcement learning is applied to user selection and power allocation in millimeter-wave cell-free massive MIMO systems. By feeding in appropriate environment information, defining a suitable reward, and training effectively, the approach arrives at the most suitable multi-user selection and AP power allocation. The environment information comprises the path loss and the channel state information between every AP and every UE, and the reward is set to the maximum spectral efficiency of all UEs. Randomly distributed APs and UEs are used as training input. After training, the test environment is fed into the trained neural network, which outputs a continuous action that corresponds to user selection and power allocation. Finally, the spectral efficiency is computed from the deep reinforcement learning results, which demonstrates the advantage of this method.
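The abstract states that the trained network outputs a continuous action that is equivalent to user selection and power allocation, and that spectral efficiency is then computed from it. The following is a minimal sketch of what such a mapping could look like, not the thesis's actual implementation. Everything named here is an assumption for illustration: the functions `decode_action` and `spectral_efficiency`, the per-AP limit `max_users_per_ap` (standing in for the traffic-load limit), the power budget `p_max`, the real-valued gain matrix `gain` (standing in for path loss and channel state information), the simplified coherent-combining SINR model, and the choice of the sum over users as the reward.

```python
import math


def decode_action(action, max_users_per_ap, p_max):
    """Map a continuous action (M x K scores in [0, 1]) to per-AP powers.

    Hypothetical decoding: each AP keeps its highest-scoring users
    (user selection) and splits p_max among them in proportion to
    their scores (power allocation).
    """
    power = []
    for scores in action:
        # Rank users by descending score, breaking ties by user index.
        ranked = sorted(range(len(scores)), key=lambda k: (-scores[k], k))
        served = [k for k in ranked[:max_users_per_ap] if scores[k] > 0.0]
        total = sum(scores[k] for k in served)
        row = [0.0] * len(scores)
        for k in served:
            row[k] = p_max * scores[k] / total
        power.append(row)
    return power


def spectral_efficiency(power, gain, noise=1.0):
    """Per-user spectral efficiency (bit/s/Hz) under an assumed model.

    Assumption: serving APs combine coherently for the intended user,
    and signals meant for other users add up as interference power.
    """
    num_aps, num_users = len(gain), len(gain[0])
    se = []
    for k in range(num_users):
        amplitude = sum(math.sqrt(power[m][k] * gain[m][k]) for m in range(num_aps))
        interference = sum(
            power[m][j] * gain[m][k]
            for m in range(num_aps)
            for j in range(num_users)
            if j != k
        )
        se.append(math.log2(1.0 + amplitude ** 2 / (interference + noise)))
    return se


# Toy example: 2 APs, 3 users, made-up scores and gains.
action = [[0.9, 0.1, 0.0], [0.0, 0.3, 0.3]]
gain = [[1.0, 0.2, 0.1], [0.1, 0.5, 0.8]]
power = decode_action(action, max_users_per_ap=2, p_max=1.0)
se = spectral_efficiency(power, gain)
reward = sum(se)  # assumed reward: sum spectral efficiency over users
print(power, se, reward)
```

In this sketch a score of zero means "not served", so the same continuous vector carries both decisions; other encodings (for example a separate selection threshold) would be equally plausible readings of the abstract.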
Appears in collections:	[Graduate Institute of Communication Engineering] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	147

All items in NCUIR are protected by original copyright.
