Cell-free massive MIMO is a promising technology that has been proposed as one of the key technologies for 5G and 6G. Unlike the traditional cellular architecture, a cell-free massive MIMO system consists of a central controller and a large number of wireless access points (APs) distributed over the coverage area. Each AP is equipped with a large number of serving antennas, and the APs jointly transmit to all user equipments (UEs) in the coverage area at the same time. A key challenge is how to select the UEs to serve and how to allocate power when the APs are limited by traffic load, so that all UEs obtain the best achievable data rate. In this paper, deep reinforcement learning is applied to user selection and power allocation in millimeter-wave cell-free massive MIMO systems. By feeding appropriate environment information to the agent, designing the reward, and training effectively, a suitable multi-user selection and AP power allocation can be obtained. The environment information comprises the path loss and the channel state information between every AP and every UE, and the reward is set to the maximum spectral efficiency of all UEs. Randomly distributed APs and UEs are used as training inputs. After training, the test environment is fed into the trained neural network, which outputs a continuous action that corresponds to the user selection and power allocation. Finally, the spectral efficiency is computed from the deep reinforcement learning results, which demonstrates the advantage of this method.
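To make the relation between the continuous action and the reward concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It assumes the action holds one value in [0, 1] per AP-UE link, that a link counts as selected when its value reaches a hypothetical `threshold`, that each AP has a power budget `p_max`, and that the reward is the sum over UEs of log2(1 + SINR) under a simplified non-coherent interference model. The function names, the threshold rule, the noise level, and the SINR model are all illustrative assumptions.

```python
import numpy as np


def action_to_allocation(action, num_aps, num_ues, p_max=1.0, threshold=0.5):
    """Map a continuous action in [0, 1] to (selection mask, power matrix).

    Assumed convention: one action entry per AP-UE link, reshaped to
    (num_aps, num_ues). The threshold rule is hypothetical.
    """
    a = np.clip(np.asarray(action, dtype=float), 0.0, 1.0)
    a = a.reshape(num_aps, num_ues)
    selected = a >= threshold                 # AP m serves UE k if a[m, k] >= threshold
    weights = np.where(selected, a, 0.0)      # unselected links get zero power
    totals = weights.sum(axis=1, keepdims=True)
    # Rescale a row only when it would exceed the per-AP power budget.
    scale = np.where(totals > 1.0, 1.0 / np.maximum(totals, 1e-12), 1.0)
    power = p_max * weights * scale
    return selected, power


def sum_spectral_efficiency(power, gain, noise=1e-3):
    """Sum over UEs of log2(1 + SINR), simplified non-coherent model (assumed)."""
    # received[j, k]: power intended for UE j as observed at UE k.
    received = power.T @ gain
    desired = np.diag(received)
    interference = received.sum(axis=0) - desired
    sinr = desired / (interference + noise)
    return float(np.sum(np.log2(1.0 + sinr)))


# Usage with randomly drawn large-scale gains and a random action.
rng = np.random.default_rng(0)
gain = rng.uniform(0.1, 1.0, size=(4, 3))     # 4 APs, 3 UEs (illustrative sizes)
action = rng.uniform(0.0, 1.0, size=4 * 3)
selected, power = action_to_allocation(action, num_aps=4, num_ues=3)
reward = sum_spectral_efficiency(power, gain)
```

In a training loop, `reward` would be returned to the agent after each action. A different selection rule or a coherent beamforming SINR model could be substituted without changing this overall structure.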