Name: 吳禹辰 (Yu-Chen Wu)
Department: Electrical Engineering
Thesis Title: Hardware Architecture Design for Deep Reinforcement Learning Applied to Precoders in MIMO Hybrid Beamforming Systems
Full Text: available in the system after 2029-01-31
Abstract (Chinese, translated): This thesis designs a hardware architecture for deep reinforcement learning and applies it to a hybrid beamforming system, training a neural network for the precoder. Under the assumption of known channel state information, the Deep Deterministic Policy Gradient (DDPG) algorithm is used to obtain the analog precoder. The analog precoder serves as the state; after the agent takes an action, it receives feedback and a reward from the environment, and the convergence trend of the reward indicates whether training succeeds. With this algorithm, we analyze how each parameter affects the training results, using channel capacity as the performance metric. After selecting the neural network parameters, a hardware architecture is designed to implement DDPG on the hybrid precoding algorithm. The architecture consists of a critic evaluation network unit, a critic target network unit, an actor evaluation network unit, an actor target network unit, local memory, a square-root unit, a divider, and control signals. Each critic evaluation and target network unit contains 16 critic network computation slices, and each actor evaluation and target network unit contains 48 actor network computation slices. Each slice contains basic processing elements, each with four multipliers and four adders. The same processing elements implement DDPG: the forward pass accumulates the quantities needed to train the neural network, and the backward pass updates the network parameters; one forward pass takes 153 clock cycles and one backward pass takes 322. The design is implemented on a Xilinx VCU128 at an operating frequency of 95.2 MHz, using 536,514 LUTs, 256 BRAMs, and 1,024 DSPs. At batch size 1, the design produces 18,332 inferences per second (IPS); at batch size 32, it reaches 211,556 IPS. To verify the correctness of the neural network outputs and parameter updates, the results are compared against a bit-true software model; no discrepancy is found, demonstrating that the hardware can train and update a deep reinforcement learning network.

Abstract (English): A hardware architecture for deep reinforcement learning applied to the hybrid beamforming system is proposed. Given known channel state information, the algorithm employs the deep deterministic policy gradient (DDPG) algorithm to calculate the phases of the analog precoders. The previous analog precoder phases are used as states, and the agent generates actions based on feedback and rewards, defined as channel capacity, from the environment. The convergence trend of the channel capacity and the comparison with conventional algorithms demonstrate the success of training. A hardware architecture is then designed to implement DDPG with the chosen parameters. The architecture includes a critic evaluation network unit, a critic target network unit, an actor evaluation network unit, and an actor target network unit. Each critic evaluation and target network unit contains 16 computation slices, and each actor evaluation and target network unit comprises 48 computation slices. Each computation slice includes basic processing elements, each having four multipliers and four adders. Both forward and backward passes are executed in the proposed hardware architecture, requiring 153 and 322 clock cycles, respectively.
The hardware design is synthesized on a Xilinx VCU128, achieving an operating frequency of 95.2 MHz and utilizing 536,514 LUTs, 256 BRAMs, and 1,024 DSPs. To confirm the correctness of the neural network outputs and parameter updates, a comparison with a bit-true model shows no discrepancies, indicating that the hardware is capable of both training and inference for deep reinforcement learning. It accomplishes 18K and 211K training inferences per second at batch sizes of 1 and 32, respectively.

Keywords (Chinese, translated): ★ Deep reinforcement learning
★ Deep reinforcement learning applied to hybrid precoding systems
★ Hybrid beamforming system
★ Precoder
★ Hardware implementation

Keywords (English): ★ Deep Reinforcement Learning
★ Deep Deterministic Policy Gradient
★ Hybrid Beamforming System
★ Precoder
★ Implementation

Table of Contents:
Abstract (Chinese) I
Abstract (English) II
Table of Contents III
List of Tables VI
List of Figures IX
Chapter 1 Introduction 1
1.1 Overview 1
1.2 Motivation 1
1.3 Thesis Organization 2
Chapter 2 Deep Reinforcement Learning 3
2.1 Neural Network 3
2.2 Q-Learning 4
2.3 Deep Q Network (DQN) 5
2.4 Experience Replay 9
2.5 Deep Deterministic Policy Gradient (DDPG) 10
Chapter 3 Algorithms for Hybrid Precoding Systems 21
3.1 Hybrid Beamforming System 21
3.2 Conventional Algorithms 24
3.2.1 Fully Digital Precoder and Combiner System [5] 24
3.2.2 Single-User Hybrid Precoding System [5] 25
Chapter 4 Deep Reinforcement Learning Applied to Hybrid Precoding Systems 28
4.1 DDPG for Hybrid Precoding Systems 28
4.2 DDPG on the Hybrid Precoding Algorithm 32
4.3 Simulation Results under Different Parameter Settings 34
Chapter 5 Complexity Analysis and Hardware Implementation 47
5.1 Complexity Analysis 47
5.1.1 Complexity of Forward Propagation 48
5.1.2 Complexity of Backward Propagation 50
5.1.3 Comparison of Computational Complexity 57
5.1.4 Processing Element Allocation 60
5.2 Hardware Design and Implementation 61
5.2.1 Processing Element (PE) Design 67
5.2.2 Single-Precision Floating-Point Adder Design 69
5.2.3 Single-Precision Floating-Point Multiplier Design 71
5.2.4 Single-Precision Floating-Point Divider Design 72
5.3 Data-Flow Control and Scheduling 75
5.3.1 Forward-Pass Data-Flow Scheduling for the Actor Network 75
5.3.2 Forward-Pass Data-Flow Scheduling for the Critic Network 81
5.3.3 Backward-Pass Data-Flow Scheduling for the Critic Network 88
5.3.4 Backward-Pass Data-Flow Scheduling for the Actor Network 100
Chapter 6 Hardware Implementation Results and Comparison 116
6.1 Simulation Results of Models with Different Floating-Point Precisions 116
6.2 FPGA Synthesis and Measurement Results 118
6.3 Comprehensive Comparison 123
Chapter 7 Conclusion 125
References 127

References:
[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, Cambridge, MA, 2017.
[2] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra and M. Riedmiller, “Playing Atari with Deep Reinforcement Learning,” in NIPS Deep Learning Workshop, 2013.
[3] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra and M. Riedmiller, “Deterministic Policy Gradient Algorithms,” in Proceedings of the 31st International Conference on Machine Learning, 2014, pp. 387-395.
[4] E. Todorov, T. Erez and Y. Tassa, "MuJoCo: A physics engine for model-based control," 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 2012, pp. 5026-5033, doi: 10.1109/IROS.2012.6386109.
[5] 黃孝生, "Hybrid Beamforming Algorithms and Architecture Design for Multi-User MIMO Systems," M.S. thesis, Department of Electrical Engineering, National Central University, 2018.
[6] A. Alkhateeb, R. W. Heath and G. Leus, "Achievable rates of multi-user millimeter wave systems with hybrid precoding," 2015 IEEE International Conference on Communication Workshop (ICCW), London, UK, 2015, pp. 1232-1237, doi: 10.1109/ICCW.2015.7247346.
[7] F. Sohrabi and W. Yu, "Hybrid Digital and Analog Beamforming Design for Large-Scale Antenna Arrays," in IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 501-513, April 2016, doi: 10.1109/JSTSP.2016.2520912.
[8] T. Lin and Y. Zhu, "Beamforming Design for Large-Scale Antenna Arrays Using Deep Learning," in IEEE Wireless Communications Letters, vol. 9, no. 1, pp. 103-107, Jan. 2020, doi: 10.1109/LWC.2019.2943466.
[9] A. M. Elbir, "CNN-Based Precoder and Combiner Design in mmWave MIMO Systems," in IEEE Communications Letters, vol. 23, no. 7, pp. 1240-1243, July 2019, doi: 10.1109/LCOMM.2019.2915977.
[10] R. Bellman, "A Markovian Decision Process," Journal of Mathematics and Mechanics, vol. 6, no. 5, pp. 679-684, 1957.
[11] 黃柏銓, "Deep Reinforcement Learning-Based Precoder Design for Beam Tracking in MIMO Hybrid Beamforming Systems," M.S. thesis, Department of Electrical Engineering, National Central University, 2022.
[12] P.-Y. Tsai, Y. Chang and J.-L. Li, "Fast-Convergence Singular Value Decomposition for Tracking Time-Varying Channels in Massive MIMO Systems," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 2018, pp. 1085-1089, doi: 10.1109/ICASSP.2018.8462248.
[13] 蔣詠軒, "Customized Block Floating-Point Hardware Architecture Design and Optimization for Deep Reinforcement Learning," M.S. thesis, Department of Electrical Engineering, National Central University, 2022.
[14] C. -W. Hu, J. Hu and S. P. Khatri, "TD3lite: FPGA Acceleration of Reinforcement Learning with Structural and Representation Optimizations," 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), Belfast, United Kingdom, 2022, pp. 79-85, doi: 10.1109/FPL57034.2022.00023.
[15] Y. Meng, S. Kuppannagari and V. Prasanna, "Accelerating Proximal Policy Optimization on CPU-FPGA Heterogeneous Platforms," 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Fayetteville, AR, USA, 2020, pp. 19-27, doi: 10.1109/FCCM48280.2020.00012.

Advisor: 蔡佩芸 (Pei-Yun Tsai)
Review Date: 2024-01-30
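As a supplement to the abstract, the following is a minimal, illustrative sketch of one DDPG training step, showing the four networks the hardware implements (critic/actor evaluation and target networks), the forward pass, the backward (gradient) pass, and the soft target update. This is not the thesis hardware design: the single-layer linear networks, dimensions, and hyperparameters (lr, gamma, tau) are hypothetical placeholders chosen only to illustrate the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 8, 4        # hypothetical sizes (e.g. precoder phases)
lr, gamma, tau = 1e-3, 0.99, 0.005  # hypothetical hyperparameters

# Evaluation and target networks, one linear layer each
Wa = rng.normal(size=(action_dim, state_dim)) * 0.1            # actor eval
Wc = rng.normal(size=(1, state_dim + action_dim)) * 0.1        # critic eval
Wa_t, Wc_t = Wa.copy(), Wc.copy()                              # target copies

def actor(W, s):
    return np.tanh(W @ s)                    # action bounded to [-1, 1]

def critic(W, s, a):
    return (W @ np.concatenate([s, a]))[0]   # scalar Q-value

# One transition (s, a, r, s') as if sampled from replay memory;
# the reward would be the channel capacity in the thesis setting.
s = rng.normal(size=state_dim)
a = actor(Wa, s)
r = 1.0
s2 = rng.normal(size=state_dim)

# Critic backward pass: regress Q(s, a) toward r + gamma * Q_t(s', mu_t(s'))
y = r + gamma * critic(Wc_t, s2, actor(Wa_t, s2))
td = critic(Wc, s, a) - y
Wc -= lr * td * np.concatenate([s, a])[None, :]   # grad of 0.5 * td^2

# Actor backward pass: ascend dQ/da * da/dWa (chain rule through tanh)
dq_da = Wc[0, state_dim:]          # dQ/da for the linear critic
da_dz = 1.0 - actor(Wa, s) ** 2    # tanh derivative
Wa += lr * (dq_da * da_dz)[:, None] * s[None, :]

# Soft target update, pulling targets slowly toward the evaluation networks
Wa_t = tau * Wa + (1 - tau) * Wa_t
Wc_t = tau * Wc + (1 - tau) * Wc_t
```

In the thesis hardware, these same multiply-accumulate patterns are mapped onto shared processing elements, with the forward pass accumulating activations and the backward pass applying the parameter updates.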