Name: 吳禹辰 (Yu-Chen Wu)
Department: Electrical Engineering
Thesis Title: Hardware Architecture Design for Deep Reinforcement Learning Applied to Precoders in MIMO Hybrid Beamforming Systems
Full Text: available in the system after 2029-01-31
Abstract (Chinese, translated): This thesis designs a hardware architecture for deep reinforcement learning and applies it to a hybrid beamforming system, training a neural network for the precoder. Under the assumption of known channel state information, the Deep Deterministic Policy Gradient (DDPG) algorithm is used to obtain the analog precoder. The analog precoder serves as the state; after the agent takes an action, it receives feedback and a reward from the environment, and the convergence trend of the reward indicates whether training succeeds. With this algorithm, we analyze how each parameter affects the training results, using channel capacity as the performance metric. After selecting the neural network parameters, a hardware architecture is designed to implement DDPG on the hybrid precoding algorithm. The architecture consists of a critic evaluation network unit, a critic target network unit, an actor evaluation network unit, an actor target network unit, local memory, a square-root unit, a divider, and control signals. Each critic evaluation and target network unit contains 16 critic network computation slices, and each actor evaluation and target network unit contains 48 actor network computation slices. Each slice contains basic processing elements, each with four multipliers and four adders. The same processing elements implement DDPG: the forward pass accumulates the quantities needed to train the neural network, and the backward pass updates the network parameters; one forward pass takes 153 clock cycles and one backward pass takes 322. The design is implemented on a Xilinx VCU128 at an operating frequency of 95.2 MHz, using 536,514 LUTs, 256 BRAMs, and 1,024 DSPs. At batch size 1, the design produces 18,332 inferences per second (IPS); at batch size 32, it reaches 211,556 IPS. To verify the correctness of the neural network outputs and parameter updates, the results are compared against a bit-true software model; no discrepancy is found, demonstrating that the hardware can train and update a deep reinforcement learning network.

Abstract (English): A hardware architecture for deep reinforcement learning applied to the hybrid beamforming system is proposed. Given known channel state information, the algorithm employs the deep deterministic policy gradient (DDPG) algorithm to calculate the phases of the analog precoders. The previous analog precoder phases are used as states, and the agent generates actions based on feedback and rewards, defined as channel capacity, from the environment. The convergence trend of the channel capacity and the comparison with conventional algorithms demonstrate the success of training. A hardware architecture is then designed to implement DDPG with the chosen parameters. The architecture includes a critic evaluation network unit, a critic target network unit, an actor evaluation network unit, and an actor target network unit. Each critic evaluation and target network unit contains 16 computation slices, and each actor evaluation and target network unit comprises 48 computation slices. Each computation slice includes basic processing elements, each having four multipliers and four adders. Both forward and backward passes are executed in the proposed hardware architecture, requiring 153 and 322 clock cycles, respectively.
The hardware design is synthesized on a Xilinx VCU128, achieving an operating frequency of 95.2 MHz and utilizing 536,514 LUTs, 256 BRAMs, and 1,024 DSPs. To confirm the correctness of the neural network outputs and parameter updates, a comparison with a bit-true model shows no discrepancies, indicating that the hardware is capable of both training and inference for deep reinforcement learning. It accomplishes 18K and 211K training inferences per second at batch sizes of 1 and 32, respectively.

Keywords (Chinese, translated): ★ Deep reinforcement learning
★ Deep reinforcement learning applied to hybrid precoding systems
★ Hybrid beamforming system
★ Precoder
★ Hardware implementation

Keywords (English): ★ Deep Reinforcement Learning
★ Deep Deterministic Policy Gradient
★ Hybrid Beamforming System
★ Precoder
★ Implementation

Table of Contents:
Abstract (Chinese) I
Abstract (English) II
Table of Contents III
List of Tables VI
List of Figures IX
Chapter 1 Introduction 1
1.1 Overview 1
1.2 Motivation 1
1.3 Thesis Organization 2
Chapter 2 Deep Reinforcement Learning 3
2.1 Neural Network 3
2.2 Q-Learning 4
2.3 Deep Q Network (DQN) 5
2.4 Experience Replay 9
2.5 Deep Deterministic Policy Gradient (DDPG) 10
Chapter 3 Algorithms for Hybrid Precoding Systems 21
3.1 Hybrid Beamforming System 21
3.2 Conventional Algorithms 24
3.2.1 Fully Digital Precoder and Combiner System [5] 24
3.2.2 Single-User Hybrid Precoding System [5] 25
Chapter 4 Deep Reinforcement Learning Applied to Hybrid Precoding Systems 28
4.1 DDPG for Hybrid Precoding Systems 28
4.2 DDPG on the Hybrid Precoding Algorithm 32
4.3 Simulation Results under Different Parameter Settings 34
Chapter 5 Complexity Analysis and Hardware Implementation 47
5.1 Complexity Analysis 47
5.1.1 Complexity of Forward Propagation 48
5.1.2 Complexity of Backward Propagation 50
5.1.3 Comparison of Computational Complexity 57
5.1.4 Processing Element Allocation 60
5.2 Hardware Design and Implementation 61
5.2.1 Processing Element (PE) Design 67
5.2.2 Single-Precision Floating-Point Adder Design 69
5.2.3 Single-Precision Floating-Point Multiplier Design 71
5.2.4 Single-Precision Floating-Point Divider Design 72
5.3 Data-Flow Control and Scheduling 75
5.3.1 Forward-Pass Data-Flow Scheduling for the Actor Network 75
5.3.2 Forward-Pass Data-Flow Scheduling for the Critic Network 81
5.3.3 Backward-Pass Data-Flow Scheduling for the Critic Network 88
5.3.4 Backward-Pass Data-Flow Scheduling for the Actor Network 100
Chapter 6 Hardware Implementation Results and Comparison 116
6.1 Simulation Results of Models with Different Floating-Point Precisions 116
6.2 FPGA Synthesis and Measurement Results 118
6.3 Comprehensive Comparison 123
Chapter 7 Conclusion 125
References 127

References:
[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, Cambridge, MA, 2017.
[2] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra and M. Riedmiller, “Playing Atari with Deep Reinforcement Learning,” in NIPS Deep Learning Workshop, 2013.
[3] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra and M. Riedmiller, “Deterministic Policy Gradient Algorithms,” in Proceedings of the 31st International Conference on Machine Learning, 2014, pp. 387-395.
[4] E. Todorov, T. Erez and Y. Tassa, "MuJoCo: A physics engine for model-based control," 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 2012, pp. 5026-5033, doi: 10.1109/IROS.2012.6386109.
[5] 黃孝生, "Hybrid Beamforming Algorithms and Architecture Design for Multi-User MIMO Systems," M.S. thesis, Department of Electrical Engineering, National Central University, 2018.
[6] A. Alkhateeb, R. W. Heath and G. Leus, "Achievable rates of multi-user millimeter wave systems with hybrid precoding," 2015 IEEE International Conference on Communication Workshop (ICCW), London, UK, 2015, pp. 1232-1237, doi: 10.1109/ICCW.2015.7247346.
[7] F. Sohrabi and W. Yu, "Hybrid Digital and Analog Beamforming Design for Large-Scale Antenna Arrays," in IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 501-513, April 2016, doi: 10.1109/JSTSP.2016.2520912.
[8] T. Lin and Y. Zhu, "Beamforming Design for Large-Scale Antenna Arrays Using Deep Learning," in IEEE Wireless Communications Letters, vol. 9, no. 1, pp. 103-107, Jan. 2020, doi: 10.1109/LWC.2019.2943466.
[9] A. M. Elbir, "CNN-Based Precoder and Combiner Design in mmWave MIMO Systems," in IEEE Communications Letters, vol. 23, no. 7, pp. 1240-1243, July 2019, doi: 10.1109/LCOMM.2019.2915977.
[10] R. Bellman, "A Markovian Decision Process," Journal of Mathematics and Mechanics, vol. 6, no. 5, pp. 679-684, 1957.
[11] 黃柏銓, "Deep Reinforcement Learning-Based Precoder Design for Beam Tracking in MIMO Hybrid Beamforming Systems," M.S. thesis, Department of Electrical Engineering, National Central University, 2022.
[12] P.-Y. Tsai, Y. Chang and J.-L. Li, "Fast-Convergence Singular Value Decomposition for Tracking Time-Varying Channels in Massive MIMO Systems," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 2018, pp. 1085-1089, doi: 10.1109/ICASSP.2018.8462248.
[13] 蔣詠軒, "Customized Block Floating-Point Hardware Architecture Design and Optimization for Deep Reinforcement Learning," M.S. thesis, Department of Electrical Engineering, National Central University, 2022.
[14] C. -W. Hu, J. Hu and S. P. Khatri, "TD3lite: FPGA Acceleration of Reinforcement Learning with Structural and Representation Optimizations," 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), Belfast, United Kingdom, 2022, pp. 79-85, doi: 10.1109/FPL57034.2022.00023.
[15] Y. Meng, S. Kuppannagari and V. Prasanna, "Accelerating Proximal Policy Optimization on CPU-FPGA Heterogeneous Platforms," 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Fayetteville, AR, USA, 2020, pp. 19-27, doi: 10.1109/FCCM48280.2020.00012.

Advisor: 蔡佩芸 (Pei-Yun Tsai)
Review Date: 2024-01-30
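As a supplement to the abstract, the following is a minimal, illustrative sketch of one DDPG training step, showing the four networks the hardware implements (critic/actor evaluation and target networks), the forward pass, the backward (gradient) pass, and the soft target update. This is not the thesis hardware design: the single-layer linear networks, dimensions, and hyperparameters (lr, gamma, tau) are hypothetical placeholders chosen only to illustrate the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 8, 4        # hypothetical sizes (e.g. precoder phases)
lr, gamma, tau = 1e-3, 0.99, 0.005  # hypothetical hyperparameters

# Evaluation and target networks, one linear layer each
Wa = rng.normal(size=(action_dim, state_dim)) * 0.1            # actor eval
Wc = rng.normal(size=(1, state_dim + action_dim)) * 0.1        # critic eval
Wa_t, Wc_t = Wa.copy(), Wc.copy()                              # target copies

def actor(W, s):
    return np.tanh(W @ s)                    # action bounded to [-1, 1]

def critic(W, s, a):
    return (W @ np.concatenate([s, a]))[0]   # scalar Q-value

# One transition (s, a, r, s') as if sampled from replay memory;
# the reward would be the channel capacity in the thesis setting.
s = rng.normal(size=state_dim)
a = actor(Wa, s)
r = 1.0
s2 = rng.normal(size=state_dim)

# Critic backward pass: regress Q(s, a) toward r + gamma * Q_t(s', mu_t(s'))
y = r + gamma * critic(Wc_t, s2, actor(Wa_t, s2))
td = critic(Wc, s, a) - y
Wc -= lr * td * np.concatenate([s, a])[None, :]   # grad of 0.5 * td^2

# Actor backward pass: ascend dQ/da * da/dWa (chain rule through tanh)
dq_da = Wc[0, state_dim:]          # dQ/da for the linear critic
da_dz = 1.0 - actor(Wa, s) ** 2    # tanh derivative
Wa += lr * (dq_da * da_dz)[:, None] * s[None, :]

# Soft target update, pulling targets slowly toward the evaluation networks
Wa_t = tau * Wa + (1 - tau) * Wa_t
Wc_t = tau * Wc + (1 - tau) * Wc_t
```

In the thesis hardware, these same multiply-accumulate patterns are mapped onto shared processing elements, with the forward pass accumulating activations and the backward pass applying the parameter updates.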