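Abstract: | As the number of antennas at base stations and user receivers in communication systems keeps growing, systems must bear higher computational complexity than before. Many precoding techniques have therefore been proposed to reduce receiver complexity by concentrating the heavy computation at the transmitter, so that the receiver benefits from lower cost, smaller area, and lower power consumption. In multiple-input multiple-output (MIMO) wireless communication systems, singular value decomposition (SVD) is commonly used to compute the precoding, beamforming, and decoding applied at the transmitter and receiver, which helps strengthen the signal and cancel interference. This thesis adopts the hybrid power method (HPM) to track millimeter-wave (mmWave) channels. In its initialization phase, the self-power method (SPM) obtains the initial singular value; in the subsequent tracking phase, the self-adjusting inverse power method (SA-IPM) adjusts itself at every iteration to speed up convergence while tracking the singular value. Compared with SPM, SA-IPM converges faster and has lower complexity, and it supports parallel processing to raise throughput. On the hardware side, the core QR decomposition architecture is implemented with the Coordinate Rotation Digital Computer (CORDIC) and a systolic array, and supports matrix decompositions from 2×2 to 16×32. For the SVD of a 16×32 channel matrix, the internal QR decomposition takes 474 clock cycles and one complete SVD takes 616 clock cycles. The chip is designed in the TSMC 40nm process with a maximum operating frequency of 143MHz; without parallel processing, the throughput is 232K singular column vectors per second (vectors/s) at a power consumption of 66.4mW. With the number of QR decomposers increased to three and parallel processing applied, the throughput reaches 904K singular column vectors per second (vectors/s).;As the number of antennas at the base station and the user terminal increases, computational complexity rises. To reduce receiver complexity, many precoding techniques have been proposed to achieve lower cost, smaller area, and lower power. In multiple-input multiple-output (MIMO) wireless communication systems, singular value decomposition (SVD) generates the precoding, beamforming, and decoding matrices at the transmitter and receiver [1], which enhances signal concentration and removes interference. In this thesis, the hybrid power method (HPM) is used for SVD to track the mmWave channel. In the initialization phase, the initial singular value is obtained by the self-power method (SPM). In the tracking phase, the self-adjusting inverse power method (SA-IPM) adjusts itself in each iteration to track the singular value. Compared with SPM, SA-IPM not only offers excellent convergence and low complexity but also achieves higher throughput with parallel processing. In the SA-IPM hardware design, the core architecture is QR decomposition, which is realized with the Coordinate Rotation Digital Computer (CORDIC) and a systolic array. 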
It supports matrix sizes from 2×2 to 16×16. To decompose a 16×16 channel matrix, QR decomposition takes 474 clock cycles, and one singular value decomposition takes 616 clock cycles. Implemented in the TSMC 40nm process, the design reaches a maximum clock frequency of 143MHz. Without parallel processing, throughput is 232K column vectors per second and power consumption is 66.4mW. With three QR decomposition units and parallel processing, throughput increases to 904K column vectors per second. |
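
The non-parallel throughput figure can be checked against the clock frequency and cycle count quoted in the abstract. The sketch below is only a back-of-the-envelope check, not part of the thesis: it assumes that each 616-cycle SVD pass yields one singular column vector, which is the reading under which the quoted numbers agree. The names `CLOCK_HZ`, `CYCLES_PER_SVD`, `CYCLES_QR`, and `vectors_per_second` are illustrative.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 143e6        # maximum operating frequency (143 MHz)
CYCLES_PER_SVD = 616    # clock cycles for one complete SVD pass
CYCLES_QR = 474         # clock cycles spent in the internal QR decomposition


def vectors_per_second(clock_hz: float, cycles_per_pass: int) -> float:
    """Throughput, assuming (hypothetically) one singular vector per pass."""
    return clock_hz / cycles_per_pass


throughput = vectors_per_second(CLOCK_HZ, CYCLES_PER_SVD)
latency_us = CYCLES_PER_SVD / CLOCK_HZ * 1e6   # time for one pass, in microseconds
qr_share = CYCLES_QR / CYCLES_PER_SVD          # fraction of cycles spent in QR

print(f"throughput: {throughput / 1e3:.0f}K vectors/s")  # 232K vectors/s
print(f"latency per pass: {latency_us:.2f} us")
print(f"QR share of cycles: {qr_share:.0%}")
```

Under that assumption the result matches the 232K vectors/s quoted for the non-parallel case. The 904K figure for three QR decomposers depends on scheduling details the abstract does not give, so it is not derived here.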