Abstract: | In massive multiple-input multiple-output (MIMO) systems, the minimum mean square error (MMSE) algorithm plays a significant role in uplink signal detection because it achieves near-optimal performance with linear complexity. However, the complexity of the matrix inversion in the MMSE algorithm grows rapidly as the number of users increases, which makes hardware implementation difficult. This thesis proposes an MMSE detector with a parallel, pipelined architecture that reduces computational complexity and saves hardware resources. A channel encoder and decoder are added, and the detector produces soft outputs to improve bit error rate performance. First, preprocessing units (PUs) compute the Gram matrix and the matched filter output required by the algorithm. The literature uses a lower-triangular (or upper-triangular) systolic array for this step; its benefit is that the Gram matrix values need not be stored and can be passed directly to the next stage, but it requires many processing elements. In addition, because the Gram matrix is symmetric, the values in its bottom-right (or upper-left) corner do not need to be computed, so computing them wastes resources. By exploiting this symmetry, this thesis reduces the number of Gram matrix values that must be computed to the minimum, which greatly reduces the hardware resources required. 
Second, the second stage uses Jacobi iteration for detection, because it can process the signals in parallel rather than sequentially, and the soft output values are then computed. Finally, the circuit is verified in real time on a SMIMS VeriEnterprise Xilinx FPGA, and the design is implemented as a chip in 90 nm CMOS technology. The core area is 3.34 mm^2, the maximum clock frequency is 327 MHz, and the dynamic power consumption is 47.3 mW. |
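
The two stages described in the abstract can be illustrated with a small floating-point sketch. It is not the thesis's hardware architecture: the function names `gram_and_matched_filter` and `jacobi_mmse`, the choice to evaluate only the lower triangle of the Gram matrix and mirror it, the noise variance, and the iteration count are all assumptions made for illustration. Soft-output (LLR) computation and channel coding are omitted.

```python
def gram_and_matched_filter(H, y):
    """Return G = H^H H and y_mf = H^H y.

    Only the lower triangle of G (including the diagonal) is computed;
    the upper triangle is filled in by Hermitian symmetry.
    """
    n_rx, n_tx = len(H), len(H[0])
    G = [[0j] * n_tx for _ in range(n_tx)]
    for i in range(n_tx):
        for j in range(i + 1):
            g = sum(H[r][i].conjugate() * H[r][j] for r in range(n_rx))
            G[i][j] = g
            if i != j:
                G[j][i] = g.conjugate()  # mirrored entry, no extra multiplies
    y_mf = [sum(H[r][i].conjugate() * y[r] for r in range(n_rx))
            for i in range(n_tx)]
    return G, y_mf


def jacobi_mmse(G, y_mf, noise_var, iterations):
    """Approximate x = (G + noise_var * I)^-1 y_mf with Jacobi iteration.

    Each component of the new estimate depends only on the previous
    estimate, so all components could be updated in parallel.
    """
    n = len(G)
    d = [G[i][i] + noise_var for i in range(n)]  # diagonal of the MMSE matrix
    x = [y_mf[i] / d[i] for i in range(n)]       # initial estimate D^-1 y_mf
    for _ in range(iterations):
        x = [(y_mf[i] - sum(G[i][j] * x[j] for j in range(n) if j != i)) / d[i]
             for i in range(n)]
    return x


# Hypothetical toy channel: 4 receive antennas, 2 users.
H = [[1 + 0j, 0.2j],
     [0.1 + 0j, 1 + 0j],
     [1 + 0j, 0j],
     [0j, 1j]]
y = [1 + 0j, -1 + 0j, 1 + 0j, 1j]
G, y_mf = gram_and_matched_filter(H, y)
x_hat = jacobi_mmse(G, y_mf, noise_var=0.1, iterations=20)
```

Jacobi iteration converges when the matrix `G + noise_var * I` is strictly diagonally dominant, which is the typical situation when the number of base-station antennas is much larger than the number of users. For a matrix with `n_tx` columns, the symmetry reduces the number of Gram entries that must be computed from `n_tx * n_tx` to `n_tx * (n_tx + 1) / 2`.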