Modern technology is advancing at an explosive pace, giving rise to applications such as artificial intelligence (AI) and fifth-generation (5G) mobile networks. These applications demand ever higher data throughput, which conventional multiple-input multiple-output (MIMO) systems are gradually becoming unable to sustain. The number of antennas has therefore been scaled up to form massive MIMO systems. Common antenna configurations (N_r×N_t) include 64x8 and 128x8; this paper instead targets the more difficult 128x32 configuration. Minimum mean square error (MMSE) detection requires a matrix inversion whose complexity is O(N_t^3), so this paper uses the Joint Conjugate Gradient Jacobi method to approximate the matrix inverse in MMSE. The algorithm starts with conjugate gradient (CG) iterations and finishes with Jacobi iterations: because the CG method offers better convergence, it is used first to find the correct search direction, after which the lower-complexity Jacobi method takes over. The Jacobi method iterates using the diagonal part of the matrix and the remaining off-diagonal part. Because the Gram matrix in massive MIMO is diagonally dominant, the Jacobi iterations also reduce the bit error rate (BER) of the system. Combining the two methods yields a BER nearly identical to that of the MMSE detector being approximated.
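The CG-then-Jacobi idea described above can be illustrated with a minimal software sketch. It is not the implementation from this work: the function names (`cg_jacobi`, `mmse_detect`), the iteration counts, and the zero initial guess are assumptions made for illustration, and the code uses plain Python lists rather than a fixed-point hardware datapath. It solves A x = b with A = H^H H + sigma^2 I and b = H^H y, which is the linear system behind MMSE detection, without forming the inverse explicitly.

```python
import random

def matvec(A, x):
    """Matrix-vector product A @ x for a list-of-rows matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def vdot(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def cg_jacobi(A, b, cg_iters=2, jacobi_iters=2):
    """Approximate the solution of A x = b: CG iterations first, Jacobi after.

    A is assumed Hermitian positive definite (true for H^H H + sigma^2 I).
    The iteration counts are illustrative, not values taken from the source.
    """
    x = [0j] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)                      # initial search direction
    rs = vdot(r, r).real
    for _ in range(cg_iters):
        if rs == 0.0:                # already converged exactly
            break
        Ap = matvec(A, p)
        alpha = rs / vdot(p, Ap).real
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = vdot(r, r).real
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    for _ in range(jacobi_iters):
        # Jacobi step x <- D^{-1} (b - R x), written as x + D^{-1} (b - A x)
        Ax = matvec(A, x)
        x = [x[i] + (b[i] - Ax[i]) / A[i][i] for i in range(len(b))]
    return x

def mmse_detect(H, y, sigma2, cg_iters=2, jacobi_iters=2):
    """MMSE estimate of x from y = H x + n, using the iterative solver above."""
    n_t = len(H[0])
    cols = [[row[j] for row in H] for j in range(n_t)]      # columns of H
    A = [[vdot(cols[i], cols[j]) + (sigma2 if i == j else 0.0)
          for j in range(n_t)] for i in range(n_t)]          # H^H H + sigma^2 I
    b = [vdot(c, y) for c in cols]                           # H^H y
    return cg_jacobi(A, b, cg_iters, jacobi_iters)

if __name__ == "__main__":
    random.seed(0)
    n_r, n_t = 64, 8                 # small configuration chosen for the demo
    H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_t)]
         for _ in range(n_r)]
    x_true = [complex(random.choice([-1, 1]), random.choice([-1, 1]))
              for _ in range(n_t)]   # QPSK-like symbols (assumed modulation)
    y = matvec(H, x_true)            # noise-free observation for simplicity
    x_hat = mmse_detect(H, y, sigma2=0.1, cg_iters=3, jacobi_iters=3)
    print(max(abs(a - b) for a, b in zip(x_hat, x_true)))
```

The demo uses a 64x8 channel, where the Gram matrix is strongly diagonally dominant and the Jacobi steps are well behaved. How many CG iterations are needed before switching to Jacobi in the 128x32 case is a design parameter that this sketch does not attempt to reproduce.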
In the circuit architecture, the Gram matrix is computed by dividing the channel matrix into four equal parts, processing them in parallel while exploiting the conjugate symmetry of the Gram matrix, and summing the partial results to obtain higher throughput. The overall circuit architecture is pipelined and divided into three stages. The chip is implemented in TSMC 40 nm CMOS technology. The core area is 3.72 mm^2, the maximum operating frequency is about 423 MHz, the dynamic power consumption is about 877 mW, and the throughput reaches 768 Mbps.
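The four-way Gram matrix computation can be sketched in software as follows. The abstract does not say along which dimension the channel matrix is split, so this sketch assumes a split along the receive-antenna (row) dimension, in which case the Gram matrix is the sum of four partial Gram matrices. The function names `partial_gram_upper` and `gram_four_way` are hypothetical, and the four partial products that hardware would compute in parallel are simply evaluated one after another here.

```python
def partial_gram_upper(H_block, n_t):
    """Upper triangle (i <= j) of H_block^H H_block for one row block."""
    G = [[0j] * n_t for _ in range(n_t)]
    for row in H_block:
        for i in range(n_t):
            ci = row[i].conjugate()
            for j in range(i, n_t):          # skip the lower triangle
                G[i][j] += ci * row[j]
    return G

def gram_four_way(H):
    """Gram matrix H^H H from four row blocks (assumed partitioning)."""
    n_r, n_t = len(H), len(H[0])
    if n_r % 4 != 0:
        raise ValueError("number of rows must be divisible by 4")
    q = n_r // 4
    blocks = [H[k * q:(k + 1) * q] for k in range(4)]
    # Each partial product is independent of the others, so the four
    # can be computed in parallel; here they are computed sequentially.
    partials = [partial_gram_upper(B, n_t) for B in blocks]
    G = [[0j] * n_t for _ in range(n_t)]
    for i in range(n_t):
        for j in range(i, n_t):
            G[i][j] = sum(P[i][j] for P in partials)
            if j > i:
                G[j][i] = G[i][j].conjugate()   # fill by conjugate symmetry
    return G
```

Computing only the upper triangle and mirroring it roughly halves the number of multiplications per block, which is the kind of saving the conjugate symmetry makes available.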