Computing-in-memory (CIM) has emerged as a promising solution for data-intensive workloads. This thesis proposes a systolic SRAM-based digital CIM (DCIM) architecture for deep neural networks (DNNs), consisting of pipelined computing units (CUs), each containing multiple processing units (PUs). We develop an automated design framework that explores depth and width partitioning of the systolic array to minimize silicon area under latency, frequency, and bandwidth constraints. The key idea is to partition the partial-sum (PSUM) adder tree to shorten critical paths, which enables higher clock rates and reduces the number of PUs required. Although deeper partitioning increases the area of each PU, the framework identifies the optimal trade-off for a given workload. Compared with a TSMC baseline design, the proposed architecture reduces both power and area by roughly 15% on AlexNet and VGG16 without increasing latency, demonstrating improved energy and area efficiency over conventional approaches.
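To illustrate the kind of design-space exploration the framework performs, the sketch below enumerates candidate depth/width partitions and keeps the minimum-area design that meets latency, frequency, and bandwidth constraints. This is a minimal illustration only: the function and class names (`explore`, `Candidate`), the closed-form cost models, and all numeric constants are hypothetical placeholders, not the thesis's actual models, which would be derived from synthesis and characterization data.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Candidate:
    depth: int          # adder-tree partition depth (pipeline stages)
    width: int          # number of PUs operating in parallel
    freq_mhz: float
    pu_area: float      # area of one PU (placeholder units)
    total_area: float
    latency_us: float

def explore(macs_per_layer, depth_options, width_options,
            max_latency_us, max_freq_mhz, max_bandwidth_gbps):
    """Exhaustively search depth/width partitions and return the
    minimum-area candidate that satisfies all constraints."""
    best = None
    for depth, width in product(depth_options, width_options):
        # Deeper partitioning shortens the PSUM adder-tree critical path,
        # raising the clock rate, but each stage adds register area per PU.
        freq_mhz = min(max_freq_mhz, 200.0 * depth)          # placeholder model
        pu_area = 1000.0 * (1.0 + 0.12 * (depth - 1))        # placeholder model
        # Cycles needed to stream all MAC operations through `width` PUs.
        cycles = sum(m / width for m in macs_per_layer)
        latency_us = cycles / freq_mhz                        # MHz = cycles/us
        bandwidth_gbps = width * freq_mhz * 16 / 8 / 1000     # 16-bit operands
        if latency_us > max_latency_us or bandwidth_gbps > max_bandwidth_gbps:
            continue
        cand = Candidate(depth, width, freq_mhz, pu_area,
                         width * pu_area, latency_us)
        if best is None or cand.total_area < best.total_area:
            best = cand
    return best

if __name__ == "__main__":
    # Toy workload: per-layer MAC counts (hypothetical numbers).
    layers = [105_000_000, 223_000_000, 150_000_000]
    print(explore(layers,
                  depth_options=range(1, 5),
                  width_options=[16, 32, 64, 128],
                  max_latency_us=50_000.0,
                  max_freq_mhz=800.0,
                  max_bandwidth_gbps=25.0))
```

Under these toy models the search selects a deeper adder-tree partition when the higher clock rate lets a narrower (fewer-PU) array still meet the latency bound, mirroring the area-versus-depth trade-off the framework is designed to navigate.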