dc.description.abstract | The von Neumann architecture (VNA) remains the fundamental structure of computer systems: a Central Processing Unit (CPU) and memory connected by data channels and control signals. The CPU executes instructions stored in memory, while the memory holds both instructions and data. For data-intensive applications such as image classification, speech recognition, and natural language processing, however, large amounts of data must be transferred between memory and the computing cores, giving rise to the von Neumann bottleneck: the limited communication speed between the CPU and memory forces the CPU to wait for memory responses, constraining overall system performance.
To address the von Neumann bottleneck, attention has shifted toward Computing In-Memory (CIM), a promising solution that moves computation into the memory itself. Performing computation where the data reside reduces the communication traffic between the CPU and memory, improving system efficiency and performance. Many researchers have proposed CIM architectures to accelerate AI computation. Broadly, CIM computation can be divided into two types: analog and digital. In recent years, analog CIM has received widespread attention owing to its inherent advantages in parallelism and energy efficiency, so our work focuses on analog CIM architectures. Among the candidate memory technologies, SRAM (Static Random-Access Memory) and RRAM (Resistive Random-Access Memory) stand out as popular choices.
SRAM-based CIM architectures have proven successful thanks to their mature, stable device technology, delivering efficient and reliable computation. However, the relatively large cell area and low storage density of SRAM increase chip area requirements. In contrast, RRAM-based CIM architectures offer high density, low power consumption, non-volatility, and seamless integration with CMOS processes, but their process-yield variations give rise to various types of faults. Thus, while both CIM approaches significantly improve computational speed, each has its own advantages and drawbacks.
To fully leverage the complementary strengths of these CIM architectures, we propose a novel hybrid SRAM-RRAM CIM architecture that enables direct in-place computation on weights stored in the memory array, achieved through a specially designed peripheral circuit that integrates the SRAM and RRAM structures. We further introduce a novel weight allocation scheme, termed the Weight Storage Strategy (WSS), which distributes weights across the two memory arrays according to the relative importance of their Most Significant Bits (MSBs) and Least Significant Bits (LSBs). Because the MSBs have the greater impact on computation, they are stored in the more reliable SRAM array, while the LSBs, which comprise more bits but are less critical, are stored in the smaller, denser RRAM array. Experimental results demonstrate that our architecture reduces area, leakage power, and energy consumption by approximately 35%, 40%, and 50%, respectively, compared with 8T-SRAM-based CIM architectures. At the same time, its reliability exceeds that of an RRAM-based architecture by about 32% and 18% when evaluated on the MNIST and hand-detection datasets, respectively. | en_US |
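To make the WSS bit-partitioning concrete, the following minimal Python sketch routes the high-order bits of each weight to the SRAM array and the low-order bits to the RRAM array, and shows why a single-bit fault in the LSB portion perturbs the weight far less than one in the MSB portion. The 8-bit weight width, the 2/6 MSB/LSB split, and the fault illustration are assumptions made for exposition, not parameters taken from the architecture itself.

```python
# Illustrative sketch of the WSS bit-partitioning idea.
# The widths below are assumed for exposition, not the paper's configuration.

WEIGHT_BITS = 8                     # assumed total weight width
MSB_BITS = 2                        # assumed bits kept in the stable SRAM array
LSB_BITS = WEIGHT_BITS - MSB_BITS   # remaining bits placed in the dense RRAM array


def split_weight(w: int) -> tuple[int, int]:
    """Partition a weight into its MSB part (-> SRAM) and LSB part (-> RRAM)."""
    msb = w >> LSB_BITS
    lsb = w & ((1 << LSB_BITS) - 1)
    return msb, lsb


def merge_weight(msb: int, lsb: int) -> int:
    """Reassemble the weight read back from the two arrays."""
    return (msb << LSB_BITS) | lsb


if __name__ == "__main__":
    w = 0b10110110  # example 8-bit weight (182)
    msb, lsb = split_weight(w)
    assert merge_weight(msb, lsb) == w

    # Why the MSBs belong in the more reliable SRAM: flipping one LSB
    # changes the weight by 1, while flipping the top MSB changes it by 128.
    lsb_fault = merge_weight(msb, lsb ^ 0b1)   # bit-flip in the RRAM (LSB) part
    msb_fault = merge_weight(msb ^ 0b10, lsb)  # bit-flip in the SRAM (MSB) part
    print(abs(lsb_fault - w))  # 1
    print(abs(msb_fault - w))  # 128
```

The asymmetry printed at the end is the rationale behind WSS: placing the fault-prone RRAM cells behind only the low-order bits bounds the numerical error any single device fault can introduce.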