Abstract: | Most computer systems today are built on the von Neumann architecture (VNA), in which a central processing unit (CPU) and memory communicate through data and control channels. The CPU processes instructions stored in memory, while memory holds both data and programs. For data-intensive applications such as artificial intelligence (AI) and deep neural networks (DNNs), however, moving large volumes of data between the CPU and memory creates a major bottleneck: the CPU often stalls while waiting for memory responses, which limits system performance. Computing-in-memory (CIM) has been proposed to overcome this problem. CIM integrates computation directly into memory, reducing data movement and improving overall efficiency and speed. AI applications require a massive number of multiply-and-accumulate (MAC) operations to process large datasets efficiently. Among the various CIM technologies, SRAM-based CIM has drawn significant interest for its ability to perform multibit MAC operations with lower energy consumption.
Based on the computation method, SRAM-based CIM can be divided into analog CIM (ACIM) and digital CIM (DCIM). Compared with ACIM, DCIM offers better precision and is more resilient to noise, though it typically sacrifices some speed and energy efficiency. Hence, our work focuses on DCIM. To ensure the quality and reliability of DCIM systems, effective testing is critical. Because DCIM includes both memory and computing units, testing it is more complex than testing conventional memory and requires detecting both memory faults and computation-related faults. Existing fault simulators mainly focus on read and write behavior and are not designed for computation-related faults, which may involve complicated input-output behaviors that simple memory accesses cannot reveal. To address this gap, we propose a new fault simulator that fully supports both conventional memory fault models and the computing faults unique to DCIM. Instead of relying on predefined fault primitives, our method uses flexible test patterns that describe not only read/write actions but also computing inputs and outputs, allowing a broader range of faults to be activated and detected. To improve simulation efficiency, we design a tree-based structure that organizes fault models and their test patterns into a searchable form. During simulation, the test operations are matched against the tree, so multiple faults can be checked at once instead of one by one, greatly reducing redundant checks and speeding up the process. We also apply multi-threading when building the tree to handle large numbers of fault models more efficiently, which is especially important for complex DCIM designs. In our experiments, the simulator achieved fault coverage consistent with prior research on conventional memory fault models. |
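The tree-based matching described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each fault model is described by a short sequence of operation tokens and that the tree is a simple prefix tree (trie) over those sequences. The token notation (`w0`, `r1`, `mac(1,1)`), the fault names, and the functions `build_tree` and `detect` are hypothetical and are not taken from the simulator itself; multi-threaded tree construction is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps an operation token (e.g. "w0", "r1") to the next node in the tree.
    children: dict = field(default_factory=dict)
    # Names of fault models whose test pattern ends at this node.
    faults: list = field(default_factory=list)


def build_tree(patterns):
    """Build a prefix tree from {fault name: sequence of operation tokens}."""
    root = Node()
    for fault, ops in patterns.items():
        node = root
        for op in ops:
            node = node.children.setdefault(op, Node())
        node.faults.append(fault)
    return root


def detect(root, test_ops):
    """Return the faults whose pattern appears as a contiguous run in test_ops.

    All partial matches advance together on each operation, so patterns that
    share a prefix are checked in a single pass rather than one by one.
    """
    detected = set()
    active = []  # nodes reached by partial matches still in progress
    for op in test_ops:
        next_active = []
        for node in active + [root]:  # also try to start a new match here
            child = node.children.get(op)
            if child is not None:
                detected.update(child.faults)
                next_active.append(child)
        active = next_active
    return detected


# Hypothetical fault models; the last one mixes a write with a computing input.
PATTERNS = {
    "SAF0": ("w1", "r1"),
    "SAF1": ("w0", "r0"),
    "TF_up": ("w0", "w1", "r1"),
    "MAC_fault": ("w1", "mac(1,1)"),
}

tree = build_tree(PATTERNS)
print(sorted(detect(tree, ["w0", "r0", "w1", "r1"])))  # ['SAF0', 'SAF1']
```

In this sketch, patterns that begin with the same operations share tree nodes, so a single walk over the test sequence updates every candidate fault at once. A real simulator would also need to model cell addresses, expected read values, and the computing outputs, which are omitted here.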