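Abstract: | With the continued advance of semiconductor process technology, today's chips can perform complex data processing and computation within a very small area. As chip area shrinks, however, aging effects pose a serious threat to chip reliability. Among them, negative bias temperature instability (NBTI) is one of the most severe. Over long periods of use, NBTI gradually raises the threshold voltage of P-type transistors, which in turn slows signal propagation. If this propagation delay exceeds the original specification, computation results may be wrong, leading to functional errors in the chip. To prevent this, we need to know the chip's aging status during operation so that corrective action can be taken in real time. Previous studies have therefore proposed placing aging sensors in the chip for real-time monitoring, so that appropriate measures can be taken when aging occurs and functional errors are avoided. However, because sensors increase chip area and power consumption, the number of sensors that can be placed in a chip is limited. Most previous work places sensors only on the critical paths, the paths with the longest circuit delay, to ensure that the worst-case propagation delay is detected. Critical paths change as the chip ages, however; if they are determined from timing analysis of the healthy chip and sensors are placed accordingly, the sensors may fail to reflect the chip's worst aging condition once it has aged. One way to place sensors more precisely is to simulate different aging situations at design time, obtain the critical paths under each aging state from the simulation results, and place sensors on that basis. Although the resulting placement accurately reflects circuit aging, accurate aging simulation takes a great deal of time, so this approach cannot be applied to the large circuits common today. To solve this problem, this dissertation proposes a machine-learning-based aging sensor placement framework that places aging sensors efficiently. Our framework also places sensors according to simulation results for different aging levels, but it uses a generative adversarial network (GAN) to produce a large number of aging simulation results in a short time, replacing the lengthy simulation process and greatly reducing the time required. To let aging information interact with the GAN, we develop a data transformation method that turns aging information into images used as GAN training data; the output of a properly trained GAN can likewise be converted back into aging information by our inverse transformation. Finally, we propose an aging sensor placement algorithm that makes appropriate use of the inverse-transformed aging information. Experimental results show that our method places sensors accurately enough to detect post-aging timing errors and, through machine learning, greatly reduces the time needed for extensive aging simulation. Our placement method achieves a timing error detection rate of up to 100% and improves the timing error detection rate by 30.77% over other aging sensor deployment methods. More importantly, by greatly reducing aging simulation time, we make sensor deployment more efficient, achieving a speedup of up to 330x over previous work. ;With the continuous shrinking of CMOS technologies, even a single IC can perform complex computations in a tiny chip area. However, along with the downscaling of circuit area, aging effects have become a non-negligible reliability threat. Among all aging effects, Negative Bias Temperature Instability (NBTI) is one of the most serious in nanoscale technology. NBTI increases the threshold voltage of pMOS transistors under continuous “ON” stress and can therefore increase the propagation delay. If the propagation delay on a critical path violates the timing requirements in the specification, it may lead to timing failure or even malfunction.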
To avoid this unacceptable situation, it is important to monitor the aging situation during circuit operation and provide the necessary calibration. Previous works have therefore proposed using aging sensors for real-time monitoring and applying an appropriate tolerance mechanism when aging occurs. However, the number of aging sensors that can be placed in a chip is limited by the area overhead. In previous works, aging monitors are usually deployed at the ends of the critical paths to ensure that the worst-case aging situation is captured. However, the critical paths may change as the circuit ages. Simply deploying aging sensors on the critical paths obtained from healthy-circuit analysis may fail to reflect the real aging situation. One possible way to deploy aging sensors accurately is to perform detailed aging simulation under different aging situations at design time, identify the potential critical paths under each situation, and deploy the aging sensors based on that information. Although this approach can successfully capture the aging situation, its unacceptable simulation time makes it impractical for larger circuits. Therefore, an efficient aging sensor deployment methodology is in demand.
To solve the above problem, in this dissertation we propose a machine-learning-based aging monitor deployment framework that deploys aging sensors efficiently. Our framework follows the same concept of deploying aging sensors based on detailed aging simulation under different aging situations, but it applies a Generative Adversarial Network (GAN) that replaces the tedious detailed simulations and generates a large number of simulation results, significantly reducing the execution time. To pass aging information into and out of the GAN, we propose a data transform method that converts aging information into images and converts the GAN's output images back into aging information.
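As a rough illustration of what such a round-trip transform could look like, the sketch below maps per-gate delay values to an 8-bit grayscale grid and back. The linear quantization, the row-major layout with zero padding, and the names `encode`, `decode`, `lo`, `hi`, and `width` are all assumptions made for illustration; the abstract does not specify the actual encoding.

```python
import math


def encode(delays, lo, hi, width):
    """Quantize per-gate delays in [lo, hi] to 8-bit pixels, row-major.

    Hypothetical encoding: the last row is zero-padded when the number
    of gates is not a multiple of `width`.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = 255.0 / (hi - lo)
    # Clamp so that out-of-range values still fit in one byte.
    pixels = [min(255, max(0, round((d - lo) * scale))) for d in delays]
    height = math.ceil(len(pixels) / width)
    pixels += [0] * (height * width - len(pixels))
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


def decode(image, lo, hi, n_gates):
    """Invert `encode`: drop the padding and map pixels back to delays."""
    flat = [p for row in image for p in row][:n_gates]
    step = (hi - lo) / 255.0
    return [lo + p * step for p in flat]


# Example round trip on five made-up gate delays (arbitrary units).
delays = [10.0, 12.5, 20.0, 15.0, 11.0]
image = encode(delays, lo=10.0, hi=20.0, width=2)
recovered = decode(image, lo=10.0, hi=20.0, n_gates=len(delays))
```

Under these assumptions each recovered delay differs from the original by at most half a quantization step, `(hi - lo) / 510`, so the choice of `lo` and `hi` bounds the precision that survives a pass through the image domain.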
Finally, we propose an aging sensor placement algorithm based on the aging information provided by the GAN. Experimental results show that our framework deploys aging sensors efficiently and accurately, reaching a timing failure detection rate of up to 100%, a 30.77% improvement compared with a previous work. Moreover, it achieves a speedup of up to 330x compared with a previous work. |
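The idea of placing a limited number of sensors using many aging scenarios can be sketched as a greedy coverage problem. This is not the placement algorithm of the dissertation, which the abstract does not describe; it is a minimal stand-in that assumes each scenario is a mapping from path endpoint to path delay, that a timing failure is any delay above the clock period, and that a sensor at an endpoint detects every failure ending there. The function name `place_sensors` and all data values are hypothetical.

```python
def place_sensors(scenarios, clock_period, budget):
    """Greedily pick up to `budget` endpoints covering the most failing scenarios.

    Returns (chosen_endpoints, detection_rate), where detection_rate is the
    fraction of failing scenarios with a sensor on a violating endpoint.
    """
    # Endpoints that violate timing in each scenario; keep failing scenarios only.
    failing = [{ep for ep, d in s.items() if d > clock_period} for s in scenarios]
    failing = [f for f in failing if f]

    uncovered = list(range(len(failing)))
    chosen = []
    while uncovered and len(chosen) < budget:
        # Count how many still-undetected scenarios each endpoint would catch.
        counts = {}
        for i in uncovered:
            for ep in failing[i]:
                counts[ep] = counts.get(ep, 0) + 1
        # Ties are broken by endpoint name so the result is deterministic.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        uncovered = [i for i in uncovered if best not in failing[i]]

    detected = len(failing) - len(uncovered)
    rate = detected / len(failing) if failing else 1.0
    return chosen, rate


# Four made-up aging scenarios; the critical endpoint shifts between them.
scenarios = [
    {"ff1": 1.05, "ff2": 0.90, "ff3": 0.80},
    {"ff1": 0.95, "ff2": 1.10, "ff3": 0.85},
    {"ff1": 1.02, "ff2": 1.01, "ff3": 0.99},
    {"ff1": 0.90, "ff2": 0.90, "ff3": 0.90},  # healthy: no violation
]
chosen, rate = place_sensors(scenarios, clock_period=1.0, budget=2)
```

With a budget of one sensor, this toy data leaves one failing scenario undetected because the violating endpoint moves from `ff1` to `ff2`, which is the effect that motivates using several aging scenarios instead of the healthy-circuit critical path alone.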