Master's/Doctoral Thesis 108521049: Complete Metadata Record

DC field | Value | Language
dc.contributor | Department of Electrical Engineering | zh_TW
dc.creator | 徐麒惟 | zh_TW
dc.creator | Chi-Wei Hsu | en_US
dc.date.accessioned | 2021-10-26T07:39:07Z
dc.date.available | 2021-10-26T07:39:07Z
dc.date.issued | 2021
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=108521049
dc.contributor.department | Department of Electrical Engineering | zh_TW
dc.description | National Central University | zh_TW
dc.description | National Central University | en_US
dc.description.abstract | Convolutional Neural Networks (CNNs) are developing rapidly today and are used mainly in image recognition, self-driving cars, object detection, and similar applications. When applying a CNN, accuracy and data size are two important metrics for evaluating performance and computational efficiency. Conventional CNN models mostly compute with 32-bit floating-point numbers to maintain a high level of accuracy. However, 32-bit floating-point computation requires 32-bit multiply-and-accumulate (MAC) units, which not only creates a bottleneck in computational efficiency but also greatly increases power consumption, so researchers are now working to find ways of reducing the amount of data in order to speed up computation. Quantization is one method that lowers the data size, and with it the computational complexity, to gain speed while the accuracy does not drop too much. In a CNN, each layer may need a different number of bits, and to obtain a better trade-off between computational efficiency and accuracy, operations of different bit widths are used in different layers of the network. Under this premise, a processing element (PE) with adjustable bit width can support operations of different precisions, such as 8bits x 8bits, 8bits x 4bits, 4bits x 4bits, and 2bits x 2bits. The architecture we propose is hierarchical, which removes some redundant hardware during the design process and reduces the overall chip area; to raise the computation speed, our proposed 8bits x 8bits PE supports two-stage parallelization. For the experiments we use a 90nm process, and the results show that, compared with previous work, our 2bits x 2bits PE reduces area by 57.5% - 68%, while the parallelized architecture makes the computation speed of the 8bits x 8bits PE comparable to that of the 4bits x 4bits PE. | zh_TW
dc.description.abstract | In the deep learning field, Convolutional Neural Networks (CNNs) have achieved significant success in areas such as visual imagery analysis and self-driving cars. Data size and accuracy are the major metrics for judging how efficient and effective a system's computations are. Conventional CNN models frequently use 32-bit data to maintain high accuracy. However, performing large numbers of 32-bit multiply-and-accumulate (MAC) operations causes significant computing effort as well as power consumption. Researchers have therefore developed various methods to reduce data size and speed up calculation. Quantization is one such technique: it reduces the bit width of the data, and with it the computational complexity, at the cost of some accuracy loss. To provide a better trade-off between computation effort and accuracy, different bit widths may be applied to different layers within a CNN model, so a flexible processing element (PE) that can support operations of different bit widths is in demand. In this work, we propose a hierarchy-based reconfigurable PE structure that supports 8bits x 8bits, 8bits x 4bits, 4bits x 4bits, and 2bits x 2bits operations. The hierarchical structure avoids redundant hardware in the design. To improve the calculation speed, our 8bits x 8bits PE applies a two-stage pipeline. Experimental results with 90nm technology show that our 2bits x 2bits PE saves 57.5% to 60% of the area compared to a Precision-Scalable accelerator, and the two-stage pipeline lets the 8bits x 8bits PE maintain almost the same calculation speed as the 4bits x 4bits PE. (Illustrative code sketches of the quantization and hierarchical-multiply ideas follow this record.) | en_US
dc.subject | Quantized Neural Networks | zh_TW
dc.subject | Processing Element | zh_TW
dc.subject | Reconfigurable Design | zh_TW
dc.subject | Quantized Neural Networks (QNN) | en_US
dc.subject | Processing Element (PE) | en_US
dc.subject | Reconfigurable Design | en_US
dc.title | A Precision-Adjustable Processing Element Design for Quantized Deep Neural Networks: A Hierarchical Design Approach | zh_TW
dc.title | A Precision Reconfigurable Processing Element Design for Quantized Deep Neural Networks: A Hierarchical Approach | en_US
dc.language.iso | zh-TW | zh-TW
dc.type | Master's/Doctoral Thesis | zh_TW
dc.type | thesis | en_US
dc.publisher | National Central University | en_US
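
The two abstract fields above compress two technical ideas: quantizing each layer to a different bit width, and a multiplier hierarchy in which wide products are assembled from narrower ones. The C sketches below are reconstructions from the abstracts alone, not the thesis's actual design. The first shows uniform symmetric quantization of a layer's weights to a chosen bit width; the `quantize` helper, the scaling scheme, and the sample values are assumptions for illustration, since the thesis does not specify its quantization scheme.

```c
#include <math.h>    /* fabsf, lroundf; link with -lm */
#include <stdint.h>
#include <stdio.h>

/* Quantize a float weight array to a signed `bits`-bit grid.
 * Uniform symmetric quantization is assumed here; the result
 * fits in int8_t for bits <= 8. */
static void quantize(const float *w, int8_t *q, float *scale,
                     int n, int bits) {
    float max_abs = 0.0f;
    for (int i = 0; i < n; i++) {
        float a = fabsf(w[i]);
        if (a > max_abs) max_abs = a;
    }
    int qmax = (1 << (bits - 1)) - 1;     /* 7 for 4 bits, 127 for 8 */
    *scale = (max_abs > 0.0f) ? max_abs / (float)qmax : 1.0f;
    for (int i = 0; i < n; i++) {
        long v = lroundf(w[i] / *scale);  /* round to nearest level */
        if (v >  qmax) v =  qmax;         /* clip to representable range */
        if (v < -qmax) v = -qmax;
        q[i] = (int8_t)v;
    }
}

int main(void) {
    float w[4] = {0.50f, -0.25f, 0.125f, -1.0f};
    int8_t q[4];
    float s;
    quantize(w, q, &s, 4, 4);             /* e.g. a 4-bit layer */
    for (int i = 0; i < 4; i++)
        printf("%+.3f -> %+d (dequantized %+.3f)\n", w[i], q[i], q[i] * s);
    return 0;
}
```

The second sketch shows the arithmetic identity a hierarchical PE can exploit: an 8bits x 8bits unsigned product assembled from four 4bits x 4bits partial products, each of which could in turn be split into 2bits x 2bits terms, so the same small multipliers can serve every precision mode. The function names `mul4x4` and `mul8x8` are hypothetical, and the software decomposition only demonstrates the arithmetic; the pipelining, signed handling, and gating of the proposed PE are not modeled.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a 4-bit hardware multiplier (operands below 16). */
static uint16_t mul4x4(uint8_t a, uint8_t b) {
    return (uint16_t)((a & 0xFu) * (b & 0xFu));
}

/* 8x8 from four 4x4 partial products:
 * (aH*16 + aL) * (bH*16 + bL)
 *   = (aH*bH << 8) + ((aH*bL + aL*bH) << 4) + aL*bL */
static uint16_t mul8x8(uint8_t a, uint8_t b) {
    uint8_t aL = a & 0xFu, aH = a >> 4;
    uint8_t bL = b & 0xFu, bH = b >> 4;
    uint16_t ll = mul4x4(aL, bL);
    uint16_t lh = mul4x4(aL, bH);
    uint16_t hl = mul4x4(aH, bL);
    uint16_t hh = mul4x4(aH, bH);
    return (uint16_t)(((uint32_t)hh << 8) + ((uint32_t)(lh + hl) << 4) + ll);
}

int main(void) {
    printf("%u\n", (unsigned)mul8x8(200, 19)); /* 3800, built from 4x4 pieces */
    printf("%u\n", (unsigned)mul4x4(9, 7));    /* 63: same units in 4x4 mode  */
    return 0;
}
```

In 2bits x 2bits or 4bits x 4bits mode only the small multipliers are active, which is consistent with the abstracts' claim that the hierarchy removes redundant hardware rather than duplicating a full-width datapath for each precision.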
