Abstract: | The MPEG-4 AVC/JVT/H.264 video coding standard has many attractive features. However, these features make real-time implementation difficult. Most previous work targets specific applications such as mobile TV and HDTV, and concentrates on meeting the video specification and reducing memory bandwidth. Generally, there are two ways to reach these targets: an application-specific integrated circuit (ASIC), or a pure processor such as an ARM or a DSP. In this thesis, we propose high-efficiency motion estimation (ME) and deblocking filter architecture designs for a power-aware H.264 encoder. In the first part, we propose a speed-improved ME algorithm and a high-efficiency architecture design. ME in H.264 employs seven permitted block sizes to improve rate-distortion performance. 
This feature achieves significant coding gain over coding a macroblock (MB) with a fixed block size. However, ME is computationally intensive, and its complexity increases linearly with the number of allowed block sizes. We therefore first propose a high-performance hybrid ME algorithm for H.264/AVC. The hybrid algorithm combines the proposed mode decision algorithm, Edge Information Mode Decision (EIMD), which uses edge information to choose the best of the seven block modes, with the proposed Predict Hexagon Search (PHS). With the proposed ME algorithm, the computational complexity is greatly reduced, which makes it suitable for high-resolution applications such as HDTV (1920×1080) or QFHD (3840×2160). On three real movie sequences, the proposed algorithm is about 300~405 times faster than the full search of JM10.2 and about 3~247 times faster than other popular fast algorithms. After developing the ME algorithm, we propose an architecture for the combined fast ME algorithm with PHS and EIMD. Compared with other popular ME architectures, the proposed architecture supports a larger search range and runs at a lower operating frequency. It needs only 19.4 MHz to achieve real-time execution for SDTV (720×480) with four reference frames and a search range of 256×256, and only 116.6 MHz for QFHD (3840×2160) with one reference frame and a search range of 256×256. The gate count of the proposed architecture is 300K, and the memory usage is 12.6 KB. Second, we propose a new processing order and an architecture design for the deblocking filter. 
The proposed processing order, the double-cross processing order, is built on a parallel flow that improves processing speed and reduces memory access. The proposed architecture saves about 38~80% of memory accesses compared with other designs. With this high-efficiency architecture, processing performance improves and the operating frequency required for standard video specifications drops. For HDTV 1080p (1920×1080 @30fps), the operating frequency of the proposed architecture is only 11.5 MHz. For the high-resolution QFHD specification (3840×2160 @30fps), it is only 46.6 MHz. The implementation requires about 20.14K gates and 64×32 bits of memory. The power dissipation for the QFHD specification is 7.7 mW at 46.6 MHz. For the whole H.264 encoder, we propose a HW/SW co-design scheme that integrates our previously proposed ME and deblocking filter hardware. Finally, we propose a power-aware scheme for the whole H.264 encoder. The proposed power-aware design can save about 9%~87% of the original power consumption when a power budget is applied. |
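
The operating frequencies quoted in the abstract can be read as a cycle budget per macroblock. The following is a minimal sketch of that arithmetic, not a description of the proposed hardware. It assumes 16×16 macroblocks with partial rows rounded up, and a frame rate of 30 fps for the ME figures (the abstract states 30 fps only for the deblocking filter figures). The function names are hypothetical.

```python
import math

MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def macroblocks_per_second(width, height, fps):
    """Macroblocks to process per second; partial rows/columns round up."""
    mbs_per_frame = math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)
    return mbs_per_frame * fps


def cycles_per_macroblock(clock_hz, width, height, fps, ref_frames=1):
    """Clock cycles available per macroblock, per reference frame."""
    return clock_hz / (macroblocks_per_second(width, height, fps) * ref_frames)


# Motion estimation (30 fps is an assumption, not stated in the abstract).
me_sdtv = cycles_per_macroblock(19.4e6, 720, 480, 30, ref_frames=4)
me_qfhd = cycles_per_macroblock(116.6e6, 3840, 2160, 30, ref_frames=1)

# Deblocking filter (30 fps is stated in the abstract).
db_hdtv = cycles_per_macroblock(11.5e6, 1920, 1080, 30)
db_qfhd = cycles_per_macroblock(46.6e6, 3840, 2160, 30)

print(f"ME   SDTV, 4 refs: {me_sdtv:.1f} cycles/MB/ref")
print(f"ME   QFHD, 1 ref : {me_qfhd:.1f} cycles/MB/ref")
print(f"DBF  HDTV 1080p  : {db_hdtv:.1f} cycles/MB")
print(f"DBF  QFHD        : {db_qfhd:.1f} cycles/MB")
```

Under these assumptions, both ME figures correspond to roughly 120 cycles per macroblock per reference frame, and both deblocking filter figures to roughly 47 to 48 cycles per macroblock, so the quoted frequencies scale consistently with resolution.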