dc.description.abstract | The upcoming video coding standard MPEG-4 AVC/JVT/H.264 offers many attractive features, but these features are difficult to implement in real time. Most previous work focuses on achievable target specifications such as mobile TV and HDTV, concentrating on how to meet the video specification and the memory bandwidth requirement. In general, there are two ways to reach these targets: an Application-Specific Integrated Circuit (ASIC), or a pure processor solution such as an ARM core or a DSP.
In this thesis, we propose highly efficient Motion Estimation (ME) and deblocking filter architecture designs for use in a power-aware H.264 encoder. In the first part, we propose a faster ME algorithm and a high-efficiency architecture design. ME in H.264 employs seven permitted block sizes to improve rate-distortion performance. This feature achieves significant coding gain over coding a macroblock (MB) with a fixed block size. However, ME is computationally intensive, with complexity increasing linearly with the number of allowed block sizes. We first propose a high-performance hybrid ME algorithm for H.264/AVC. It combines the proposed mode decision algorithm, Edge Information Mode Decision (EIMD), which uses edge information to select the best of the seven block modes, with the proposed Predict Hexagon Search (PHS). The proposed ME algorithm greatly reduces computational complexity and is therefore suitable for high-resolution applications such as HDTV (1920×1080) or QFHD (3840×2160). On the three real movie sequences tested, the proposed algorithm is about 300 to 405 times faster than the full search of JM10.2, and about 3 to 247 times faster than other popular fast algorithms. Building on this algorithm, we then propose an architecture for the combined fast motion estimation algorithm with PHS and EIMD. Compared with other popular ME architectures, the proposed architecture supports a large search range at a low operating frequency. It needs an operating frequency of only 19.4 MHz to achieve real-time execution for the general SDTV specification (720×480) with four reference frames and a 256×256 search range.
The proposed architecture requires an operating frequency of only 116.6 MHz to achieve real-time execution for the ultra-high QFHD specification (3840×2160) with one reference frame and a 256×256 search range. The gate count of the proposed architecture is 300K, and the memory usage is 12.6 KB.
In the second part, we propose a new processing order and an architecture design for the deblocking filter. The proposed processing order, called the double-cross processing order, is built on a parallel flow that improves processing speed and reduces memory access. The proposed architecture saves about 38% to 80% of memory access compared with other designs. With this highly efficient architecture, processing performance is enhanced and the operating frequency required for standardized video specifications is reduced. For the general HDTV 1080p video specification (1920×1080 @30fps), the operating frequency of the proposed architecture is only 11.5 MHz. For the high-resolution QFHD specification (3840×2160 @30fps), the operating frequency is only 46.6 MHz. The implementation requires about 20.14K gates and 64×32 bits of memory. The power dissipation for the QFHD specification is 7.7 mW at an operating frequency of 46.6 MHz. For the whole H.264 encoder, we propose a HW/SW co-design scheme that uses our previously proposed ME and deblocking filter engines. Finally, we propose a power-aware scheme for the whole H.264 encoder. The proposed power-aware design can save about 9% to 87% of power consumption when the power budget is applied.
| en_US |