Abstract: | Compared with earlier video compression standards, the coding structure of HEVC (High Efficiency Video Coding) allows a Coding Tree Unit (CTU) of up to 64x64 to be partitioned down to 8x8 blocks. This lowers the bit rate of the coding units but increases the computation time. Chapter Three of this study therefore proposes a Distributed Video Coding architecture based on Convolutional Neural Networks (CNN), applied to both the HEVC encoder and decoder, to reduce the complexity of the encoding process and to improve image quality through post-processing at the decoder. At the encoder, we optimize the SVM-CNN CU/PU algorithm by replacing the original interpolation filter with a bilinear interpolation filter, which reduces the computational load and saves encoding time. Simplifying the fractional-pixel estimation, however, introduces image distortion, so we apply a CNN as post-processing to compensate, bringing BDBR% down to 0.43% and raising TS% to 74.42%. At the decoder, we add DenseNet/DAE three-channel CNN models to enhance the decoded images, lowering BDBR% to -5.96%. In Chapter Four, we explore how to adjust the thresholds in the encoder algorithm to optimize the time-saving rate under an image-quality constraint. 
By proportionally adjusting the thresholds of discriminative features such as SAD and RDO, we obtain the spatial distributions of BDBR% and TS% with respect to ΦSAD and ΦRDO. For more accurate prediction, we run additional experiments near BDBR% = 6.0%. The resulting relation between ΦSAD and ΦRDO is substituted into the surface functions ZBDBR and ZTS to compute the predicted optimal time-saving rate. Finally, we make predictions from the relationship curve between BDBR% and TS%. The results show that the error between the predicted values and the experimental results is within an acceptable range. In future work, the thresholds can therefore be adjusted to optimize the encoder computation and to predict the encoder's BDBR% and TS% performance. |
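
The Chapter Four procedure summarized above (fit surfaces ZBDBR and ZTS over ΦSAD and ΦRDO, then pick the thresholds that maximize time saving under a BDBR% limit) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not state the form of the surface functions, so a full quadratic surface is used; the constraint is handled with a simple grid search rather than the relation between ΦSAD and ΦRDO derived in the thesis; and the data points are synthetic and are not results from this work. The function names are hypothetical.

```python
import numpy as np

def quad_features(phi_sad, phi_rdo):
    """Design matrix of an assumed full quadratic surface in two variables."""
    x = np.asarray(phi_sad, dtype=float)
    y = np.asarray(phi_rdo, dtype=float)
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def fit_surface(phi_sad, phi_rdo, z):
    """Least-squares fit of z = f(phi_sad, phi_rdo) with the quadratic model."""
    design = quad_features(phi_sad, phi_rdo)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(z, dtype=float), rcond=None)
    return coeffs

def eval_surface(coeffs, phi_sad, phi_rdo):
    """Evaluate a fitted surface at the given threshold factors."""
    return quad_features(phi_sad, phi_rdo) @ coeffs

def best_time_saving(c_bdbr, c_ts, bdbr_limit, grid):
    """Grid search: maximize predicted TS% subject to predicted BDBR% <= limit."""
    xs, ys = np.meshgrid(grid, grid, indexing="ij")
    xs, ys = xs.ravel(), ys.ravel()
    bdbr = eval_surface(c_bdbr, xs, ys)
    ts = eval_surface(c_ts, xs, ys)
    feasible = bdbr <= bdbr_limit
    if not feasible.any():
        return None  # no threshold pair satisfies the quality constraint
    i = int(np.argmax(np.where(feasible, ts, -np.inf)))
    return xs[i], ys[i], bdbr[i], ts[i]

# Synthetic, made-up measurements (NOT data from the thesis).
grid = np.linspace(0.5, 2.0, 16)
gx, gy = np.meshgrid(grid, grid, indexing="ij")
gx, gy = gx.ravel(), gy.ravel()
bdbr_obs = 1.0 + 2.0 * gx + 3.0 * gy            # hypothetical BDBR% samples
ts_obs = 50.0 + 10.0 * gx + 8.0 * gy - gx ** 2  # hypothetical TS% samples

c_bdbr = fit_surface(gx, gy, bdbr_obs)
c_ts = fit_surface(gx, gy, ts_obs)
result = best_time_saving(c_bdbr, c_ts, bdbr_limit=6.0, grid=grid)
print(result)  # (phi_sad, phi_rdo, predicted BDBR%, predicted TS%)
```

With real measurements, `bdbr_obs` and `ts_obs` would be replaced by the measured BDBR% and TS% at each tested pair of threshold factors, and the prediction would then be checked against an actual encoding run, as the abstract describes.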