Abstract: | With the growth of video entertainment, television, film, and popular streaming platforms such as YouTube and Twitch are all pursuing ever higher image quality. Live streaming has recently become especially popular, and it demands not only high image quality but also adequate real-time transmission performance. On the hardware side, TV manufacturers keep making larger screens. To compress the huge data volume of high-resolution video, HEVC (High Efficiency Video Coding) uses many techniques that effectively reduce the bit rate. In this paper, we apply an SVM (Support Vector Machine) model at the encoder in HEVC inter prediction to classify coding unit depth. The features used to train the SVM are the motion vector variance of inter-predicted coding units, the CBF information of merge mode, and the depth information of neighboring blocks. The model classifies each CTU into one of four classes, Subgroup0 through Subgroup3: Subgroup0 contains depth 0; Subgroup1 contains depths 0 and 1; Subgroup2 contains depths 0, 1, and 2; and Subgroup3 contains depths 0, 1, 2, and 3. The RDO process then selects the best CTU depth. This algorithm saves 23.5% of the encoding time at the encoder but increases BDBR by 0.07%. We therefore use post-processing at the decoder to recover the coding performance lost in exchange for the encoding-time savings, applying a convolutional neural network (CNN) to HEVC post-processing to improve image quality. Our experiments draw on the concept of side information from information theory: more side information removes more uncertainty. Accordingly, in addition to the compressed, distorted image, the CNN takes the features used by the encoder-side SVM model as a second input, which helps the CNN train more accurately, and the error caused by quantization during encoding as a third input. With the overall architecture combined, and compared with the reference software HM16.0 in HEVC inter prediction, our algorithm achieves up to a 6.59% BDBR reduction and a 0.237 dB BDPSNR increase. |
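The subgroup scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SUBGROUP_DEPTHS`, `mv_variance`, and `best_depth` are hypothetical, the variance definition (sum of the per-component population variances) is an assumption, and the CTU is treated as having a single depth, whereas a real HEVC encoder decides depth per coding unit in a quadtree. The rate-distortion cost is passed in as a function because the abstract does not specify how it is computed.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Subgroup k allows CU depths 0..k, as stated in the abstract.
SUBGROUP_DEPTHS: Dict[int, List[int]] = {k: list(range(k + 1)) for k in range(4)}


def mv_variance(mvs: Sequence[Tuple[float, float]]) -> float:
    """Motion vector variance feature (assumed definition):
    sum of the population variances of the x and y components."""
    if not mvs:
        return 0.0
    n = len(mvs)
    mean_x = sum(x for x, _ in mvs) / n
    mean_y = sum(y for _, y in mvs) / n
    return sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in mvs) / n


def best_depth(subgroup: int, rd_cost: Callable[[int], float]) -> int:
    """Pick the lowest-cost depth, searching only the depths that the
    predicted subgroup allows (stand-in for the RDO step)."""
    return min(SUBGROUP_DEPTHS[subgroup], key=rd_cost)


if __name__ == "__main__":
    # Hypothetical rate-distortion costs per depth, for illustration only.
    costs = {0: 10.0, 1: 7.0, 2: 5.0, 3: 4.0}
    for subgroup in range(4):
        print(subgroup, SUBGROUP_DEPTHS[subgroup], best_depth(subgroup, costs.__getitem__))
```

The time saving in such a scheme comes from the restricted search: when the classifier predicts a low subgroup, the deeper partitions are never evaluated, at the risk of missing a better depth outside the subgroup.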