Abstract (English)
In July 2020, the Joint Video Experts Team (JVET) approved the first version of the H.266/Versatile Video Coding (VVC) standard. Compared with its predecessor, H.265/HEVC, VVC roughly doubles compression efficiency, saving about 50% of the bit rate at the same video quality. This gain comes at the cost of a sharp increase in encoding complexity: VVC encoding takes six to ten times longer than H.265/HEVC encoding. Reducing encoding time has therefore become a key requirement for the widespread adoption of the standard.
The VVC specification introduces several new coding tools. One of them is the QTMT (Quadtree with Nested Multi-Type Tree) block partitioning structure, and the search over QTMT partitions accounts for over 97% of encoding time. Whereas HEVC partitions Coding Units (CUs) using only a quadtree (QT), VVC adds Multi-Type Tree (MTT) partitioning, comprising horizontal and vertical Binary Tree (BT) and Ternary Tree (TT) splits. Each CU therefore has six candidate partitioning modes (no split, QT, horizontal BT, vertical BT, horizontal TT, and vertical TT), and the encoder must compute the Rate-Distortion cost (RD cost) of each one recursively, which greatly increases encoding time. We therefore propose three partitioning algorithms based on supervised learning models to speed up these decisions. In addition, we design a novel Motion Vector Field (MVF) to enhance motion estimation in inter prediction, and we use the motion estimation results as input features to the models. Finally, the three models are combined to maximize encoding efficiency.
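To make the six candidate modes concrete, the sketch below lists the sub-block sizes each mode produces and picks the mode with the lowest Lagrangian RD cost J = D + λ·R. It is a minimal illustration under stated assumptions, not VTM code: the names `Split`, `child_sizes`, `rd_cost`, and `best_mode` are hypothetical, the (distortion, rate) values in the usage are toy numbers, and VVC's real constraints (minimum block sizes, no QT split after an MTT split, maximum MTT depth) are omitted.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Split(Enum):
    """The six candidate partitioning modes of a CU in the QTMT structure."""
    NO_SPLIT = 0
    QT = 1     # quadtree: four equal quadrants
    BT_H = 2   # horizontal binary split: top and bottom halves
    BT_V = 3   # vertical binary split: left and right halves
    TT_H = 4   # horizontal ternary split: 1/4, 1/2, 1/4 of the height
    TT_V = 5   # vertical ternary split: 1/4, 1/2, 1/4 of the width


def child_sizes(w: int, h: int, mode: Split) -> List[Tuple[int, int]]:
    """Return the (width, height) of each sub-block produced by `mode`."""
    if mode is Split.NO_SPLIT:
        return [(w, h)]
    if mode is Split.QT:
        return [(w // 2, h // 2)] * 4
    if mode is Split.BT_H:
        return [(w, h // 2)] * 2
    if mode is Split.BT_V:
        return [(w // 2, h)] * 2
    if mode is Split.TT_H:
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    return [(w // 4, h), (w // 2, h), (w // 4, h)]  # Split.TT_V


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def best_mode(candidates: Dict[Split, Tuple[float, float]], lam: float) -> Split:
    """Exhaustive decision: evaluate J for every candidate mode, keep the cheapest."""
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))


if __name__ == "__main__":
    # Toy (distortion, rate) pairs for one 32x32 CU; not measured data.
    toy = {
        Split.NO_SPLIT: (5200.0, 40.0),
        Split.QT: (3100.0, 150.0),
        Split.BT_H: (3900.0, 90.0),
        Split.BT_V: (4100.0, 85.0),
        Split.TT_H: (3600.0, 120.0),
        Split.TT_V: (3800.0, 118.0),
    }
    print(child_sizes(32, 32, Split.TT_H))  # [(32, 8), (32, 16), (32, 8)]
    print(best_mode(toy, lam=20.0))         # Split.BT_H (J = 5700)
```

In the real encoder the (D, R) of a split mode is itself the sum of the best costs of its sub-blocks, each found by repeating this search recursively, which is why the exhaustive QTMT search is so expensive and why predicting the split mode in advance saves time.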
This thesis makes four main contributions. First, we design a novel MVF. Experiments show that it effectively reduces the BDBR (Bjontegaard Delta Bit Rate) and could potentially replace the Affine Merge mode of the VVC specification. Second, we develop a Directed Acyclic Graph Support Vector Machine (DAG-SVM) algorithm for VVC partition prediction, which classifies each CU into one of six classes and reduces computation time with minimal impact on coding performance. Third, we exploit the ability of Random Forest Regression (RFR) to handle high-dimensional data by using it as the final stage of the partition prediction pipeline, where it efficiently refines the complex output of the Convolutional Neural Network (CNN) to further improve performance. Finally, we design a threshold selection scheme for each model, which makes the trade-off between encoding complexity and coding efficiency adjustable.
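The DAG-SVM decision rule can be sketched as follows. For K classes it trains K(K-1)/2 pairwise binary classifiers (15 for six classes), but at prediction time it walks a decision DAG in which each node compares the first and last remaining candidates and eliminates one of them, so only K-1 = 5 classifiers are evaluated per sample. This is a generic sketch of the DAG decision procedure, not the thesis's implementation: `dag_predict` and `make_toy_pairwise` are hypothetical names, the toy decision functions stand in for trained binary SVMs (which would use the chosen kernel and real CU features), and treating the six classes as integer labels 0 to 5 is an assumption for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A pairwise decision function f_ij(x): a positive value means "class i rather than j".
Decision = Callable[[Sequence[float]], float]


def dag_predict(x: Sequence[float],
                classes: List[int],
                pairwise: Dict[Tuple[int, int], Decision]) -> Tuple[int, int]:
    """Classify x with a decision DAG; return (predicted class, classifiers evaluated).

    `classes` must be sorted so that every tested pair (i, j) has i < j,
    matching the keys of `pairwise`.
    """
    remaining = list(classes)
    evaluations = 0
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]  # test first vs. last remaining candidate
        evaluations += 1
        if pairwise[(i, j)](x) > 0:
            remaining.pop()      # classifier votes for i, so class j is eliminated
        else:
            remaining.pop(0)     # classifier votes for j, so class i is eliminated
    return remaining[0], evaluations


def make_toy_pairwise(k: int) -> Dict[Tuple[int, int], Decision]:
    """Hypothetical stand-ins for trained binary SVMs on a 1-D feature:
    f_ij(x) > 0 when x[0] lies closer to i than to j."""
    return {(i, j): (lambda x, i=i, j=j: (i + j) / 2 - x[0])
            for i in range(k) for j in range(i + 1, k)}


if __name__ == "__main__":
    classifiers = make_toy_pairwise(6)
    print(len(classifiers))                                 # 15
    print(dag_predict([3.2], list(range(6)), classifiers))  # (3, 5)
```

In this sketch, one-versus-one voting would evaluate all 15 classifiers for every sample, while the DAG evaluates only five and still reaches a single class.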
Experiments on the complete prediction pipeline, using the VVC test model VTM-10.0 under the RAGOP32 (random access, GOP size 32) configuration, show that with thresholds (Thm = 0.125, Thd = 8) encoding time is reduced by nearly 50% relative to the original VTM at a BDBR increase of only 1.31%, outperforming other state-of-the-art methods. With thresholds (Thm = 0.2, Thd = 16), encoding time is reduced by almost 70% at a BDBR increase of just 2.74%, substantially improving the prospects for real-time VVC applications.
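The two metrics reported above can be computed from per-QP encoder results. The sketch below follows the original Bjontegaard polynomial formulation: fit a cubic polynomial of log bit rate as a function of PSNR for each RD curve, integrate both fits over the overlapping PSNR range, and convert the average log-rate difference into a percentage, so a positive BDBR means the tested encoder needs more bits for the same quality. The time-saving formula ΔT = (T_anchor - T_test) / T_anchor × 100% is the definition commonly used in fast-encoding work and is assumed here. The function names are hypothetical, the RD points in the usage are illustrative rather than results from this thesis, and some later tools use a piecewise-cubic interpolation variant instead, so their numbers can differ slightly.

```python
import math
from typing import List, Sequence


def _fit_cubic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares cubic fit y ~ c0 + c1*x + c2*x^2 + c3*x^3 via normal equations."""
    n = 4
    a = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, n))) / a[r][r]
    return coef  # coef[k] multiplies x**k


def _integrate(coef: Sequence[float], lo: float, hi: float) -> float:
    """Definite integral of sum(coef[k] * x**k) over [lo, hi]."""
    return sum(c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1) for k, c in enumerate(coef))


def bd_rate(rate_anchor: Sequence[float], psnr_anchor: Sequence[float],
            rate_test: Sequence[float], psnr_test: Sequence[float]) -> float:
    """Bjontegaard delta bit rate (%) of the test encoder relative to the anchor."""
    all_psnr = list(psnr_anchor) + list(psnr_test)
    centre = sum(all_psnr) / len(all_psnr)  # centring keeps the fit well conditioned
    pa = [p - centre for p in psnr_anchor]
    pt = [p - centre for p in psnr_test]
    ca = _fit_cubic(pa, [math.log(r) for r in rate_anchor])
    ct = _fit_cubic(pt, [math.log(r) for r in rate_test])
    lo, hi = max(min(pa), min(pt)), min(max(pa), max(pt))  # overlapping PSNR range
    avg_log_diff = (_integrate(ct, lo, hi) - _integrate(ca, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


def time_saving(t_anchor: float, t_test: float) -> float:
    """Encoding-time reduction (%) of the test encoder relative to the anchor."""
    return (t_anchor - t_test) / t_anchor * 100.0


if __name__ == "__main__":
    # Illustrative RD points at four QPs; not results from this thesis.
    rates = [1000.0, 1800.0, 3200.0, 6000.0]  # kbps
    psnrs = [32.0, 34.5, 36.8, 39.0]          # dB
    print(round(bd_rate(rates, psnrs, [r * 1.05 for r in rates], psnrs), 2))  # 5.0
    print(time_saving(100.0, 50.0))                                          # 50.0
```

Working in the log domain makes the metric a ratio: a test encoder that uses exactly 5% more bits at every quality level yields a BDBR of +5%, regardless of the absolute bit rates.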
Table of Contents
Chinese Abstract
English Abstract
Acknowledgments
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Thesis Organization
1.3 Overview of Versatile Video Coding (VVC)
1.4 The VVC Video Coding Architecture
1.4.1 Coding Unit (CU)
1.4.2 Rate-Distortion Cost Function
1.4.3 Quantization Parameter (QP)
1.5 Motion Estimation (ME) in VVC
1.5.1 Basic Principles of Motion Estimation (ME)
1.5.2 Motion Vector Prediction (MVP)
1.6 Inter Prediction in VVC
1.6.1 Bi-Prediction with CU-Level Weight (BCW)
1.6.2 Triangle Partition Mode (TPM)
1.6.3 Combined Intra and Inter Prediction (CIIP)
1.7 Introduction to Support Vector Machines, Convolutional Neural Networks, and Random Forest Regression
1.7.1 Support Vector Machine (SVM)
1.7.2 Convolutional Neural Network (CNN)
1.7.3 Random Forest Regression (RFR)
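Chapter 2 Literature Review
2.1 Review of the Bristol Vision Institute Deep Video Compression (BVI-DVC) Training Dataset
2.1.1 Generation of the BVI-DVC Training Dataset
2.1.2 Encoder Configuration, Common Training Datasets, and CNN Models Used for Testing
2.1.3 Experimental Results and Analysis of the BVI-DVC Training Dataset
2.2 Review of the Directed Acyclic Graph Support Vector Machine (DAG-SVM) for Multi-Class Classification
2.2.1 The DAG-SVM Decision Algorithm
2.2.2 Experimental Results and Analysis of DAG-SVM
2.3 Review of the Multi-Scale Motion Vector Field CNN (MS-MVF-CNN) Fast Partition Path Prediction Model
2.3.1 A New Representation of QTMT Partition Paths
2.3.2 CNN-Based Partition Path Prediction Model and Fast Algorithm
2.3.3 Experimental Results and Analysis of MS-MVF-CNN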
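Chapter 3 A Partition Path Prediction Model Combining DAG-SVM/MS-MVF-CNN/RFR with a Novel Motion Vector Field for Fast VVC Inter Coding
3.1 Design of a Novel Motion Vector Field (MVF)
3.1.1 Review and Improvement of the Adaptive Motion Vector Resolution (AMVR) Design
3.1.2 Design and Parameter Computation of the Novel MVF
3.1.3 Experimental Results and Analysis of the Novel MVF
3.1.4 Performance Analysis and Comparison of the Novel MVF
3.2 A Fast VVC Partition Decision Algorithm Based on the Directed Acyclic Graph Support Vector Machine (DAG-SVM)
3.2.1 DAG-SVM Architecture Design and Kernel Functions
3.2.2 Feature and Accuracy Analysis of DAG-SVM
3.2.3 Experimental Results and Analysis of DAG-SVM
3.3 A Fast VVC Partition Decision Algorithm Combining DAG-SVM and MS-MVF-CNN
3.3.1 Discussion of Methods for Combining DAG-SVM and MS-MVF-CNN
3.3.2 Confusion Matrix and Accuracy Analysis of DAG-SVM/MS-MVF-CNN
3.3.3 Experimental Results and Analysis of DAG-SVM/MS-MVF-CNN
3.4 A Fast VVC Partition Decision Algorithm Based on Random Forest Regression (RFR)
3.4.1 Introduction to the RFR Algorithm
3.4.2 Parameter Settings and Accuracy Analysis of RFR
3.4.3 Experimental Results and Analysis of RFR
3.5 A Fast VVC Partition Decision Algorithm Combining DAG-SVM/MS-MVF-CNN/RFR
3.5.1 Effect of Threshold Settings on Model Performance
3.5.2 Performance Analysis and Comparison of DAG-SVM/MS-MVF-CNN/RFR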
Chapter 4 Conclusions and Future Work
References