Multi-Type Supervised Learning Model-Based Prediction of Partition Path for VVC Fast Inter Coding with a Novel Motion Vector Field

Name

張倍銘 (Pei-Ming Chang)

Department

Department of Communication Engineering

Thesis Title

Multi-Type Supervised Learning Model-Based Prediction of Partition Path for VVC Fast Inter Coding with a Novel Motion Vector Field

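Related Theses

★ Design and Implementation of a 10Gb/s MM XFP Optical Transceiver Module	★ A Study of Automated Testing for Information Products
★ A Study of High-Current-Density Fin-Type GaN High-Electron-Mobility Transistors	★ A Study of E-mail and Compressed File Decoding
★ Application of Turbo Codes in Optical Recording Systems	★ A Study of Hardware Architectures for the Discrete Cosine Transform
★ A Study of Error Concealment for Video	★ A Study of Real-Time Lossless Compression Coding
★ A Study of Neural Networks for Handwritten Digit Recognition	★ A Study of A Posteriori Probability Algorithms in Data Storage Systems
★ A Study of Infrared Transmission Protocols and Channels	★ A Study of Low-Density Parity-Check Codes in Digital Data Storage Systems
★ A Novel JPEG2000 Tamper Detection and Recovery Technique	★ A Study of Real-Time Lossless Compression
★ A Study of Hybrid Fast Mode Decision Algorithms	★ An Investigation of Timing Recovery in MEPR2 Channel Systems for Optical Recording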

Full Text

The full text is available through the system after 2027-12-5.

Abstract (Chinese)

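In July 2020, the Joint Video Experts Team (JVET) finalized the first version of the H.266/VVC video compression standard. Compared with its predecessor, H.265/HEVC, VVC roughly doubles compression efficiency: at the same video quality, it saves close to 50% of the bit rate. The price is a sharp rise in complexity, with encoding times six to ten times those of H.265/HEVC. Reducing encoding time has therefore become the first priority for the widespread adoption of the standard.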
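The VVC specification introduces many new tools. One of them, the QTMT (Quadtree with Nested Multi-Type Tree) block partitioning structure, accounts for more than 97% of encoding time. Whereas HEVC partitions a CU (Coding Unit) using only the QT structure, VVC adds MTT partitioning, which includes horizontal and vertical BT (Binary Tree) and TT (Ternary Tree) splits. Under this structure each CU has six possible partitioning modes (no split, QT, horizontal or vertical BT, and horizontal or vertical TT), and computing the Rate-Distortion cost (RD Cost) for all of them greatly increases encoding time. This thesis therefore proposes three partitioning algorithms based on supervised learning models for fast decisions, and designs a novel Motion Vector Field (MVF) to strengthen motion estimation in inter coding. The motion estimation results are fed as features into each model, and the three models are finally cascaded to obtain the best coding performance.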
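To make the six-way branching concrete, the sketch below lists the child blocks produced by each QTMT split mode and makes a one-level rate-distortion decision that keeps the mode with the smallest cost J = D + λ·R. It is a simplified illustration, not VTM code: the `rd_cost` callable is a hypothetical stand-in for a real encoder's distortion and rate measurement, the search is not recursive, and VVC's size and depth constraints on each split are ignored.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height) in luma samples

# The six partitioning options of a CU under QTMT
SPLIT_MODES = ["NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V"]


def split_children(block: Block, mode: str) -> List[Block]:
    """Return the sub-blocks produced by applying one QTMT split mode to a block."""
    x, y, w, h = block
    if mode == "NO_SPLIT":
        return [block]
    if mode == "QT":  # quadtree: four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_H":  # horizontal binary split: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_V":  # vertical binary split: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_H":  # horizontal ternary split: 1/4, 1/2, 1/4 of the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "TT_V":  # vertical ternary split: 1/4, 1/2, 1/4 of the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")


def choose_mode(block: Block,
                rd_cost: Callable[[Block, str], float],
                candidates: Sequence[str] = SPLIT_MODES) -> Tuple[str, Dict[str, float]]:
    """One-level RD decision: evaluate J = D + lambda * R per candidate, keep the minimum.

    `rd_cost` is a hypothetical callback standing in for a real encoder; a real
    encoder would also recurse into every child block before comparing costs.
    """
    costs = {mode: rd_cost(block, mode) for mode in candidates}
    return min(costs, key=costs.get), costs
```

Passing a shorter `candidates` list, for example one produced by a classifier, is where a fast partition decision saves time: every mode removed is one full RD evaluation (and, in a real encoder, one recursive subtree search) that is skipped.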
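This work makes four main contributions. First, we redefine the motion vector field with a novel design; experimental results show that it effectively reduces the BDBR (Bjontegaard Delta Bit Rate) and can even replace the Affine Merge Mode of the VVC specification. Second, we develop an algorithm based on the Directed Acyclic Graph-Support Vector Machine (DAG-SVM) for VVC partition prediction, which cuts computation time by predicting the class of each CU, with almost no impact on coding performance. Third, we exploit the ability of Random Forest Regression (RFR) to handle high-dimensional data by placing it at the end of the overall partition prediction framework, where it further processes the complex output of the Convolutional Neural Network (CNN) to improve performance. Finally, we design threshold selection schemes for the models so that the trade-off between coding complexity and efficiency is adjustable. Compared with the original VVC, experiments on the complete framework using the VVC test software VTM-10.0 under the random access configuration with a GOP size of 32 (RAGOP32) show that with the threshold setting (Thm = 0.125, Thd = 8), BDBR increases by only 1.31% while encoding time is reduced by nearly 50%, clearly outperforming other state-of-the-art solutions. With the threshold setting (Thm = 0.2, Thd = 16), BDBR increases by only 2.74% and encoding time is reduced by 70%, greatly improving the prospects for real-time VVC applications.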
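The DAG-SVM contribution rests on a standard idea: for K classes, a decision DAG over one-vs-one classifiers reaches a label after only K − 1 pairwise tests, by keeping a list of candidate classes, comparing the first and last, and dropping the loser. The sketch below assumes six CU classes labelled 0 to 5 purely for illustration (the class definitions used in the thesis are not given in this record) and replaces the trained SVMs with hypothetical nearest-centroid pairwise rules so that it runs on its own.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]
# Pairwise decision for classes (i, j) with i < j: > 0 votes for i, <= 0 votes for j
PairwiseFn = Callable[[Point], float]


def dag_predict(x: Point, classes: List[int],
                pairwise: Dict[Tuple[int, int], PairwiseFn]) -> int:
    """Evaluate a decision DAG: each pairwise test eliminates one candidate class."""
    remaining = sorted(classes)
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]  # compare the first and last candidates
        if pairwise[(i, j)](x) > 0:
            remaining.pop()   # i wins, so j is eliminated
        else:
            remaining.pop(0)  # j wins, so i is eliminated
    return remaining[0]


def nearest_centroid_pairwise(centroids: Dict[int, Point]) -> Dict[Tuple[int, int], PairwiseFn]:
    """Hypothetical stand-in for trained one-vs-one SVMs: prefer the closer centroid."""
    def sq_dist(a: Point, b: Point) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    def make(i: int, j: int) -> PairwiseFn:
        # Positive when x is closer to centroid i than to centroid j
        return lambda x: sq_dist(x, centroids[j]) - sq_dist(x, centroids[i])

    return {(i, j): make(i, j) for i, j in combinations(sorted(centroids), 2)}
```

With six classes the DAG runs 5 pairwise evaluations per CU, whereas max-wins voting over the same one-vs-one classifiers would run all 15.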

Abstract (English)

In July 2020, the Joint Video Experts Team (JVET) finalized the first version of the H.266/VVC video compression standard. Compared with its predecessor, H.265/HEVC, VVC achieves approximately double the compression efficiency, saving about 50% of the bit rate while maintaining the same video quality. However, this comes at the cost of a sharp increase in encoding complexity, with encoding times six to ten times those of H.265/HEVC. Reducing encoding time has therefore become a primary target for the widespread adoption of the standard.
The VVC specification introduces several new tools, one of which is the QTMT (Quadtree with Nested Multi-Type Tree) block partitioning structure, which accounts for over 97% of encoding time. HEVC partitions CUs (Coding Units) with the QT structure alone, whereas VVC adds MTT partitioning, including horizontal and vertical BT (Binary Tree) and TT (Ternary Tree) splits. Together with no split and the QT split, this gives each CU six possible partitioning modes, and evaluating the Rate-Distortion cost (RD Cost) of each one significantly increases encoding time. Hence, we propose three partitioning algorithms based on supervised learning models to enable faster decisions. Additionally, a novel Motion Vector Field (MVF) is designed to enhance motion estimation in inter prediction, and the motion estimation results are used as feature inputs to the models. Finally, the three models are cascaded to achieve the best coding performance.
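Since motion estimation results serve as model inputs, it may help to see what a basic motion vector field looks like. The sketch below is a conventional integer-pel full-search block matcher that assigns one motion vector per block by minimizing the sum of absolute differences (SAD). It is not the novel MVF proposed in this thesis, whose design is not described in this record; the block size, search range, and function name are illustrative assumptions.

```python
import numpy as np


def block_motion_field(cur: np.ndarray, ref: np.ndarray,
                       block: int = 8, search: int = 4) -> np.ndarray:
    """Integer-pel full-search block matching on a luma frame.

    Returns an array of shape (rows, cols, 2) holding one (dy, dx) motion vector
    per block, chosen to minimize the SAD against the reference frame.
    """
    h, w = cur.shape
    rows, cols = h // block, w // block
    mvf = np.zeros((rows, cols, 2), dtype=np.int64)
    for by in range(rows):
        for bx in range(cols):
            y0, x0 = by * block, bx * block
            target = cur[y0:y0 + block, x0:x0 + block].astype(np.int64)
            best_sad, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block lies outside the reference frame
                    cand = ref[y:y + block, x:x + block].astype(np.int64)
                    sad = int(np.abs(target - cand).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            mvf[by, bx] = best_mv
    return mvf
```

Per-block statistics of such a field, for example the spread of the vectors that fall inside a CU, are the kind of motion features a partition classifier can consume; the thesis's own feature set is not listed in this record.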
This thesis makes four main contributions. First, we redefine the MVF with a novel design. Experiments show that this new MVF effectively reduces the BDBR (Bjontegaard Delta Bit Rate) and could even replace the Affine Merge Mode in the VVC specification. Second, we develop a Directed Acyclic Graph-Support Vector Machine (DAG-SVM) algorithm for VVC partition prediction, which reduces computation time by classifying CUs into six classes, with minimal impact on coding performance. Third, we exploit the ability of Random Forest Regression (RFR) to handle high-dimensional data by using it as the final stage of the partition prediction framework, where it refines the complex output of the Convolutional Neural Network (CNN) to further improve performance. Finally, we design threshold selection schemes for each model, which make the trade-off between encoding complexity and coding efficiency adjustable.
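BDBR, as used in this abstract, is usually computed with Bjontegaard's method (VCEG-M33): each rate-distortion curve is fitted with a cubic polynomial of log bit rate against PSNR, both fits are integrated over the PSNR range the curves share, and the average log-rate gap is converted back to a percentage. The sketch below follows that cubic-fit formulation with NumPy; tools that interpolate the curves differently may report slightly different numbers.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Bjontegaard delta bit rate in percent (cubic-fit formulation of VCEG-M33).

    Each argument is a sequence of (usually four) values, one per QP point.
    Negative results mean the test encoder saves bits at equal PSNR.
    """
    log_r_a = np.log(np.asarray(rate_anchor, dtype=float))
    log_r_t = np.log(np.asarray(rate_test, dtype=float))
    q_a = np.asarray(psnr_anchor, dtype=float)
    q_t = np.asarray(psnr_test, dtype=float)

    # Model log bit rate as a cubic polynomial of PSNR for each curve
    fit_a = np.polyfit(q_a, log_r_a, 3)
    fit_t = np.polyfit(q_t, log_r_t, 3)

    # Integrate both fits over the PSNR interval covered by both curves
    lo = max(q_a.min(), q_t.min())
    hi = min(q_a.max(), q_t.max())
    int_a, int_t = np.polyint(fit_a), np.polyint(fit_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Convert the average log-rate difference back to a percentage
    return float((np.exp(avg_t - avg_a) - 1.0) * 100.0)
```

Under the JVET common test conditions the four points typically come from encoding each sequence at QP 22, 27, 32, and 37. A test encoder that needs 1.31% more bits at every PSNR level yields a BD-rate of +1.31%.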
Experiments on the complete prediction framework, compared with the original VVC encoder, show that under the random access configuration with a GOP size of 32 (RAGOP32) of the VVC test software VTM-10.0 and with thresholds (Thm = 0.125, Thd = 8), BDBR increases by only 1.31% while encoding time is reduced by nearly 50%, outperforming other state-of-the-art solutions. With thresholds (Thm = 0.2, Thd = 16), BDBR increases by just 2.74% and encoding time is reduced by almost 70%, greatly improving the prospects for real-time VVC applications.
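The complexity side of such results is commonly reported as the relative encoding-time reduction ΔT = (T_anchor − T_test) / T_anchor × 100%. The sketch below pairs that formula with a hypothetical threshold rule that skips RD checks for split modes whose predicted probability falls below a threshold, to illustrate how raising a threshold trades coding efficiency for speed. The abstract does not define Thm and Thd, so this rule is an assumption and is not meant to reproduce them.

```python
from typing import Dict, List


def time_saving(t_anchor: float, t_test: float) -> float:
    """Encoding-time reduction in percent relative to the anchor (e.g. unmodified VTM)."""
    if t_anchor <= 0:
        raise ValueError("anchor encoding time must be positive")
    return (t_anchor - t_test) / t_anchor * 100.0


def prune_modes(probs: Dict[str, float], threshold: float) -> List[str]:
    """Hypothetical pruning rule: RD-check only the modes whose predicted
    probability reaches the threshold, always keeping the most likely mode."""
    kept = [mode for mode, p in probs.items() if p >= threshold]
    # If nothing passes, fall back to the single most probable mode
    return kept or [max(probs, key=probs.get)]
```

A higher threshold leaves fewer modes to evaluate, which raises ΔT but also raises the chance that the RD-optimal mode is skipped, which shows up as a larger BDBR increase.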

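Keywords (Chinese)

★ Versatile Video Coding
★ Motion Vector Field
★ Machine Learning
★ Deep Learning
★ Inter Prediction Acceleration
★ Multi-Class Classification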

Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Thesis Organization
1.3 Overview of Versatile Video Coding (VVC)
1.4 The VVC Video Compression Coding Architecture
1.4.1 Coding Unit (CU)
1.4.2 Rate-Distortion Cost Function
1.4.3 Quantization Parameter (QP)
1.5 Motion Estimation (ME) in VVC
1.5.1 Basic Principles of Motion Estimation (ME)
1.5.2 Motion Vector Prediction (MVP)
1.6 Inter Prediction in VVC
1.6.1 Bi-Prediction with CU-Level Weight (BCW)
1.6.2 Triangle Partition Mode (TPM)
1.6.3 Combined Intra and Inter Prediction (CIIP)
1.7 Support Vector Machines, Convolutional Neural Networks, and Random Forest Regression
1.7.1 Support Vector Machine (SVM)
1.7.2 Convolutional Neural Network (CNN)
1.7.3 Random Forest Regression (RFR)
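Chapter 2 Literature Review
2.1 Review of the Bristol Vision Institute-Deep Video Compression (BVI-DVC) Training Dataset
2.1.1 Generation of the BVI-DVC Training Dataset
2.1.2 Encoder Configuration, Common Training Datasets, and CNN Models Used for Testing
2.1.3 Experimental Results and Analysis of the BVI-DVC Training Dataset
2.2 Review of the Directed Acyclic Graph-Support Vector Machine (DAG-SVM) for Multi-Class Classification
2.2.1 The DAG-SVM Decision Algorithm
2.2.2 Experimental Results and Analysis of DAG-SVM
2.3 Review of the Multi-Scale Motion Vector Field CNN (MS-MVF-CNN) Fast Partition Path Prediction Model
2.3.1 A New Representation of QTMT Partition Paths
2.3.2 CNN-Based Partition Path Prediction Model and Fast Algorithm
2.3.3 Experimental Results and Analysis of MS-MVF-CNN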
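Chapter 3 A Partition Path Prediction Model Combining DAG-SVM/MS-MVF-CNN/RFR with a Novel Motion Vector Field for Fast VVC Inter Coding
3.1 Design of a Novel Motion Vector Field (MVF)
3.1.1 Review and Improvement of the Adaptive Motion Vector Resolution (AMVR) Design
3.1.2 Design and Parameter Calculation of the Novel MVF
3.1.3 Experimental Results and Analysis of the Novel MVF
3.1.4 Performance Analysis and Comparison of the Novel MVF
3.2 Directed Acyclic Graph-Support Vector Machine (DAG-SVM) for Fast VVC Partition Decision
3.2.1 DAG-SVM Architecture Design and Kernel Functions
3.2.2 Feature and Accuracy Analysis of DAG-SVM
3.2.3 Experimental Results and Analysis of DAG-SVM
3.3 Combining DAG-SVM/MS-MVF-CNN for Fast VVC Partition Decision
3.3.1 Discussion of How to Combine DAG-SVM and MS-MVF-CNN
3.3.2 Confusion Matrix and Accuracy Analysis of DAG-SVM/MS-MVF-CNN
3.3.3 Experimental Results and Analysis of DAG-SVM/MS-MVF-CNN
3.4 Random Forest Regression (RFR) for Fast VVC Partition Decision
3.4.1 Introduction to the RFR Algorithm
3.4.2 Parameter Settings and Accuracy Analysis of RFR
3.4.3 Experimental Results and Analysis of RFR
3.5 Combining DAG-SVM/MS-MVF-CNN/RFR for Fast VVC Partition Decision
3.5.1 Effect of Threshold Settings on Model Performance
3.5.2 Performance Analysis and Comparison of DAG-SVM/MS-MVF-CNN/RFR
Chapter 4 Conclusions and Future Work
References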

Advisor

林銀議(Yin-Yi Lin)

Approval Date

2024-12-12

推文