Abstract: | In today's era of rapid advances in networking and technology, the demand for high-resolution image quality continues to grow. However, the large volume of data produced by high-resolution images calls for more efficient compression techniques. H.266/VVC employs numerous advanced techniques, such as multi-type partitioning of square and rectangular Coding Units (CUs) together with Rate-Distortion Optimization (RDO). While these techniques improve compression efficiency, they also significantly increase the computational complexity of encoding. This thesis combines traditional feature-based methods with machine learning and deep learning, applying Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and Random Forest classifiers to CU partitioning in VVC. Our two-stage VVC uses the SVM and the CNN to partition square CUs in the first stage, and then uses Random Forest classifiers to further process rectangular CUs in the second stage. However, we found that the CNN misses some MT partition modes when predicting the partition mode, which degrades encoding performance. To address this problem, we propose a Sobel Operator-based criterion that detects the texture direction of the image and assists the partition decision. 
Experimental results show that applying the Sobel Operator in the first stage raises BDBR by only 0.42% while saving 26.48% of encoding time. The two-stage VVC raises BDBR by only 0.95% on average while saving 61.1% of encoding time. Compared with the original two-stage VVC, our improvements effectively enhance encoding performance at the cost of only a slight increase in encoding time. We then further optimize the algorithm by using the SVM decision value as the basis for an adjustable threshold. With this adjustable threshold, we effectively reduce the number of CUs passed to the CNN, so that CU partitioning terminates earlier. The adjustable threshold lets users flexibly trade off image quality against encoding time according to the needs of different applications, thereby achieving efficient compression performance. |
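The abstract describes two mechanisms only in words: a Sobel-based criterion for texture direction and an adjustable threshold on the SVM decision value. The following is a minimal sketch of how such checks might look, not the thesis's actual implementation. The function names `texture_direction` and `early_terminate`, the `ratio` parameter, the comparison rule, and the sign convention (lower decision values mean "do not split") are all assumptions made for illustration.

```python
import numpy as np

# Standard 3x3 Sobel kernels.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def _filter3x3(block, kernel):
    """Valid-mode 3x3 cross-correlation using plain NumPy slicing."""
    h, w = block.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * block[dy:dy + h - 2, dx:dx + w - 2]
    return out


def texture_direction(block, ratio=1.5):
    """Classify the dominant edge direction of a luma block.

    Hypothetical criterion: compare summed absolute Sobel responses.
    `ratio` is an assumed tuning parameter, not a value from the thesis.
    """
    block = np.asarray(block, dtype=np.float64)
    gx = np.abs(_filter3x3(block, SOBEL_X)).sum()  # responds to vertical edges
    gy = np.abs(_filter3x3(block, SOBEL_Y)).sum()  # responds to horizontal edges
    if gx > ratio * gy:
        return "vertical"
    if gy > ratio * gx:
        return "horizontal"
    return "none"


def early_terminate(svm_decision_value, threshold):
    """Decide whether to stop partitioning without consulting the CNN.

    Assumed convention: lower decision values mean "do not split",
    so raising `threshold` terminates more CUs early.
    """
    return svm_decision_value < threshold


# Example: a 16x16 block with a single vertical edge down the middle.
stripes = np.zeros((16, 16))
stripes[:, 8:] = 255.0
direction = texture_direction(stripes)
```

Under these assumptions, a "vertical" result could be used to keep vertical MT split candidates that the CNN would otherwise drop, and raising `threshold` would trade image quality for encoding time.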