Abstract: | With the rapid development of intelligent applications, the traditional approach of having customer-service-center staff monitor systems around the clock can no longer keep up with growing demand. It is gradually being replaced by a model in which machines make judgments and take actions autonomously. However, current video compression standards optimized for human visual perception, such as HEVC and VVC, may not meet the needs of machine vision. Because machine vision tasks and human viewers have different requirements, an efficient compression standard tailored to machine vision is needed, and this need has driven the development of Video Coding for Machines (VCM). VCM also changes the evaluation criterion from the traditional peak signal-to-noise ratio (PSNR) to machine vision task accuracy. Within this effort, Feature Coding for Machines (FCM) compresses feature map data instead of raw images, aiming to improve compressibility while achieving higher machine vision task accuracy. One challenge in FCM is that compression distortion can reduce the accuracy of machine vision models; addressing this issue is the primary focus of this research.
At the 142nd MPEG meeting in April 2023, a Feature Compression Test Model (FCTM) was proposed for FCM and a Call for Proposals was issued, inviting innovative solutions from the community. Against this backdrop, this study proposes a multi-channel post-processing compensator system based on a convolutional neural network (CNN) architecture, designed to recover the accuracy lost to FCM compression. The compensator is first trained as a neural network and then tested, improving machine vision task accuracy by compensating for the distorted data.
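The abstract does not give the compensator's layer configuration or say which features feed the second and third channels. The following is therefore only a minimal sketch of the general idea under stated assumptions: a hypothetical two-layer 3×3 CNN (conv, ReLU, conv) takes the decoded feature map as channel 1 plus optional same-sized auxiliary maps as extra channels, predicts a correction, and adds it back to the decoded map (a global residual connection). The names `conv3x3` and `compensate`, the layer count, and the residual design are assumptions for illustration. The weights would come from training, which is not shown. Plain Python is used only to keep the example dependency-free.

```python
from typing import List

Map = List[List[float]]      # one 2-D feature channel, indexed [row][col]
Kernel = List[List[float]]   # a 3 x 3 convolution kernel


def conv3x3(inputs: List[Map], kernels: List[Kernel], bias: float) -> Map:
    """Single-output 3x3 convolution with zero padding ('same' output size).

    inputs:  C input channels, each H x W
    kernels: C kernels, one 3 x 3 kernel per input channel
    """
    h, w = len(inputs[0]), len(inputs[0][0])
    out = [[bias] * w for _ in range(h)]
    for ch, k in zip(inputs, kernels):
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        y, x = i + di, j + dj
                        if 0 <= y < h and 0 <= x < w:  # zero padding outside the map
                            acc += k[di + 1][dj + 1] * ch[y][x]
                out[i][j] += acc
    return out


def relu(m: Map) -> Map:
    return [[max(0.0, v) for v in row] for row in m]


def compensate(decoded: Map, aux: List[Map],
               hidden_w: List[List[Kernel]], hidden_b: List[float],
               out_w: List[Kernel], out_b: float) -> Map:
    """Hypothetical multi-channel post-processing compensator.

    Stacks the decoded (distorted) feature channel with the auxiliary
    channels, predicts a correction with conv -> ReLU -> conv, and adds the
    correction back to the decoded channel.
    hidden_w: one filter per hidden channel, each holding one kernel per input channel
    out_w:    one kernel per hidden channel
    """
    x = [decoded] + aux  # channel 1 = decoded map, channels 2, 3, ... = extra features
    hidden = [relu(conv3x3(x, k, b)) for k, b in zip(hidden_w, hidden_b)]
    residual = conv3x3(hidden, out_w, out_b)
    return [[d + r for d, r in zip(drow, rrow)]
            for drow, rrow in zip(decoded, residual)]
```

Predicting a residual and adding it to the decoded map means an untrained or all-zero network leaves the FCM output unchanged. The network then only has to learn the distortion, not the whole feature map. Extra channels enter simply as additional inputs to the first convolution.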
Preliminary results show that a single-channel post-processing compensator alone significantly improves machine vision task accuracy after compression, raising BDMOTA (the Bjøntegaard delta of multiple object tracking accuracy, MOTA) by up to 2.94% compared with FCTM v1.0.0. To further improve performance, additional features were added as second and third input channels, strengthening the compensator's ability to compensate for compression distortion. Experimental results show that, compared with FCTM v1.0.0, the proposed multi-channel post-processing compensator system improves the overall average BDMOTA by up to 4.7%, substantially recovering the machine vision task accuracy lost to FCM compression. |
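BDMOTA summarizes an average gain over a range of bitrates rather than a gain at a single operating point. The following is a minimal sketch, assuming the classic Bjøntegaard procedure applied to MOTA in place of PSNR. For each codec, it fits a least-squares cubic of MOTA against log10(bitrate), integrates both fits over the overlapping rate range, and divides the difference by the width of that range. The FCM evaluation scripts may use a different interpolation (for example, piecewise cubic), and `bd_metric` is a hypothetical helper, not part of FCTM.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _cubic_fit(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares cubic p(x) = c0 + c1 x + c2 x^2 + c3 x^3 via the normal equations."""
    if len(xs) < 4:
        raise ValueError("need at least four rate points for a cubic fit")
    a = [[sum(x ** (i + j) for x in xs) for j in range(4)] for i in range(4)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(4)]
    return _solve(a, b)


def _integral(c: List[float], lo: float, hi: float) -> float:
    """Exact integral of the polynomial with coefficients c over [lo, hi]."""
    def antiderivative(x: float) -> float:
        return sum(ci * x ** (i + 1) / (i + 1) for i, ci in enumerate(c))
    return antiderivative(hi) - antiderivative(lo)


def bd_metric(rate_ref, mota_ref, rate_test, mota_test) -> float:
    """Average vertical gap (test minus anchor) between the fitted
    MOTA-vs-log10(bitrate) curves over their overlapping rate range."""
    lr_ref = [math.log10(r) for r in rate_ref]
    lr_test = [math.log10(r) for r in rate_test]
    lo = max(min(lr_ref), min(lr_test))
    hi = min(max(lr_ref), max(lr_test))
    if hi <= lo:
        raise ValueError("rate ranges do not overlap")
    x0 = 0.5 * (lo + hi)  # shared origin keeps the cubic fit well conditioned
    c_ref = _cubic_fit([x - x0 for x in lr_ref], mota_ref)
    c_test = _cubic_fit([x - x0 for x in lr_test], mota_test)
    gap = _integral(c_test, lo - x0, hi - x0) - _integral(c_ref, lo - x0, hi - x0)
    return gap / (hi - lo)
```

A positive result means the tested system reaches a higher MOTA than the anchor (here, FCTM v1.0.0) at the same bitrate, averaged over the shared rate range. Centering log-rates on a common origin before fitting keeps the normal equations well conditioned without changing the integral.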