CNN-Based Post-Processing for Performance Improvement in Feature Coding for Machines


Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/96267

Title:	CNN-Based Post-Processing for Performance Improvement in Feature Coding for Machines
Author:	Chen, Po-Yu (陳柏聿)
Contributor:	Department of Communication Engineering
Keywords:	Video Coding for Machines; Feature Coding for Machines; Multi-Channel Post-Processing Compensation System; Deep Learning; Residual Dense Neural Network
Date:	2024-12-12
Upload time:	2025-04-09 17:30:56 (UTC+8)
Publisher:	National Central University
Abstract:	As intelligent applications develop rapidly, the traditional approach of relying on customer service center staff for full-time monitoring can no longer keep up with growing demand, and it is gradually being replaced by a model in which machines make judgments and act autonomously. However, current compression standards optimized for human visual perception, such as HEVC and VVC, may not meet the needs of machine vision. Because the requirements of machine vision tasks differ from those of human viewers, an efficient compression standard tailored to machine vision is needed. This need has driven the development of Video Coding for Machines (VCM). VCM also shifts the evaluation metric from the traditional peak signal-to-noise ratio (PSNR) to machine vision task accuracy. Within this effort, Feature Coding for Machines (FCM) compresses feature map data instead of the original images, improving compressibility while aiming for higher accuracy in machine vision tasks.
One challenge in FCM is that compression distortion can reduce the accuracy of machine vision models, and addressing this problem is the focus of this thesis. At the 142nd MPEG meeting in April 2023, a Feature Compression Test Model (FCTM) for FCM was proposed and a Call for Proposals was issued, inviting innovative solutions from the community. Against this backdrop, this study proposes a multi-channel post-processing compensator system based on a convolutional neural network (CNN) architecture, designed to recover the accuracy lost to the FCM compression system. Once trained, the system is evaluated on how well it improves machine vision task accuracy by compensating for the distorted data. Preliminary results show that even a single-channel post-processing compensator significantly improves machine vision task accuracy after compression, increasing BDMOTA by up to 2.94% compared to FCTM v1.0.0. To improve performance further, additional features were supplied through a second and a third channel, strengthening the compensator's ability to correct compression distortion. Experimental results show that, compared to FCTM v1.0.0, the proposed multi-channel post-processing compensator system improves the overall average BDMOTA by up to 4.7%, substantially recovering the machine vision task accuracy lost to FCM compression.
Appears in Collections:	[Graduate Institute of Communication Engineering] Master's and Doctoral Theses


All items in NCUIR are protected by original copyright.
