Visual tracking aims to locate and cover a target with an accurate bounding box, which in turn improves the performance of related video analysis and processing. 360-degree videos give users an immersive viewing experience. However, existing 360-degree videos are produced by stitching images from multiple cameras, and object data is lost in the regions near visible seams. This loss degrades viewing quality and also lowers visual tracking accuracy. Separately, although many high-accuracy deep-learning-based trackers have been proposed for single-view videos, strong reflections from transparent glass in mixed image sequences substantially reduce their tracking accuracy. Both problems share a root cause: characteristics of the video alter the appearance of the target. Because generative adversarial networks (GANs) provide both data generation and discrimination, and because few published works address either problem, this one-year project proposes two GAN-based designs for visual tracking: (1) GAN-based image restoration for existing 360-degree videos produced by image stitching; (2) GAN-based visual tracking on mixed image sequences with reflections.
The project is expected to increase visual tracking accuracy on 360-degree videos and mixed image sequences and to improve the viewing quality of existing 360-degree videos, thereby improving the performance of related video processing and analysis applications.