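Abstract: | With the spread of personal computers and video recording devices of all kinds, the build-out of broadband networks, and advances in video coding technology, large volumes of digital video can now be distributed and circulated widely. At the same time, many video-sharing websites have been established, offering users a variety of ways to upload and share digital videos; a considerable share of today's network bandwidth is used to deliver video data from such sites, which shows how popular they are. However, the film and television companies that own the content copyrights do not support such unrestricted sharing: they are unwilling to have their content used without compensation, and the large number of unauthorized videos placed on sharing platforms may also hurt their profits. As a result, more and more well-known video-sharing websites have been asked to remove video clips that violate copyright, or have even been sued for copyright infringement. How to protect copyright and reduce the disputes arising from copyright issues has become an important issue for these websites.
The purpose of this research is to provide a mechanism that detects video copies by matching video content. In short, after a video clip is uploaded, the feature data produced by processing the clip are compared with the original feature data stored on the video website to determine whether the upload is a copy of an original video. To achieve this effectively, we extract content-based signatures or hash functions from the video data, which improves efficiency and avoids storing large amounts of video data. We first use scene-change detection to divide the video into multiple segments and identify key frames among the scene-change frames, then obtain spatial-domain or pixel-domain hash values from these key frames. We use methods such as vector quantization and singular value decomposition to generate the pixel-domain feature data to be compared. Using the correctly matched frames as anchor points, we then use temporal-domain features to confirm the accuracy of the content match. The main challenge of this research lies in balancing the robustness of the video hash function, its ability to discriminate between videos, and matching efficiency. We believe the output of this research not only provides a means of video copy detection but will also benefit multimedia content analysis research and related applications.
Digital videos are now distributed widely across many kinds of media, thanks to cheaper yet increasingly powerful personal computers, the prevalence of high-speed networks, and advanced video coding technologies. Many video web servers now provide convenient platforms for users to upload and share digital videos. However, video content providers do not always support these servers, since many videos are uploaded or shared without their permission and infringe their intellectual property rights (IPR). Popular video servers are often asked to remove certain video clips, or are even sued for copyright violation. Copyright protection has therefore become critical for the owners of popular video web servers who want to reduce such disputes. In this research, we aim to provide a feasible content-based video copy detection scheme. The content of an uploaded video will be matched against that of the original videos stored on the video web servers to determine whether it is a duplicate copy that may infringe copyright.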
To be more specific, content matching will compare significant features that are extracted from the uploaded and original videos and that act as a signature or video hash; comparing these features instead of the videos themselves avoids the need for extremely large storage. First, shot boundary detection is applied to the video to determine the candidate key frames. Key frames with large motion or unique visual characteristics will be selected as the anchor points for content matching. Then a spatial-domain or pixel-domain hash will be extracted from each anchor frame, using vector quantization and singular value decomposition. Finally, the temporal features, i.e., the shot lengths, will be matched to further verify the correctness of the content match. The research objective is to maintain a good balance among robustness, discrimination, and efficiency. We believe that the contribution of this research will also be helpful to fields such as consumer multimedia collection, multimedia linking, and content analysis. |
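The temporal part of the pipeline described in the abstract (shot boundary detection followed by shot-length matching) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: the abstract does not say which shot boundary detector is used, so a simple histogram-difference detector is swapped in; frames are represented as flattened lists of 8-bit grayscale values; and the function names, the `threshold` of 0.5, and the `tolerance` of one frame are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical frame representation: flattened 8-bit grayscale pixel values.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalized intensity histogram of one non-empty frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = float(len(frame))
    return [c / total for c in counts]


def shot_boundaries(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return indices i where frame i starts a new shot.

    Assumed detector: a cut is declared when the L1 distance between the
    histograms of consecutive frames exceeds `threshold`.
    """
    cuts: List[int] = []
    if not frames:
        return cuts
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts


def shot_lengths(num_frames: int, cuts: Sequence[int]) -> List[int]:
    """Convert cut positions into a list of shot lengths (in frames)."""
    edges = [0, *cuts, num_frames]
    return [b - a for a, b in zip(edges, edges[1:])]


def match_shot_lengths(query: Sequence[int], reference: Sequence[int],
                       tolerance: int = 1) -> List[int]:
    """Return offsets in `reference` where every query shot length
    agrees with the aligned reference shot length within `tolerance`."""
    n = len(query)
    if n == 0:
        return []
    hits: List[int] = []
    for offset in range(len(reference) - n + 1):
        window = reference[offset:offset + n]
        if all(abs(q - r) <= tolerance for q, r in zip(query, window)):
            hits.append(offset)
    return hits


# Synthetic clip: 5 dark frames, 3 bright frames, 4 mid-gray frames.
clip = [[10] * 64] * 5 + [[200] * 64] * 3 + [[120] * 64] * 4
cuts = shot_boundaries(clip)              # [5, 8]
lengths = shot_lengths(len(clip), cuts)   # [5, 3, 4]
# Compare only the shots after the first one against a stored reference.
print(match_shot_lengths(lengths[1:], [5, 3, 4, 7]))  # [1]
```

In a real upload the first and last shots of a clip are usually truncated, so a matcher along these lines would likely compare only the interior shots, and it would run after the spatial hash has already proposed anchor frames rather than scanning every offset.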