Master's/Doctoral Thesis 110522074 — Detailed Record




Author: 黃博鴻 (Bo-Hong Huang)    Department: Computer Science and Information Engineering
Thesis Title: 基於區塊一致性評估之影像竄改與深偽視訊偵測
(Detecting Forged Images and DeepFake Videos via Block Consistency Evaluation)
Related Theses:
★ Implementation of a Qt-Based Cross-Platform Wireless Heart-Rate Analysis System
★ A Side-Channel Message Transmission Mechanism for VoIP
★ Detection of Transition Effects Associated with Sports Highlights
★ Vector-Quantization-Based Video/Image Content Authentication
★ A Baseball Highlight Extraction System Based on Transition-Effect Detection and Content Analysis
★ Image and Video Content Authentication Based on Visual Feature Extraction
★ Detecting and Tracking Foreground Objects in Moving Surveillance Footage Using Dynamic Background Compensation
★ Adaptive Digital Watermarking for H.264/AVC Video Content Authentication
★ A Baseball Highlight Extraction and Classification System
★ A Real-Time Multi-Camera Tracking System Using H.264/AVC Features
★ Preceding-Vehicle Detection on Highways Using Implicit Shape Models
★ Video Copy Detection Based on Temporal- and Spatial-Domain Feature Extraction
★ Vehicular Video Coding Combining Digital Watermarking and Region-of-Interest Rate Control
★ H.264/AVC Video Encryption and Watermarking for Digital Rights Management
★ A News Video Analysis System Based on Text and Anchorperson Detection
★ Digital-Watermark-Based H.264/AVC Video Content Authentication
Files: full text available for browsing in the system after 2025-07-25.
Abstract (Chinese): Digital image editing tools can easily alter image and even video content while maintaining very high visual quality. The emergence of DeepFakes has had an even greater impact: driven by various malicious purposes, such content manipulations threaten and challenge the authenticity of digital images and videos. Many methods for detecting content tampering have been proposed in recent years, most of them built on machine learning or deep learning techniques. However, manipulation methods are diverse and keep evolving, which makes collecting every type of manipulated data for supervised training difficult or impractical; even if such data could be gathered, the resulting dataset might be so large that training demands considerably more resources.
This study approaches the problem from a different angle and proposes a deep learning method based on block similarity, which identifies forged or affected regions in an image or video by evaluating the consistency of block content. The approach is designed to avoid collecting every category of manipulated data for training; instead, we use original, unmodified blocks to carry out detection. We train a convolutional neural network to extract features from image blocks and use a Siamese network to compare the similarity between block pairs, thereby locating regions that may have been tampered with. For image manipulation detection, we further introduce a segmentation network to refine the detected regions. For DeepFake video detection, we first locate the facial region and then judge the video's authenticity by comparing the similarity of facial regions in consecutive frames. We test and validate the proposed method on public datasets covering various types of images and videos and a range of content manipulation operations, confirming its feasibility. Comparisons with other methods show that the proposed scheme is superior in accuracy and stability.
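To make the block-consistency idea concrete, the following is a minimal PyTorch sketch, not the thesis implementation: the block size (64x64), the layer sizes, and the scoring head are illustrative assumptions; only the overall structure, a shared-weight CNN encoder whose features feed a Siamese pair comparison, follows the description above.

import torch
import torch.nn as nn

class BlockEncoder(nn.Module):
    """Small CNN mapping a 64x64 RGB block to a feature vector.
    Layer choices are illustrative, not the thesis' exact architecture."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, x):
        return self.fc(self.net(x).flatten(1))

class SiameseSimilarity(nn.Module):
    """Siamese comparison: both blocks pass through the same encoder
    (shared weights); the head scores how consistent the pair is."""
    def __init__(self):
        super().__init__()
        self.encoder = BlockEncoder()
        self.head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(),
                                  nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, a, b):
        fa, fb = self.encoder(a), self.encoder(b)
        return self.head(torch.cat([fa, fb], dim=1))  # ~1: consistent pair

# Usage: score every block of a test image against a block assumed
# authentic; low scores flag candidate tampered regions.
model = SiameseSimilarity().eval()
blocks = torch.rand(16, 3, 64, 64)                 # 16 candidate blocks
reference = blocks[:1].expand(16, -1, -1, -1)      # one reference block
with torch.no_grad():
    scores = model(blocks, reference)              # consistency per block

Because the encoder is trained only on original, unmodified blocks, no catalogue of manipulation types is needed; anything whose statistics disagree with the reference stands out.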
Abstract (English): Digital image editing tools enable effortless manipulation of image and video content while maintaining high visual quality. The emergence of DeepFakes has introduced significant challenges to the authenticity of digital media. Various methods for detecting such content manipulations have been proposed, relying primarily on machine learning or deep learning techniques. However, the constantly evolving nature of manipulation methods makes it impractical to collect every type of manipulated data for supervised training, and handling such large datasets is resource-intensive. In this study, we propose a deep learning-based method that uses block similarity to identify forged or manipulated regions within images or DeepFake videos by evaluating the consistency of block content. Our approach avoids the need to collect various types of manipulated data for training; instead, we use original, unmodified blocks for forgery detection. We train a convolutional neural network to extract features from image blocks and employ a Siamese network to compare block similarity. For image manipulation detection, we introduce a segmentation network to further refine the detected regions. For DeepFake video detection, we first locate facial regions and then determine a video's authenticity by comparing the similarity of facial regions across consecutive frames. We conduct tests on publicly available datasets encompassing images and videos with various content manipulation operations. The experimental results demonstrate superior accuracy and stability compared with existing methods.
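The video-level decision described above can be sketched in the same hedged spirit. Here detect_face and encode are hypothetical placeholders for a face detector and the trained feature extractor; the 64x64 resizing and the cosine-similarity average are assumptions for illustration, not the thesis' exact procedure.

import torch
import torch.nn.functional as F

def video_consistency_score(frames, detect_face, encode):
    """frames: list of HxWx3 uint8 tensors. Returns the mean cosine
    similarity between face features of consecutive frames; a low score
    suggests the face region is inconsistent over time (possible fake)."""
    feats = []
    for frame in frames:
        x1, y1, x2, y2 = detect_face(frame)          # hypothetical detector
        face = frame[y1:y2, x1:x2].permute(2, 0, 1)  # HxWx3 -> 3xHxW
        face = face.unsqueeze(0).float() / 255.0
        face = F.interpolate(face, size=(64, 64))    # assumed block size
        feats.append(encode(face))                   # trained extractor
    sims = [F.cosine_similarity(a, b).item()
            for a, b in zip(feats, feats[1:])]
    return sum(sims) / len(sims)                     # threshold -> real/fake

# Dummy stand-ins so the sketch runs end to end:
frames = [torch.randint(0, 256, (128, 128, 3), dtype=torch.uint8)
          for _ in range(4)]
score = video_consistency_score(
    frames,
    lambda f: (32, 32, 96, 96),     # fixed fake bounding box
    lambda x: x.mean(dim=(2, 3)))   # toy "features" for illustration

Averaging pairwise similarities over the whole clip is one simple way to turn per-frame comparisons into a single video-level verdict; a real/fake threshold would be tuned on validation data.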
Keywords (Chinese): ★ Image manipulation (影像竄改)
★ DeepFake (深度偽造)
★ Siamese network (孿生網路)
★ Deep learning (深度學習)
Keywords (English): ★ Image manipulation
★ DeepFake
★ Siamese network
★ deep learning
Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1-1 Research Motivation
1-2 Research Contributions
1-3 Thesis Organization
Chapter 2: Related Work
2-1 Digital Image Forensics
2-2 Image DeepFakes
2-2-1 Identity Swapping
2-2-2 Expression Reenactment
2-2-3 Face Synthesis
2-2-4 Facial Attribute Manipulation
2-2-5 Hybrid Applications
2-3 DeepFake Detection
2-3-1 Frame-Level Detection
2-3-2 Video-Level Detection
Chapter 3: Proposed Method
3-1 System Architecture
3-2 Feature Extractor
3-3 Similarity Evaluation Network
3-4 Image Manipulation Detection
3-4-1 Handling Comparison Variability and Threshold Setting
3-4-2 Mask Refinement
3-5 Video-Level Detection of DeepFake Videos
Chapter 4: Experimental Results
4-1 Development Environment
4-2 Training Data
4-3 Manipulated Image Detection Results
4-3-1 Detection Examples
4-3-2 Performance Evaluation
4-4 Forged Video Detection Results
4-4-1 Detection Examples
4-4-2 Reference-Frame Tests
Chapter 5: Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
Chapter 6: References
Advisor: 蘇柏齊    Approval Date: 2023-07-28
