Combining uncertainty modeling and temporal-channel network with CLIP model for weakly supervised video anomaly detection

Author

江毓晴 (Yu-Qing Jiang)

Department

Graduate Institute of Software Engineering

Thesis Title

Combining uncertainty modeling and temporal-channel network with CLIP model for weakly supervised video anomaly detection

Files

Full text available in the system after 2025-7-17.

Abstract

To ensure public safety and protect private property, surveillance cameras are widely deployed in public spaces, companies, and residences to record illegal and anomalous activities. However, abnormal events typically account for only a small fraction of the total surveillance footage. Video anomaly detection is therefore crucial: it aims to distinguish abnormal events from normal ones and to locate exactly when the anomalies occur. In recent years, Vision-Language Models (VLMs) have achieved significant success in various image-related tasks, and many studies have extended VLMs to video tasks, including weakly supervised video anomaly detection. We integrate a VLM with a multi-scale temporal Transformer, a channel attention mechanism, and an uncertainty modeling strategy to capture more discriminative features and to separate abnormal events from normal events more effectively. Experimental results show that, for most categories in the UCF-Crime and XD-Violence datasets, our method outperforms current state-of-the-art models in weakly supervised video anomaly detection.
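The weak supervision mentioned in the abstract means that only video-level labels are available during training, while the goal is snippet-level localization. The outline lists multi-instance learning as a preliminary, so the following is a minimal sketch of one common multi-instance formulation (averaging the top-k snippet scores), not the author's implementation. The function names `video_score`, `bce_loss`, and `localize`, as well as the values of `k` and `threshold`, are assumptions made for illustration.

```python
import math


def video_score(snippet_scores, k):
    """Aggregate snippet-level anomaly scores into one video-level score.

    Assumption: the video score is the mean of the k highest snippet scores,
    a common multi-instance learning choice (hypothetical for this thesis).
    """
    if not 1 <= k <= len(snippet_scores):
        raise ValueError("k must be between 1 and the number of snippets")
    top_k = sorted(snippet_scores, reverse=True)[:k]
    return sum(top_k) / k


def bce_loss(prob, label, eps=1e-7):
    """Binary cross-entropy between a video-level score and a video-level label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def localize(snippet_scores, threshold=0.5):
    """Return the indices of snippets whose score exceeds the threshold."""
    return [i for i, s in enumerate(snippet_scores) if s > threshold]


# Hypothetical snippet scores for one video labeled abnormal (label = 1).
scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1]
loss = bce_loss(video_score(scores, k=2), label=1)
print(localize(scores))  # prints [2, 3]
```

Only the video-level label enters the loss, yet the per-snippet scores can still be thresholded at test time to estimate when an anomaly occurs.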

Keywords

★ weakly supervised learning
★ video anomaly detection

Table of Contents

1 Introduction
2 Related Work
2.1 Video anomaly detection
2.1.1 Semi-supervised video anomaly detection
2.1.2 Weakly supervised video anomaly detection
2.2 Vision-Language Pre-training
3 Preliminary
3.1 CLIP
3.2 Multi-Instance Learning
3.3 VadCLIP
3.3.1 Local and Global Temporal Adapter
3.3.2 Dual Branch
3.4 Squeeze-and-Excitation Networks
3.5 Uncertainty Modeling
3.5.1 Uncertainty Modeling Loss
3.5.2 Background Entropy Loss
4 Design
4.1 Motivation
4.2 Problem Statement
4.3 Research Challenges
4.4 Proposed System Architecture
4.4.1 Preprocessing and Feature Extraction
4.4.2 Multi-Scale Local and Global Temporal Adapter
4.4.3 Loss Function
5 Performance
5.1 Datasets
5.2 Evaluation Metrics
5.3 Experimental Environment
5.4 Experimental Configurations
5.5 Experimental Results and Analysis
5.6 Ablation Studies
6 Conclusion

Advisor

孫敏德 (Min-Te Sun)

Approval Date

2024-7-23
