To ensure public safety and protect private property, surveillance cameras are widely deployed in public spaces, companies, and residences to record illegal and abnormal activities. However, abnormal events typically account for only a small fraction of the total surveillance footage. Video anomaly detection is therefore crucial: it aims to distinguish abnormal events from normal ones and to locate the exact time at which each anomaly occurs. In recent years, Vision-Language Models (VLMs) have achieved significant success in a wide range of image-related tasks, and many studies have extended VLMs to video tasks, including weakly supervised video anomaly detection. We combine a VLM with a multi-scale temporal Transformer, a channel attention mechanism, and an uncertainty modeling strategy to capture more discriminative features and to separate abnormal events from normal events more effectively. Experimental results show that, for most categories in the UCF-Crime and XD-Violence datasets, our method outperforms current state-of-the-art models for weakly supervised video anomaly detection.