Air pollution is a major issue in Taiwan. Between 2017 and 2020, the Environmental Protection Administration (EPA) of the Executive Yuan gradually deployed 10,200 air quality sensors, which have collected a large amount of air quality data at high temporal resolution. However, because these sensors are small and fragile, environmental factors such as sunlight, rain, and other physical conditions can affect or even damage them, and the PM2.5 values reported by malfunctioning sensors are inaccurate. Identifying failed sensors currently requires on-site inspection. Because the sensors are numerous and located all over Taiwan, finding the malfunctioning ones through scheduled inspections or random sampling is inefficient.
We propose an anomaly detection framework for identifying malfunctioning sensors. It is built on a deep spatial-temporal graph model that combines graph convolution and time convolution.
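The following is a minimal, dependency-free sketch of how one spatial-temporal block of this kind could combine the two operations. It is not the model used in this work.

- A time convolution runs over each sensor's PM2.5 history.
- A graph convolution then mixes values across connected sensors at every time step.

Several details are assumptions for illustration: the scalar node features, the single shared kernel and weight, the symmetric normalisation D^-1/2 (A + I) D^-1/2 (the common GCN choice), and all function names (`temporal_conv`, `graph_conv`, `st_block`). Non-linearities and learned parameters are omitted.

```python
# Hypothetical sketch of one spatial-temporal block: time convolution per
# sensor, then graph convolution across sensors at each time step.
import math


def temporal_conv(series, kernel):
    """'Valid' 1D convolution (cross-correlation, as in deep-learning
    libraries) over one sensor's PM2.5 series; output has len(series)-len(kernel)+1 values."""
    k = len(kernel)
    return [
        sum(kernel[j] * series[t + j] for j in range(k))
        for t in range(len(series) - k + 1)
    ]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected 0/1 adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degree including the self-loop
    return [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def graph_conv(features, norm_adj, weight):
    """One linear graph-convolution step on scalar node features: each sensor's
    new value is a normalised weighted sum of itself and its neighbours."""
    n = len(features)
    return [weight * sum(norm_adj[i][j] * features[j] for j in range(n)) for i in range(n)]


def st_block(histories, adj, kernel, weight):
    """Apply temporal_conv to every sensor, then graph_conv at every time step.
    Returns a sensors x T' nested list."""
    temporal = [temporal_conv(h, kernel) for h in histories]  # sensors x T'
    norm_adj = normalized_adjacency(adj)
    n_sensors, steps = len(histories), len(temporal[0])
    by_time = [
        graph_conv([temporal[s][t] for s in range(n_sensors)], norm_adj, weight)
        for t in range(steps)
    ]
    # Transpose from T' x sensors back to sensors x T'.
    return [[by_time[t][s] for t in range(steps)] for s in range(n_sensors)]


# Usage: sensor 0 is linked to sensors 1 and 2 (a small undirected graph).
histories = [[10, 12, 14], [20, 22, 24], [30, 32, 34]]
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
out = st_block(histories, adj, kernel=[0.5, 0.5], weight=1.0)
```

A sensor with no neighbours only keeps its own (time-convolved) signal, because A + I reduces to the identity for it. Real models stack several such blocks with learned, multi-channel weights.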
The time convolution captures temporal correlations in each sensor's historical PM2.5 readings. At each time step, each sensor and its surrounding sensors are connected in an undirected connected graph, and the graph convolution uses these graphs to learn spatial features. We use this deep spatial-temporal graph model to predict the current PM2.5 value of a target sensor. For each sensor, we compute the R2-score between the predicted and monitored PM2.5 values and rank all sensors in ascending order of R2-score. A low R2-score (i.e., a large discrepancy between the predicted and monitored PM2.5 values) indicates that the sensor is more likely to have malfunctioned. Experimental results show that, under a limited inspection budget, our model finds a higher proportion of problematic sensors than both previous anomaly detection methods and random selection. Consequently, inspecting sensors in the order output by our model first can reduce the labor and time costs of inspection.
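The following is a minimal sketch of the ranking step. It computes each sensor's R2-score between monitored and predicted PM2.5 values and orders sensors from lowest to highest score, so the most suspicious ones are inspected first. R2 follows the standard definition 1 - SS_res / SS_tot.

Several details are hypothetical:

- the function names and the `budget` parameter;
- the dictionary-of-lists input format;
- the rule that a flat-lined monitored series (undefined R2) is ranked first.

The abstract does not specify the length of the evaluation window.

```python
# Hypothetical sketch: rank sensors for inspection by ascending R2-score.


def r2_score(y_true, y_pred):
    """Coefficient of determination R2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        # Assumption: a constant series (e.g. a stuck sensor) has undefined R2;
        # treat it as maximally suspicious so it is ranked first.
        return float("-inf")
    return 1.0 - ss_res / ss_tot


def rank_sensors(monitored, predicted):
    """Return (sensor_id, R2) pairs sorted by ascending R2 (most suspicious first)."""
    scores = {sid: r2_score(monitored[sid], predicted[sid]) for sid in monitored}
    return sorted(scores.items(), key=lambda item: item[1])


def inspection_list(monitored, predicted, budget):
    """Sensor ids to inspect when only `budget` on-site checks are possible."""
    return [sid for sid, _ in rank_sensors(monitored, predicted)[:budget]]


# Usage with made-up readings (not data from this work):
monitored = {"A": [10, 20, 30, 40], "B": [10, 20, 30, 40], "C": [15, 15, 15, 15]}
predicted = {"A": [11, 19, 31, 39], "B": [40, 30, 20, 10], "C": [14, 16, 15, 15]}
to_check = inspection_list(monitored, predicted, budget=2)  # ["C", "B"]
```

Ranking, rather than a fixed R2 threshold, fits the stated goal. Given a limited number of on-site inspections, the ordered list tells inspectors which sensors to visit first.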