Abstract: | With the rapid development of Internet of Things (IoT) technology, sensors throughout our daily surroundings continuously accumulate huge volumes of time series data. As a result, demand for time series analysis is growing rapidly, and anomaly detection is one of the most important of these analysis tasks. This paper proposes an anomaly detection framework for univariate time series data. First, the series are classified by their characteristics into three categories using the Dickey-Fuller test, the fast Fourier transform (FFT), and the Pearson product-moment correlation coefficient: (1) stationary time series data, (2) periodic time series data, and (3) non-stationary and non-periodic time series data. Different statistical and deep learning methods are then applied to each category to perform anomaly detection. For stationary time series data, the means of a larger and a smaller sliding window are used to compute a rate of change, and an anomaly is detected in real time when this rate exceeds a threshold. For periodic time series data, the period of the data is first derived; the ratio of the standard deviations of the data within corresponding time windows of the current and the previous period is then calculated, and an anomaly is flagged when this ratio exceeds a threshold.
For non-stationary and non-periodic time series data, a gated recurrent unit (GRU) neural network is used to predict the series values, and anomalies are detected by applying the cumulative distribution function of the normal distribution to the prediction errors. Four public real-world datasets and one artificial dataset, released by the US company Numenta on its NuPIC platform, are used to evaluate the proposed framework. The results are compared with those of related methods, namely ADSaS, STL, SARIMA, LSTM, and LSTM with STL, and the proposed framework achieves the best F1 score for anomaly detection. |
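The abstract names the Dickey-Fuller test, the FFT, and the Pearson correlation as the classification tools but does not state the decision rule. The following is a minimal sketch of the periodicity part only, under assumed rules: the strongest non-DC bin of a discrete Fourier transform proposes a period, and the series is called periodic if its Pearson correlation with itself shifted by that period reaches a hypothetical `min_corr` of 0.8. The function names, the threshold, and the "period shorter than half the series" guard are illustrative assumptions. The stationarity (Dickey-Fuller) check is left out here; in Python it is commonly done with `statsmodels.tsa.stattools.adfuller`.

```python
import cmath
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation undefined for a constant input
    return cov / (sx * sy)


def dominant_period(series):
    """Estimate a period from the strongest non-DC DFT bin (naive O(n^2) DFT)."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]  # remove the DC component
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return round(n / best_k) if best_k else None


def is_periodic(series, min_corr=0.8):
    """Hypothetical rule: periodic if the series correlates with itself one period later."""
    period = dominant_period(series)
    if period is None or period >= len(series) // 2:
        return False, period  # no usable frequency, or too few full periods
    r = pearson(series[:-period], series[period:])
    return r >= min_corr, period
```

The naive DFT keeps the sketch dependency-free; for long series, `numpy.fft.rfft` would be the practical way to obtain the spectrum.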
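The two statistical detectors described in the abstract can be sketched as follows, filling in details the abstract leaves open with stated assumptions. Both windows end at the current point. The stationary "rate of change" is taken to be the relative difference between the small-window and large-window means. The periodic detector compares the population standard deviation of the latest `window` points with that of the same-phase window one period earlier. The window sizes and thresholds are hypothetical defaults, not values from the thesis.

```python
import math
import statistics


def stationary_anomalies(series, large=50, small=5, threshold=0.3):
    """Flag indices where the small-window mean departs from the large-window mean.

    Assumption: rate of change = |small_mean - large_mean| / |large_mean|,
    with both windows ending at the current index t.
    """
    flags = []
    for t in range(large - 1, len(series)):
        large_mean = statistics.mean(series[t - large + 1 : t + 1])
        small_mean = statistics.mean(series[t - small + 1 : t + 1])
        if large_mean == 0:
            continue  # relative change is undefined; this sketch skips t
        change = abs(small_mean - large_mean) / abs(large_mean)
        if change > threshold:
            flags.append(t)
    return flags


def periodic_anomalies(series, period, window=10, threshold=2.0):
    """Flag indices where the latest window is much more volatile than one period earlier.

    Assumption: ratio = pstdev(current window) / pstdev(same-phase window one period back).
    """
    flags = []
    for t in range(period + window - 1, len(series)):
        current = series[t - window + 1 : t + 1]
        previous = series[t - period - window + 1 : t - period + 1]
        cur_sd = statistics.pstdev(current)
        prev_sd = statistics.pstdev(previous)
        if prev_sd > 0:
            ratio = cur_sd / prev_sd
        else:
            ratio = math.inf if cur_sd > 0 else 1.0  # flat reference window
        if ratio > threshold:
            flags.append(t)
    return flags
```

For instance, a constant series of 10.0 with a single spike of 30.0 at index 80 is flagged at indices 80 to 84, exactly while the spike sits inside the small window. Because the large window also contains the spike, the detector fires only for as long as the short-term mean is pulled away from the long-term mean.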
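The scheme for non-stationary, non-periodic data scores each GRU prediction error with a normal cumulative distribution function. This sketch leaves the GRU itself out: a recurrent model such as one built on `torch.nn.GRU` would supply `predicted`. It assumes the error distribution is fitted on prediction errors from a reference segment believed to be free of anomalies, and it flags a point when its error falls in either tail beyond a hypothetical significance level `alpha`. The two-sided rule, the `alpha` value, and the function names are assumptions, not details from the thesis.

```python
from statistics import NormalDist, mean, stdev


def fit_error_model(reference_errors):
    """Fit a normal distribution to prediction errors from a segment assumed normal."""
    return NormalDist(mu=mean(reference_errors), sigma=stdev(reference_errors))


def cdf_anomalies(actual, predicted, error_model, alpha=0.005):
    """Flag indices whose prediction error lies in either tail of the fitted normal CDF.

    `predicted` stands in for the GRU's one-step-ahead forecasts (hypothetical here).
    """
    flags = []
    for t, (y, y_hat) in enumerate(zip(actual, predicted)):
        p = error_model.cdf(y - y_hat)  # P(error <= observed error)
        if p < alpha or p > 1 - alpha:  # two-sided tail test (an assumption)
            flags.append(t)
    return flags
```

With reference errors centered near zero and a standard deviation of about 0.12, an observed error of 2.0 has a CDF value of essentially 1 and is flagged, while errors of plus or minus 0.1 fall well inside the body of the distribution.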