Thesis title 利用異常檢測在串流資料建立預警系統
(Establishing an early warning system on streaming data by anomaly detection)
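Abstract (Chinese) Advances in technology allow us to rapidly collect streaming time-series data from sensors inside machines, and data mining helps us extract useful information from thousands upon thousands of data points and supports managers in making appropriate decisions. Anomaly detection is one of the popular techniques in data mining. Past research has focused on point anomalies and subsequence anomalies, while anomaly detection over whole time series has rarely been addressed.

Our research data come from machines in a semiconductor factory. In this study, we define the data labels in two forms: binary and probability. With binary labels, we use a classification model to find anomaly points, accumulate those anomaly points over a sliding window, and propose a new alarm rule that is used to design a monitoring scheme. With probability labels, we define the label as the probability that no anomaly occurs; with the help of a regression model, we observe the past differences between normal and abnormal behavior and propose a corresponding monitoring scheme that serves as the basis for issuing alarms.

Finally, we evaluate the performance and stability of the system using the catching rate and the false alarm rate under cross-validation.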
Abstract (English) Advances in technology allow us to quickly collect streaming time-series data from sensors inside machines. Data mining can help us find useful information in thousands upon thousands of data points and assist managers in making appropriate decisions.

In data mining, anomaly detection is one of the popular techniques. Past research has widely discussed point anomalies and subsequence anomalies, whereas anomaly detection over whole time series has rarely been addressed.

In the semiconductor industry, anomalies that occur while a silicon ingot is being cut damage the product, causing substantial financial losses and delays in delivery.

Therefore, it is necessary to detect anomalies as early as possible so that operators can stop the machine and perform maintenance.

Our research data come from machines in a semiconductor factory. In this study, we define the labels in two forms: binary and probability. With binary labels, we find anomaly points with the help of a classification model, accumulate those anomaly points over a sliding window, and propose a new alarm rule that is used to design a monitoring scheme. With probability labels, we define the label as the probability that no anomaly occurs. With the help of a regression model, we observe the past differences between normal and abnormal behavior and propose a corresponding monitoring scheme that serves as the basis for issuing alarms.
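
As an illustration of the binary-label idea, the following minimal Python sketch accumulates anomaly points over a sliding window and fires an alarm once the count reaches a threshold. The abstract does not state the actual alarm rule, so the function name `first_alarm_index`, the window length, the threshold, and the "count >= threshold" condition are all assumptions made for illustration only.

```python
from collections import deque
from typing import Iterable, Optional


def first_alarm_index(flags: Iterable[int], window: int, threshold: int) -> Optional[int]:
    """Return the index at which the alarm first fires, or None if it never does.

    flags     -- per-point classifier outputs (1 = anomaly point, 0 = normal)
    window    -- hypothetical sliding-window length (assumption, not from the thesis)
    threshold -- hypothetical anomaly count in the window that triggers the alarm
    """
    recent = deque(maxlen=window)  # holds the most recent `window` flags
    count = 0                      # running sum of the flags currently in the window
    for i, flag in enumerate(flags):
        if len(recent) == window:
            count -= recent[0]     # this flag is evicted by the append below
        recent.append(flag)
        count += flag
        if count >= threshold:     # assumed alarm rule: enough anomaly points in the window
            return i
    return None


if __name__ == "__main__":
    predicted = [0, 1, 0, 0, 1, 1, 0, 1]
    print(first_alarm_index(predicted, window=4, threshold=3))
```

Keeping a running count means each new point is processed in constant time, which suits streaming data; a larger threshold would presumably trade later alarms for fewer false alarms.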

Finally, we use cross-validation to evaluate the performance and robustness of the system in terms of catching rate and false alarm rate.
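
The two evaluation measures can be sketched as follows. The abstract does not define them, so this example assumes that the catching rate is the share of abnormal time series on which an alarm fired and that the false alarm rate is the share of normal time series on which an alarm fired; the function name and both definitions are hypothetical.

```python
from typing import Sequence, Tuple


def catching_and_false_alarm_rates(
    is_abnormal: Sequence[bool], alarmed: Sequence[bool]
) -> Tuple[float, float]:
    """Compute (catching rate, false alarm rate) over a set of time series.

    Assumed definitions (not taken from the thesis):
      catching rate    = alarmed abnormal series / all abnormal series
      false alarm rate = alarmed normal series   / all normal series
    """
    if len(is_abnormal) != len(alarmed):
        raise ValueError("inputs must have the same length")
    n_abnormal = sum(1 for a in is_abnormal if a)
    n_normal = len(is_abnormal) - n_abnormal
    caught = sum(1 for a, f in zip(is_abnormal, alarmed) if a and f)
    false_alarms = sum(1 for a, f in zip(is_abnormal, alarmed) if not a and f)
    # Return NaN when a class is absent so the rate is not silently reported as 0.
    catching = caught / n_abnormal if n_abnormal else float("nan")
    false_alarm = false_alarms / n_normal if n_normal else float("nan")
    return catching, false_alarm


if __name__ == "__main__":
    truth = [True, True, False, False, False, True]
    fired = [True, False, False, True, False, True]
    print(catching_and_false_alarm_rates(truth, fired))
```

Under cross-validation, one would compute this pair on each held-out fold and compare the values across folds to judge stability.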
Keywords (Chinese) ★ early warning system
★ anomaly detection
★ predictive model
Keywords (English) ★ early warning system
★ anomaly detection
★ predictive model
Table of contents Abstract (Chinese)
Abstract
Content
List of Tables
List of Figures
Chapter 1 Introduction
1-1 Research background/motivation
1-2 Research objectives
Chapter 2 Literature review
2-1 Anomaly detection
2-1-1 The technique of anomaly detection
2-1-2 Anomaly type
2-1-3 Anomaly time series
2-2 Predictive model
2-2-1 Decision tree
2-2-2 Logistic regression
2-2-3 Bayesian network
2-2-4 Neural network
2-2-5 Gradient boosting
2-3 Early warning system
Chapter 3 Methodology
3-1 Data preparation
3-2 Label presented in binary and probability
3-3 Early warning system of label presented in binary
3-4 Early warning system of label presented in probability
3-5 Evaluation
Chapter 4 Data Analysis
4-1 Data preparation
4-2 Label presented in binary and probability
4-2-1 Label presented in binary
4-2-2 Label presented in probability
4-3 Result
4-3-1 The result of the label presented in binary
4-3-2 The result of the label presented in probability
Chapter 5 Conclusion
Reference
Advisor 曾富祥 (Fu-Shiang Tseng) Approval date 2020-8-20
