A Semi-Supervised Model Combining Spatio-Temporal Data, Applied to Anomaly Detection for PM2.5 Air-Pollution Sensors


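Name

張欣茹 (Xin-Ru Zhang)

Department

Department of Computer Science and Information Engineering

Thesis Title

A Semi-Supervised Model Combining Spatio-Temporal Data, Applied to Anomaly Detection for PM2.5 Air-Pollution Sensors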
(Semi-Supervised Model with Spatio-Temporal Data and Applied in PM2.5 sensor anomaly detection)

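Related Theses

★ Predicting Users' Personal Information and Personality Traits from Web Browsing Histories
★ Predicting Changes in Users' Browsing Behavior Before Special Holidays with a Matrix-Factorization-Based Multi-Objective Prediction Method
★ A Study of Dynamic Multi-Model Fusion Analysis
★ Extended Clickstream: Analyzing User Behaviors Missing from the Clickstream
★ Associated Learning: Decomposing End-to-End Backpropagation with Autoencoders and Target Propagation
★ A Click Prediction Model That Fuses Multiple Ranking Models
★ Analyzing Intentional, Unintentional, and Missing User Behaviors in Web Logs
★ Adjusting Word Embeddings with Synonym and Antonym Information Using a Non-Directional Sequence Encoder Based on Self-Attention
★ Exploring When to Use Deep Learning or Simple Learning Models for Click-Through Rate Prediction
★ Fault Detection for Air Quality Sensors: An Anomaly Detection Framework Based on a Deep Spatio-Temporal Graph Model
★ An Empirical Study of How Word Embeddings Adjusted with Synonym and Antonym Dictionaries Affect Downstream Natural Language Tasks
★ Training Neural Networks by Adjusting DropConnect Drop Probabilities According to the Gradient Magnitudes of the Weights
★ Detecting Low-Activity Anomalous Accounts on PTT with Graph Neural Networks
★ Generating Personalized Trend Lines for Individual Users from a Small Number of Their Trend-Line Samples
★ Two New Probabilistic Clustering Models Based on Bivariate and Multivariate Beta Distributions
★ A New Technique for Simultaneously Updating the Parameters of All Layers of a Neural Network Using Associated Learning and Pipelining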


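This electronic thesis is authorized for immediate open access.
Full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the relevant provisions of the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.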

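Abstract (translated from the Chinese)

PM2.5 air pollution has drawn growing attention in Taiwan in recent years, and many relatively inexpensive sensors have been installed. These sensors, however, are easily affected by environmental factors, which leads to large measurement errors. Because the sensors are so numerous, each one is maintained infrequently, so the values reported by the sensors in a single area are less reliable than those from national monitoring stations.

This thesis compares how well supervised, unsupervised, and semi-supervised algorithms detect abnormal sensors. To combine the sensors' spatial and temporal information, we prepare the training data in three forms: monitored values converted into images, integrated data, and integrated data combined with time-series data. We obtain each sensor's status (normal or abnormal) from the inspection records provided by the Industrial Technology Research Institute and examine how the proportion of labeled data affects the predictive performance of the semi-supervised models. The experimental results show that the methods we study outperform the current random inspection scheme.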
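
The abstract says the monitored values are converted into images so that a model sees spatial and temporal context together, but it does not say how the images are laid out. The following is a minimal sketch of one possible layout, assuming (hypothetically) that each row holds the series of the target sensor or one of its nearest neighbours and each column is one time step. The function names, the toy coordinates, and the readings are all made up for illustration and are not taken from the thesis.

```python
import math


def nearest_neighbours(target, coords, k):
    """Return the ids of the k sensors closest to `target` (Euclidean distance)."""
    tx, ty = coords[target]
    others = [s for s in coords if s != target]
    # Sort by distance; the sensor id breaks ties so the order is deterministic.
    others.sort(key=lambda s: (math.hypot(coords[s][0] - tx, coords[s][1] - ty), s))
    return others[:k]


def spatio_temporal_image(target, coords, readings, k):
    """Stack the target's series on top of its k nearest neighbours' series.

    Row 0 is the target sensor; rows 1..k are neighbours, nearest first.
    Each column is one time step, so the result is a (k + 1) x T grid.
    """
    rows = [target] + nearest_neighbours(target, coords, k)
    return [list(readings[s]) for s in rows]


# Toy data (hypothetical): four sensors, three hourly PM2.5 readings each.
coords = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 2.0), "d": (5.0, 5.0)}
readings = {
    "a": [12.0, 13.0, 90.0],  # the last reading jumps away from its neighbours
    "b": [11.0, 12.0, 14.0],
    "c": [10.0, 12.0, 13.0],
    "d": [30.0, 31.0, 29.0],
}

image = spatio_temporal_image("a", coords, readings, k=2)
for row in image:
    print(row)
```

In such a grid, a sensor that disagrees with its neighbours at the same time step shows up as a row that differs from the rows beneath it, which is the kind of pattern an image-based model could pick up.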

Abstract (English)

PM2.5 pollution has drawn much attention in Taiwan, and many inexpensive sensors have been deployed in recent years. However, these sensors are fragile and susceptible to environmental factors. In addition, the large number of sensors results in a low maintenance frequency, so the values reported by a single sensor are unreliable.

This thesis compares supervised, unsupervised, and semi-supervised methods for identifying problematic sensors. To incorporate the sensors' spatio-temporal information, we prepared the training data by converting the monitored values into images, integrated data, and integrated data combined with sequential data. We obtained each sensor's status (normal or abnormal) from the inspection records provided by the Industrial Technology Research Institute and explored how the ratio of labeled to unlabeled data influences the performance of the semi-supervised models. Experimental results show that the methods we studied outperform the current inspection strategy (random inspection).
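
The comparison against random inspection can be made concrete with ranking metrics. The sketch below assumes each sensor has an anomaly score from some detector and a ground-truth label from an inspection record. It computes ROC AUC and the precision among the top k sensors, then compares the latter with the expected hit rate of random inspection, which equals the fraction of abnormal sensors. The scores and labels are invented toy values, not results from the thesis, and the thesis may use different metrics or implementations.

```python
def roc_auc(labels, scores):
    """Probability that a random abnormal sensor outscores a random normal one.

    labels: 1 = abnormal, 0 = normal. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal sensor")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(labels, scores, k):
    """Fraction of truly abnormal sensors among the k highest-scored ones."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Toy example (hypothetical): 8 sensors, 2 of them abnormal.
labels = [0, 0, 1, 0, 1, 0, 0, 0]
scores = [0.1, 0.3, 0.9, 0.2, 0.6, 0.7, 0.05, 0.4]

# Inspecting sensors at random finds an abnormal one with probability
# equal to the share of abnormal sensors.
random_baseline = sum(labels) / len(labels)

print("ROC AUC:", roc_auc(labels, scores))
print("precision@2:", precision_at_k(labels, scores, 2))  # 0.5
print("random inspection:", random_baseline)              # 0.25
```

With an inspection budget of two visits, ranking by score finds one abnormal sensor in this toy data, whereas random inspection would be expected to find one abnormal sensor in every four visits.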

Keywords

★ PM2.5
★ anomaly detection
★ semi-supervised model
★ spatio-temporal data integration

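Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents

1. Introduction
1.1 Motivation
1.2 Overview of Methods
1.3 Contributions

2. Related Work
2.1 Anomaly Detection for PM2.5 Sensors
2.2 Semi-Supervised Models for Anomaly Detection

3. Data Processing
3.1 Data Imputation Methods
3.2 Methods for Combining Spatial and Temporal Data
3.2.1 Integrating Spatio-Temporal Data Using Image Features
3.2.2 Integrated Data
3.2.3 Integrated Data Combined with Time-Series Data
3.3 Addressing Insufficient Data

4. Semi-Supervised Models
4.1 SSDO (Semi-Supervised Detection of Outliers)
4.1.1 Constrained Clustering
4.1.2 Updating Scores Using Available Labels
4.2 Deep SAD (Deep Semi-supervised Anomaly Detection)
4.2.1 Unsupervised Deep SVDD
4.2.2 Deep SAD

5. Experimental Results
5.1 Data and Experimental Setup
5.1.1 Data
5.1.2 Experimental Setup
5.1.3 Compared Models
5.1.4 Evaluation Methods
5.1.5 Hyperparameter Settings
5.2 Results and Discussion
5.2.1 Model Comparison and Discussion of Results
5.2.2 Discussion of Spatio-Temporal Data Representations
5.2.3 Varying the Proportions of Normal-Labeled, Abnormal-Labeled, and Unlabeled Data
5.2.4 Effect of Pretraining

6. Conclusion
6.1 Conclusions
6.2 Future Work

References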

References

[1] V. Van Zoest, A. Stein, and G. Hoek, “Outlier detection in urban air quality sensor networks,” Water, Air, & Soil Pollution, vol. 229, no. 4, pp. 1–13, 2018.
[2] F. Xiao, M. Yang, H. Fan, G. Fan, and M. A. Al-Qaness, “An improved deep learning model for predicting daily PM2.5 concentration,” Scientific Reports, vol. 10, no. 1, pp. 1–11, 2020.
[3] L.-J. Chen, Y.-H. Ho, H.-H. Hsieh, S.-T. Huang, H.-C. Lee, and S. Mahajan, “ADF: An anomaly detection framework for large-scale PM2.5 sensing systems,” IEEE Internet of Things Journal, vol. 5, no. 2, pp. 559–570, 2017.
[4] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “LOF: Identifying density-based local outliers,” in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000, pp. 93–104.
[5] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation forest,” in 2008 Eighth IEEE International Conference on Data Mining, IEEE, 2008, pp. 413–422.
[6] L. Ruff, R. Vandermeulen, N. Goernitz, L. Deecke, S. A. Siddiqui, A. Binder, E. Müller, and M. Kloft, “Deep one-class classification,” in International Conference on Machine Learning, PMLR, 2018, pp. 4393–4402.
[7] V. Vercruyssen, W. Meert, G. Verbruggen, K. Maes, R. Baumer, and J. Davis, “Semi-supervised anomaly detection with an application to water analytics,” in 2018 IEEE International Conference on Data Mining (ICDM), IEEE, vol. 2018, 2018, pp. 527–536.
[8] L. Ruff, R. A. Vandermeulen, N. Görnitz, A. Binder, E. Müller, K.-R. Müller, and M. Kloft, “Deep semi-supervised anomaly detection,” arXiv preprint arXiv:1906.02694, 2019.
[9] W. Meert, K. Hendrickx, and T. V. Craenendonck, Wannesm/dtaidistance v2.0.0, version v2.0.0, Aug. 2020. doi: 10.5281/zenodo.3981067. [Online]. Available: https://doi.org/10.5281/zenodo.3981067.
[10] G. A. Seber and A. J. Lee, Linear regression analysis. John Wiley & Sons, 2012, vol. 329.
[11] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
[12] A. Liaw, M. Wiener, et al., “Classification and regression by randomForest,” R News, vol. 2, no. 3, pp. 18–22, 2002.
[13] F. Rosenblatt, “Principles of neurodynamics: Perceptrons and the theory of brain mechanisms,” Cornell Aeronautical Lab Inc Buffalo NY, Tech. Rep., 1961.
[14] J. A. Hanley and B. J. McNeil, “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, vol. 143, no. 1, pp. 29–36, 1982.
[15] J. Davis and M. Goadrich, “The relationship between precision-recall and ROC curves,” in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 233–240.

Advisor

陳弘軒 (Hung-Hsuan Chen)

Approval Date

2021-08-10
