Thesis 107522141: Detailed Record




Author: 黃雪玲 (Shiue-Ling Huang)   Department: Computer Science and Information Engineering
Thesis title: Anomaly Detection for PM2.5 Sensors via Transfer Learning
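  1. The author has authorized immediate open access to this electronic thesis.
  2. Once open, the full text is licensed to users only for personal, non-commercial searching, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan). Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.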

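Abstract (Chinese) According to World Health Organization estimates, about 7 million people die each year from diseases related to air pollution. Among air pollutants, PM2.5 is considered to have the greatest impact on humans. To monitor PM2.5 concentrations in the surrounding environment, organizations in various countries have begun deploying large numbers of low-cost air quality sensors. However, because these sensors are inexpensive and may be installed in unsuitable locations, some of their readings may be unstable. When PM2.5 readings are used for data analysis, these unstable readings should be identified and removed. This thesis proposes a deep learning-based anomaly detection system for air quality sensors. The study uses two datasets: PurpleAir, from the South Coast Air Quality Management District, and Airbox, from Academia Sinica. Although the Airbox dataset contains abundant PM2.5 data, it lacks labels for anomalous air quality sensors. In contrast, PurpleAir sensors are sparsely distributed, but their data carry indoor and outdoor labels. To exploit both datasets, the ADF framework is used to label the Airbox dataset, which is then used to train a model. The PurpleAir dataset is then used for transfer learning to retrain the model. The PurpleAir test set is used to evaluate four models: an LSTM model and a hybrid model (combining LSTM and XGBoost) obtained through transfer learning, and XGBoost and LSTM models trained only on the PurpleAir dataset. Experimental results show that transfer learning significantly improves model performance, and that the hybrid model with transfer learning performs best on all metrics.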
Abstract (English) According to the World Health Organization, approximately 7 million people die each year from diseases caused by air pollution. Among air pollutants, PM2.5 is considered the most harmful to human health. To monitor PM2.5 levels in the surrounding environment, organizations in several countries have begun deploying large numbers of low-cost air quality sensors. However, because these sensors are cheaply built and may be installed in inappropriate places, some of their readings may be erratic. When PM2.5 readings are used for data analysis, these erratic readings should be identified and removed. In this thesis, we propose a deep learning-based anomaly detection system for air quality sensors. The study uses two datasets: PurpleAir, from the South Coast Air Quality Management District, and Airbox, from Academia Sinica. While the Airbox dataset contains abundant PM2.5 data, it lacks ground-truth labels for anomalous sensors. Conversely, PurpleAir sensors are deployed at low density, but their data come with indoor and outdoor labels. To take advantage of both datasets, the ADF anomaly detection framework is used to label the Airbox dataset, which is then used to train a model; the PurpleAir dataset is then used for transfer learning to retrain that model. Four models are evaluated on the PurpleAir test set: an LSTM model and a hybrid model (combining LSTM and XGBoost), both obtained through transfer learning, and XGBoost and LSTM models trained on the PurpleAir dataset alone. The experimental results show that transfer learning significantly improves model performance, and that the hybrid model with transfer learning performs best on all metrics.
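The two-stage training described above (learn from a large, automatically labeled source dataset, then retrain on a small dataset with ground-truth labels, and compare against a model trained on the small dataset alone) can be illustrated with a minimal sketch. It assumes a logistic-regression classifier in place of the thesis's LSTM and hybrid LSTM + XGBoost models, and synthetic two-feature data in place of the Airbox and PurpleAir readings; all names (`make_domain`, `train`, `confusion_counts`, `metrics`) are hypothetical. "Fine-tuning" here simply means continuing gradient descent from the pretrained weights; the record does not describe the thesis's actual retraining procedure.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w, x):
    # w[0] is the bias term; w[1:] are the feature weights.
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def log_loss(w, data):
    # Mean logistic loss, log(1 + e^z) - y*z, computed stably.
    total = 0.0
    for x, y in data:
        z = logit(w, x)
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    return total / len(data)


def train(data, w=None, lr=0.5, epochs=100):
    """Full-batch gradient descent on the logistic loss.

    w=None trains from scratch; passing pretrained weights continues
    training from them (a simple form of fine-tuning)."""
    n = len(data)
    w = list(w) if w is not None else [0.0] * (len(data[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(logit(w, x)) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def make_domain(n, shift, rng):
    # Hypothetical 2-D features in [-1, 1]; label 1 ("anomalous") when
    # x0 + x1 > shift. A different shift emulates a change of domain.
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        data.append((x, 1 if x[0] + x[1] > shift else 0))
    return data


def confusion_counts(w, data, threshold=0.5):
    # Returns (TP, FP, FN, TN) with "anomalous" as the positive class.
    tp = fp = fn = tn = 0
    for x, y in data:
        pred = 1 if sigmoid(logit(w, x)) >= threshold else 0
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    rng = random.Random(0)
    source = make_domain(1000, 0.0, rng)      # large source domain (rule-labeled)
    target_train = make_domain(40, 0.3, rng)  # small target domain (ground truth)
    target_test = make_domain(400, 0.3, rng)

    pretrained = train(source)
    fine_tuned = train(target_train, w=pretrained, epochs=30)
    scratch = train(target_train, epochs=30)

    for name, w in [("fine-tuned", fine_tuned), ("from scratch", scratch)]:
        print(name, metrics(*confusion_counts(w, target_test)))
```

Comparing `fine_tuned` with `scratch` mirrors, in miniature, the abstract's comparison between transfer-learned models and models trained only on PurpleAir; with a model this simple and synthetic data, no particular margin between the two should be expected.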
Keywords (English) ★ Air quality
★ Deep learning
★ Anomaly detection
Table of Contents
1 Introduction
2 Related Work
2.1 Statistics-based approaches
2.2 Machine Learning-based approaches
2.2.1 Non-parametric models
2.2.2 Parametric models
2.2.3 Deep learning-based models
2.3 Spatial correlation data
3 Preliminary
3.1 Airbox
3.2 Missing Data Imputation
3.3 Data Balancing
3.4 XGBoost
3.5 Recurrent Neural Network (RNN)
3.6 Statistical Anomaly Detection Framework
3.7 Transfer Learning
4 Design
4.1 Problem Formulation
4.2 System Design
4.2.1 Data Collection
4.2.2 Data Pre-processing
4.2.3 Spatial Feature
4.2.4 Anomaly Detection Model Design
4.2.5 Hybrid model
4.2.6 Transfer learning
5 Performance
5.1 Experiment Data
5.2 Evaluation Metric
5.2.1 Confusion matrix
5.3 Model tuning
5.3.1 The BigTaipei dataset
5.3.2 The San Francisco dataset
5.4 Performance Comparison and Analysis
6 Conclusions
References
Advisor: 孫敏德 (Min-Te Sun)   Approval date: 2021-02-23

For questions about this thesis, please contact the Outreach Services Section, National Central University Library, tel. (03) 422-7151 ext. 57407, or by e-mail.