Recurrent Learning on PM2.5 Prediction Based on Clustered Airbox Dataset


Name

Chia-Yu Lo (駱佳妤)

Department

Department of Computer Science and Information Engineering

Thesis Title

Recurrent Learning on PM2.5 Prediction Based on Clustered Airbox Dataset



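This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users solely for personal, non-commercial searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.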

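Abstract (translated from the Chinese)

Advances in industrial development have raised the demand for electricity. However, safety concerns about nuclear power persist, and many countries still rely on thermal power plants, which produce more air pollutants as coal is burned. This, together with rising vehicle emissions, has become a major cause of severe air pollution. Inhaling too much airborne particulate matter can lead to respiratory disease and even death, and PM2.5 is especially harmful. By predicting pollutant concentrations, people can take precautions to avoid overexposure. Accurate prediction of PM2.5 concentration has therefore become more important. In this thesis, we propose a PM2.5 concentration prediction system that uses data from EdiGreen Airbox and Taiwan's Environmental Protection Administration. An autoencoder and linear interpolation are used to handle missing values. In addition, Spearman's correlation coefficient is used to identify the features most correlated with PM2.5. We implemented two prediction models (LSTM and K-means-based LSTM) to predict the PM2.5 value of each Airbox device. To evaluate the models' predictive performance, the daily average error and the hourly average accuracy over a specific week are computed. The experimental results show that the K-means-based LSTM has the best predictive ability among all methods. The K-means-based LSTM is therefore combined with a Linebot to provide real-time PM2.5 predictions.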
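The abstract names linear interpolation for missing values and Spearman's correlation coefficient for feature selection. The following is a minimal sketch of both steps in plain Python, not the thesis's implementation: the function names, the edge-gap handling, and the sample readings are illustrative assumptions, and the autoencoder-based imputation mentioned in the abstract is not covered here.

```python
from typing import List, Optional, Sequence


def interpolate_gaps(series: Sequence[Optional[float]]) -> List[float]:
    """Fill None gaps with a straight line between the nearest observations.

    Assumption for this sketch: leading and trailing gaps copy the nearest
    observed value, since there is no second point to interpolate against.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i in range(known[0]):                      # leading gap
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):    # trailing gap
        filled[i] = series[known[-1]]
    for left, right in zip(known, known[1:]):      # interior gaps
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def _ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical hourly readings from one device (None marks a missing sample).
pm25_raw = [12.0, None, None, 18.0, 21.0, None, 15.0]
humidity = [60.0, 62.0, 65.0, 70.0, 75.0, 71.0, 66.0]

pm25 = interpolate_gaps(pm25_raw)   # [12.0, 14.0, 16.0, 18.0, 21.0, 18.0, 15.0]
rho = spearman(pm25, humidity)      # close to 1: the two series rise and fall together
```

Ranking the features by the absolute value of `rho` against PM2.5 would then give one way to pick the most relevant inputs; the threshold or number of features kept is a modelling choice the abstract does not state.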

Abstract (English)

Industrial development naturally increases the demand for electrical power. Because of safety concerns about nuclear power plants, many countries rely on thermal power plants, which emit more air pollutants as coal is burned. Together with rising vehicle emissions, this has become a primary cause of serious air pollution. Inhaling too much particulate matter, especially PM2.5, may lead to respiratory diseases and even death. By predicting air pollutant concentrations, people can take precautions to avoid overexposure to air pollutants. Accurate PM2.5 prediction has therefore become more important. In this thesis, we propose a PM2.5 prediction system that uses datasets from EdiGreen Airbox and the Taiwan EPA. An autoencoder and linear interpolation are adopted to solve the missing-value problem. Spearman's correlation coefficient is used to identify the features most relevant to PM2.5. Two prediction models (LSTM and LSTM based on K-means) are implemented to predict the PM2.5 value for each Airbox device. To assess prediction performance, the daily average error and the hourly average accuracy over one week are calculated. The experimental results show that LSTM based on K-means performs best among all methods. Therefore, LSTM based on K-means is chosen to provide real-time PM2.5 prediction through the Linebot.
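The abstract does not say which features are used to cluster the Airbox devices or how the clusters feed the LSTM. As a sketch under those stated gaps, the example below runs a plain K-means (Lloyd's algorithm) over hypothetical per-device PM2.5 profiles and groups device IDs by cluster, so that a separate sequence model could be trained per group. The device IDs, the three-value profiles, and the choice of k = 2 are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally long vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iters: int = 50, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles: mean PM2.5 in the morning, at noon, and in the evening.
profiles: Dict[str, Tuple[float, float, float]] = {
    "airbox_a": (10.0, 12.0, 11.0),
    "airbox_b": (11.0, 13.0, 12.0),
    "airbox_c": (40.0, 45.0, 42.0),
    "airbox_d": (42.0, 44.0, 41.0),
}

device_ids = list(profiles)
labels, centroids = kmeans([profiles[d] for d in device_ids], k=2)

# Group devices by cluster; each group could then get its own prediction model.
groups: Dict[int, List[str]] = defaultdict(list)
for device, label in zip(device_ids, labels):
    groups[label].append(device)
```

Training one model per group rather than per device gives each model more data from devices that behave alike, which is one plausible motivation for the clustered variant; the thesis body would need to be consulted for the actual design.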

Keywords

★ Air quality prediction
★ Clustering
★ Recurrent neural network

Table of Contents

1 Introduction
2 Related Work
2.1 Air Quality Index Prediction
2.2 PM10 Prediction
2.3 PM2.5 Prediction
2.3.1 Correlation Analysis
2.3.2 Statistics and Regression Model
2.3.3 Machine Learning
2.3.4 Deep Learning
3 Preliminary
3.1 Airbox Interworking Mechanism
3.2 Data Preprocessing
3.2.1 Data Imputation
3.2.2 Feature Normalization
3.3 Clustering Algorithm
3.4 Prediction Methods
3.4.1 Autoregressive Integrated Moving Average Model
3.4.2 Artificial Neural Network
3.4.3 Recurrent Neural Network
4 Design
4.1 Data Collection
4.2 Data Preprocessing
4.2.1 Data Cleansing
4.2.2 Data Imputation
4.2.3 Feature Selection
4.3 Prediction Model Construction
4.3.1 Data Normalization
4.3.2 K-means Clustering
4.3.3 LSTM Neural Network Model
4.4 Line Platform
5 Performance
5.1 Evaluation Metrics
5.2 Experiment Data
5.3 Hyperparameter Tuning
5.3.1 LSTM Neural Network Model
5.3.2 LSTM Neural Network Based on K-means
5.4 Performance Comparison
5.4.1 Model Performance in Different Seasons
6 Conclusions
Reference


Advisor

Min-Te Sun (孫敏德)

Approval Date

2019-07-25
