Estimating PM2.5 Concentrations for Ungauged Sites in Taiwan Using Machine Learning Approaches

Name

Yong-Sheng Ye (葉永昇)

Department

Department of Computer Science and Information Engineering

Thesis Title

Estimating PM2.5 Concentrations for Ungauged Sites in Taiwan Using Machine Learning Approaches

Files

Full text in the thesis system: never open to the public

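Abstract (translated from Chinese)

In recent years, air pollution has become a serious problem in many countries, and the effect of PM2.5 on health is the greatest public concern. Taiwan currently has about 80 monitoring stations that measure PM2.5 concentrations, most of them in the western part of the island. However, measurements from a central-site monitor usually lack the spatial and temporal resolution needed to describe the exposure variability of a study population. This thesis therefore considers factors that may affect PM2.5 concentrations, including meteorological observations, concentrations of air pollutants other than PM2.5, traffic, and land use. After discussion with domain experts, we screened and processed the data, then used linear regression, decision tree, and random forest methods to build models that estimate real-time PM2.5 concentrations in areas of Taiwan without monitoring stations. The random forest model was the most accurate: under 10-fold cross-validation, its coefficient of determination (R2) was 0.651, and its root mean squared error (RMSE) and mean absolute error (MAE) were 11.05 μg/m3 and 7.87 μg/m3, respectively. These errors are close to those of previous studies that used neural networks or regression models. Finally, we applied the model to real-time PM2.5 estimation and presented the results on a web page, which offers PM2.5 concentration queries by region and shows the hourly distribution of PM2.5 concentrations across Taiwan.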

Abstract (English)

In recent years, air pollution has become a serious problem in developing countries, and the health impact of PM2.5 is the greatest public concern. In Taiwan, about 80 stations monitor PM2.5 concentrations, and most of them are in the west. The stations are mainly located on the grounds and rooftops of government agencies and schools. In addition, measurements from a central-site monitor often lack sufficient spatial and temporal resolution to capture the exposure variability of a study population. In this study, we collected meteorological, air pollution, traffic-related, and land use data. After data screening and processing, we estimated hourly PM2.5 concentrations at ungauged sites in Taiwan using linear regression, decision tree, and random forest models. Random forest performed best: under 10-fold cross-validation it achieved a coefficient of determination (R2) of 0.651, a root mean squared error (RMSE) of 11.05 μg/m3, and a mean absolute error (MAE) of 7.84 μg/m3. Compared with previous studies that used neural networks or regulatory models, the error between our estimates and the measured values is not large. We also applied the model to real-time PM2.5 estimation and published the results on a website, which supports queries of PM2.5 concentrations by region and shows the hourly distribution of PM2.5 concentrations across Taiwan.
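
The evaluation described in the abstract (10-fold cross-validation scored with R2, RMSE, and MAE) can be illustrated with a minimal sketch. This is not the thesis code. It assumes synthetic data, a single hypothetical predictor, and a one-variable least-squares line as a stand-in for the thesis models. It also assumes that out-of-fold predictions are pooled before scoring, which this record does not confirm.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in for any regressor)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def k_fold_cv(xs, ys, k=10, seed=0):
    """Fit on k-1 folds, predict the held-out fold, then score the pooled predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering every sample
    y_true, y_pred = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            y_true.append(ys[i])
            y_pred.append(a + b * xs[i])
    return {"R2": r2_score(y_true, y_pred),
            "RMSE": rmse(y_true, y_pred),
            "MAE": mae(y_true, y_pred)}


# Synthetic data only: one hypothetical predictor and a noisy linear response.
rng = random.Random(42)
xs = [rng.uniform(0, 100) for _ in range(500)]
ys = [5.0 + 0.4 * x + rng.gauss(0, 8) for x in xs]
scores = k_fold_cv(xs, ys, k=10)
print(scores)
```

Swapping fit_line for a decision tree or random forest regressor leaves the cross-validation loop and the three metrics unchanged, which is what makes the reported scores comparable across models.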

Keywords (Chinese)

★ PM2.5
★ 機器學習 (machine learning)
★ 無測站 (ungauged sites)

Keywords (English)

★ PM2.5
★ Machine learning
★ Ungauged sites

Table of Contents

Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Related Works
1.3 Motivation
1.4 Research Goal
Chapter 2 Materials and Methods
2.1 Data Sources
2.2 Data Preprocessing
2.3 Machine Learning Methods
2.3.1 Linear Regression
2.3.2 Decision Tree
2.3.3 Random Forest
2.4 Real Time Estimation Processing
2.4.1 Web Crawler
2.4.2 Inverse Distance Weighting
2.5 Evaluation Metrics
Chapter 3 Results
3.1 Performance of PM2.5 Estimation Model
3.2 Performance of Real Time PM2.5 Estimation
3.3 Website and Application
3.4 Validation with Real Data
Chapter 4 Discussions and Conclusions
References
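
The outline lists inverse distance weighting (section 2.4.2) under real-time estimation, but this record does not say how the thesis applies it. The following is a minimal sketch of the generic technique only. It assumes planar coordinates, a hypothetical idw function, made-up station values, and a power parameter of 2.

```python
import math


def idw(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    stations: list of ((x, y), value) pairs in planar coordinates (assumed).
    target:   (x, y) location with no station.
    """
    num = 0.0
    den = 0.0
    for (x, y), value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a station, use its value directly
        w = 1.0 / d ** power  # closer stations receive larger weights
        num += w * value
        den += w
    return num / den


# Made-up values for illustration, not measurements from the thesis.
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 30.0)]
estimate = idw(stations, (1.0, 0.0))
print(estimate)
```

With longitude and latitude inputs, the coordinates would need to be projected or the Euclidean distance replaced with a great-circle distance before weighting.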

Advisor

Jorng-Tzong Horng (洪炯宗)

Approval Date

2018-7-23
