dc.description.abstract | Machine learning has become a powerful tool in air quality assessment: it can provide timely forecasts, alert the public, and support prompt measures to prevent air quality from deteriorating. This study used a long short-term memory (LSTM) algorithm to predict short-term air quality. We focused on the PM2.5 concentration in Douliu, one of the most polluted sites in Taiwan. Douliu’s air quality problem may stem from complicated emission sources and the effects of local circulation and topography. EPA air quality data from the Douliu station were used as the primary input for model training. A sensitivity test of different model setups was performed to identify the best combination. Auxiliary features, namely AOD from the AERONET database, pressure from CWB open-source data, time indicators, and PM2.5 concentrations from three nearby stations, were also considered to improve the prediction performance.
Twenty-four cases, representing 1- to 24-hour predictions for Douliu in 2021, were run to assess the model sensitivity. The optimal setup was the one with the best performance, with RMSE ranging from 6.4 to 13.1 µg/m3 over the 24 cases. The highest correlation was 0.92 for the 1-hour prediction, and the lowest was 0.58 for the 24-hour forecast. At the 1-hour forecast, the distribution of predicted PM2.5 values is consistent with the variation of the observed PM2.5. Additionally, the model can predict high PM2.5 events of nearly 100 µg/m3. This result indicates that the LSTM algorithm can overcome the underestimation issue, a practical problem with other algorithms. At longer prediction horizons, however, the model still underestimates: the 24-hour prediction model predicted high PM2.5 events at only 50–60 µg/m3. Additionally, using a deep learning-based LSTM with two LSTM layers improved the predicted PM2.5 concentration. Among the auxiliary features, the combination with PM2.5 from the three nearby stations performed better, improving RMSE by nearly 10%.
Seasonal and regional testing was conducted to assess the performance of the proposed model. The seasonal results showed that the highest error, about 16.2 µg/m3, occurred in winter, the high-pollution season in the area. The lowest error, 5.5 µg/m3, occurred in summer; however, summer also had the lowest correlation, because it is not a polluted season and PM2.5 concentrations are low. For regional testing, ten stations in the western coastal region of Taiwan were selected, and the 1-, 12-, and 24-hour prediction models were chosen for comparison. The central and southern Taiwan stations show a trend similar to that of the Douliu station, whereas the northern Taiwan stations perform the worst, with higher RMSE and lower correlation. Overall, this study provides a good reference for the best settings of a deep learning–based AI model suited to Taiwan’s climate conditions and data resources. The model can be implemented for routine air quality monitoring in urban areas and for air-quality alarms associated with public health.
| en_US |