Publications Repository - Helmholtz-Zentrum Dresden-Rossendorf


Machine learning-based ozone and PM2.5 forecasting: Application to multiple AQS sites in the Pacific Northwest

Fan, K.; Dhammapala, R.; Harrington, K.; Lamb, B.; Lee, Y. H.

Two versions (ML1 and ML2) of a machine learning-based modeling framework have been successfully used to provide operational forecasts of O3 at Kennewick, WA. This paper shows the performance of the ML system when applied to all available observation locations in the Pacific Northwest (PNW) to predict O3 and PM2.5 concentrations. We used historical O3 and PM2.5 concentrations, Weather Research and Forecasting (WRF) meteorological forecast data (including temperature, surface pressure, relative humidity, wind speed, wind direction, and planetary boundary layer height), and time information (including hour, weekday, and month) to train the model. A 10-fold cross-validation, repeated 10 times, was used to evaluate the model performance. As in our previous study, ML1 correctly captures more high-O3 events but also generates more false alarms, while ML2 performs better overall (R2 = 0.79), especially for low-O3 events. Our ML modeling framework uses both ML1 and ML2 results to achieve the best forecast performance. Compared to the WRF-CMAQ based forecast (i.e., AIRPACT), our final ML forecasts reduce the normalized mean bias (NMB) from 7.6% to 2.6% when evaluated against the observed mixing ratios. Our ML-based forecasts also clearly improve Air Quality Index (AQI) forecasts: the O3 AQI predictions are more accurate at every AQI level, including high-O3 AQI events. For PM2.5, ML1 and ML2 show similar capabilities in predicting high-PM2.5 events, and ML2 keeps its accuracy for low-PM2.5 predictions, so ML2 alone provides the final forecast values, rather than the combination of the two ML models that we use for O3. During wildfire seasons (May to September) and cold winter seasons (November to February) from 2017 to 2020, our ML model clearly performs better than AIRPACT. AIRPACT under-predicts the wildfire-season PM2.5 concentrations in the PNW (NMB = -27%) and over-predicts by up to 200% at some sites in the cold season, while ML2 has a lower NMB in both seasons (7.9% in the wildfire season and 2.2% in the cold season) and correctly captures more high-PM2.5 events.
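
The abstract compares forecasts by their normalized mean bias (NMB). As a minimal sketch, assuming the commonly used definition NMB = 100% x sum(forecast - observed) / sum(observed), the metric could be computed as below. The function name and the sample values are hypothetical and are not taken from the paper; the authors' exact evaluation code is not shown in this record.

```python
def normalized_mean_bias(forecast, observed):
    """Return the normalized mean bias in percent.

    Assumed definition: 100 * sum(forecast - observed) / sum(observed).
    Negative values indicate under-prediction, positive values over-prediction.
    """
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    total_observed = sum(observed)
    if total_observed == 0:
        raise ValueError("sum of observations is zero; NMB is undefined")
    total_bias = sum(f - o for f, o in zip(forecast, observed))
    return 100.0 * total_bias / total_observed


# Hypothetical concentrations for illustration only (not data from the paper).
observed = [10.0, 20.0, 30.0, 40.0]
under_forecast = [8.0, 16.0, 24.0, 32.0]   # every value 20% too low
over_forecast = [11.0, 22.0, 33.0, 44.0]   # every value 10% too high

nmb_under = normalized_mean_bias(under_forecast, observed)
nmb_over = normalized_mean_bias(over_forecast, observed)
print(f"under-forecast NMB: {nmb_under:.1f}%")
print(f"over-forecast NMB: {nmb_over:.1f}%")
```

Because the bias is summed before normalizing, over- and under-predictions at different times can cancel, so a small NMB does not by itself mean that individual forecasts are accurate.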

Keywords: machine learning; air quality forecasts; ozone; PM2.5; random forest; multiple linear regression


Permalink: https://www.hzdr.de/publications/Publ-35780