## Modelling air pollution and personal exposure in Bangkok and Mexico City using a land use regression model

##### Summary

The objective of this research is to examine the possibilities of modelling air pollution concentrations in cities outside Europe at a high spatial resolution (five to ten metres). Ideally, this is done with globally available datasets, which would ultimately allow different cities to be compared. To this end, a PM2.5 land use regression equation developed for the city of London in the ESCAPE (European Study of Cohorts for Air Pollution Effects) project is applied. The personal exposure of the population to air pollution is also examined. The main question of this research is: ‘To what extent is it possible to model air pollution concentrations in cities outside Europe using the London ESCAPE LUR model, how valid is the model, and to what extent can the personal exposure of the population to air pollution be modelled?’ The main question is divided into three sub-questions, which deal with the input data of the models, the validation and sensitivity analysis of the models, and the personal exposure of the population.

This paragraph deals with the methodology of this research. Land use regression (LUR) models relate a response variable, in this case air pollution, to two or more explanatory variables. In the ESCAPE project these explanatory variables included, for instance, land use, traffic density and topography. The regression equation used in this study consists of two variables: ‘INTMAJORINVDIST’ (‘i’) and ‘ROADLENGTH_500’ (‘l’). The former is the product of the number of cars on the nearest major road and the inverse of the distance to this road. The latter is the total length of roads within a buffer of 500 metres. The road network data used for this regression equation is OpenStreetMap (OSM, 2015). Because traffic intensity data was not available, the number of cars on the roads is estimated by assuming that all registered cars within a city are driving at the moment they are counted. Two parts of the cities Bangkok and Mexico City were modelled in this research project. Validation and sensitivity analysis were carried out to examine model errors. To examine the exposure of the population, the population of each neighbourhood was distributed across the modelled area, using a global population density grid as a weighting layer. The personal exposure could then be estimated.
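
The two predictors can be illustrated with a minimal Python sketch. It assumes that roads are stored as polylines of (x, y) vertices in a projected coordinate system with metre units, and that a vehicle count per major road is already available (in this study it was derived from the registered-car assumption, which the sketch does not reproduce). All function names, the `min_distance` floor and the coefficients `B0`, `B_I` and `B_L` are hypothetical placeholders, not the London ESCAPE values.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance (m) from point p to the straight segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp the projection parameter to [0, 1].
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_road(p, road):
    """Distance (m) from p to a polyline road."""
    return min(point_segment_distance(p, a, b) for a, b in zip(road, road[1:]))


def segment_length_in_circle(p, a, b, radius):
    """Length (m) of segment a-b lying inside a circle of `radius` around p."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    dx, dy = b[0] - a[0], b[1] - a[1]
    qa = dx * dx + dy * dy
    if qa == 0.0:
        return 0.0
    qb = 2.0 * (ax * dx + ay * dy)
    qc = ax * ax + ay * ay - radius * radius
    disc = qb * qb - 4.0 * qa * qc
    if disc <= 0.0:
        return 0.0  # the line misses (or only touches) the circle
    root = math.sqrt(disc)
    # Parameter interval inside the circle, clipped to the segment itself.
    t0 = max(0.0, (-qb - root) / (2.0 * qa))
    t1 = min(1.0, (-qb + root) / (2.0 * qa))
    return max(0.0, t1 - t0) * math.sqrt(qa)


def road_length_500(p, roads, radius=500.0):
    """ROADLENGTH_500: total road length (m) inside a 500 m buffer around p."""
    return sum(
        segment_length_in_circle(p, a, b, radius)
        for road in roads
        for a, b in zip(road, road[1:])
    )


def int_major_inv_dist(p, major_roads, vehicles, min_distance=1.0):
    """INTMAJORINVDIST: vehicles on the nearest major road times 1/distance.

    `min_distance` is a hypothetical floor that avoids division by zero
    for points lying exactly on a road centreline.
    """
    dists = [distance_to_road(p, road) for road in major_roads]
    nearest = min(range(len(dists)), key=dists.__getitem__)
    return vehicles[nearest] / max(dists[nearest], min_distance)


# HYPOTHETICAL coefficients: placeholders only, not the London ESCAPE values.
B0, B_I, B_L = 10.0, 0.001, 0.0005


def predict_pm25(i, l):
    """Linear LUR prediction from the two predictors."""
    return B0 + B_I * i + B_L * l


major_roads = [[(-1000.0, 0.0), (1000.0, 0.0)]]  # one straight major road
vehicles = [10_000.0]                              # hypothetical vehicle count
cell = (0.0, 50.0)                                 # grid cell 50 m from the road
i = int_major_inv_dist(cell, major_roads, vehicles)  # 10000 / 50 = 200.0
l = road_length_500(cell, major_roads)  # toy network: the only road is major
print(i, round(l, 1), round(predict_pm25(i, l), 2))
```

In a real application `road_length_500` would receive the full OSM network, while `int_major_inv_dist` would receive only the roads classified as major.
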

This paragraph discusses the results of the first sub-question, which is formulated as follows: ‘Which input data can be used to compare different cities outside Europe using the model?’ The results show that it is possible to model PM2.5 values with the London regression equation. However, no traffic intensity dataset was available, so assumptions had to be made about the number of cars on a road. The modelled PM2.5 values ranged from 7.2 to 37066 µg/m³ in the Bangkok area and from 13.8 to 183727 µg/m³ in Benito Juárez (Mexico City). The high values in both cities were all located on and near the major roads. A second output was therefore generated that excludes the major road locations and a 10-metre buffer around them. In this output the values range from 7.2 to 38.7 µg/m³ for Bangkok and from 13.8 to 46.3 µg/m³ for Benito Juárez.
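
The extreme values follow from the inverse-distance term: as the distance to the nearest major road approaches zero, ‘i’ grows without bound. The Python sketch below uses made-up numbers (a hypothetical vehicle count and coefficient, and an equation reduced to the intercept and the ‘i’ term) to show this effect and how excluding cells on or within 10 m of a major road changes the reported range. Distances are measured to the road centreline, which simplifies excluding the road locations plus a buffer.

```python
# Toy illustration with HYPOTHETICAL numbers, not the study's data or coefficients.
VEHICLES_ON_NEAREST_MAJOR_ROAD = 10_000.0
B0, B_I = 10.0, 0.01


def toy_pm25(distance_m):
    """PM2.5 from the intercept and the inverse-distance term only."""
    return B0 + B_I * VEHICLES_ON_NEAREST_MAJOR_ROAD / distance_m


def pm25_range(cells, exclude_within_m=None):
    """(min, max) PM2.5 over (distance_m, pm25) cells.

    When `exclude_within_m` is given, cells at or within that distance
    of a major road are dropped.
    """
    kept = [pm for d, pm in cells
            if exclude_within_m is None or d > exclude_within_m]
    return min(kept), max(kept)


distances = [0.5, 2.0, 5.0, 10.0, 25.0, 50.0, 100.0, 200.0]
cells = [(d, toy_pm25(d)) for d in distances]
print(pm25_range(cells))        # (10.5, 210.0): maximum set by the 0.5 m cell
print(pm25_range(cells, 10.0))  # (10.5, 14.0): cells up to 10 m removed
```
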

The results of the second sub-question are discussed in this paragraph. The question was formulated as follows: ‘To what extent do model errors occur when the model is applied in a study area outside Europe and what causes these errors?’ The input data were validated by comparing two model outputs based on different input data: one modelled air pollution with the data used in the ESCAPE project, the other with the data used in this research project. A direct comparison of the input data was not possible, because the ESCAPE data are not available as open data; comparing the two model outputs instead shows the effect of using different input data. This input data validation, carried out for Rotterdam, showed a weak positive correlation between the model output of this study and that of the ESCAPE project. The PM2.5 values were underestimated, but the model errors were not large. This was concluded from the small difference between the Root Mean Squared Error and the Mean Absolute Error. The validation of the models for Bangkok and Mexico City showed that the average
model outputs were higher than the remote sensing data. This validation also showed that the London regression equation predicted the average PM2.5 values better for Bangkok than for Mexico City. The differences in model output between Bangkok and Mexico City can probably be explained by the distribution of the major roads. The major roads of Benito Juárez are more evenly distributed than those in Bangkok. As a consequence, the nearest major road is on average closer to a grid cell in Benito Juárez than to a grid cell in Bangkok. Because ‘i’ is inversely proportional to this distance, this could lead to higher predicted PM2.5 values. The sensitivity analysis showed that the ‘l’ variable had more influence on the model output than the ‘i’ variable.
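
The reasoning behind the RMSE/MAE comparison can be made concrete with a short Python sketch. It assumes that the two model outputs have already been sampled at the same grid cells; the function name and the toy values are hypothetical. RMSE is always at least as large as MAE, and the two are equal only when every absolute error has the same size, so a small gap between them points to uniform errors rather than a few large outliers, while the sign of the mean error (bias) shows whether values are under- or overestimated.

```python
import math


def validation_stats(reference, predicted):
    """Pearson r, mean bias, MAE and RMSE of `predicted` against `reference`."""
    n = len(reference)
    errors = [p - o for o, p in zip(reference, predicted)]
    bias = sum(errors) / n                      # negative -> underestimation
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_o = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(reference, predicted))
    # Root sums of squared deviations; their product normalises the covariance.
    norm_o = math.sqrt(sum((o - mean_o) ** 2 for o in reference))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    r = cov / (norm_o * norm_p)
    return {"r": r, "bias": bias, "mae": mae, "rmse": rmse}


# Toy values (µg/m³), not study results.
reference = [20.0, 22.0, 25.0, 30.0]
uniform_under = [18.0, 20.0, 23.0, 28.0]  # every cell 2 units too low
one_outlier = [20.0, 22.0, 25.0, 22.0]    # a single 8-unit error

print(validation_stats(reference, uniform_under))  # bias -2, MAE = RMSE = 2
print(validation_stats(reference, one_outlier))    # bias -2, MAE 2, RMSE 4
```

Both toy cases have the same bias and MAE; only RMSE separates a consistent offset from a single large error.
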

This paragraph discusses the results of the third sub-question, which was formulated as follows: ‘To what extent is it possible to model the personal exposure of the population to air pollution?’ The exposure results showed that the population of Mexico City was exposed to higher PM2.5 values than the population of Bangkok. Taking the distribution of the population into account raised the average personal PM2.5 exposure for both populations, because the population was concentrated at locations with higher PM2.5 values, such as near roads. A comparison of multiple locations 50 metres from a major road with locations 200 metres from a major road showed differences of 1.6 to 4.4 µg/m³.
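
The population weighting can be sketched in Python as follows, assuming that each neighbourhood's total population is spread over its grid cells in proportion to the values of the population density layer (a simple dasymetric allocation). Function names and numbers are hypothetical. When more people live in the cells with higher modelled PM2.5, the population-weighted mean exposure exceeds the plain area average, which is the effect described above.

```python
def distribute_population(total, weights):
    """Split a neighbourhood total over its cells in proportion to `weights`.

    Falls back to an even split if all weights are zero (an assumption).
    """
    weight_sum = sum(weights)
    if weight_sum == 0:
        return [total / len(weights)] * len(weights)
    return [total * w / weight_sum for w in weights]


def area_mean(pm25):
    """Unweighted mean PM2.5 over the cells."""
    return sum(pm25) / len(pm25)


def population_weighted_mean(pm25, population):
    """Average PM2.5 experienced per person."""
    return sum(c * p for c, p in zip(pm25, population)) / sum(population)


# Toy neighbourhood of four cells; the last cell lies next to a major road.
density_weights = [1.0, 1.0, 2.0, 6.0]  # hypothetical density-layer values
pm25 = [20.0, 22.0, 25.0, 35.0]         # hypothetical modelled PM2.5 (µg/m³)
population = distribute_population(1000, density_weights)

print(population)                                  # [100.0, 100.0, 200.0, 600.0]
print(area_mean(pm25))                             # 25.5
print(population_weighted_mean(pm25, population))  # 30.2
```
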

This paragraph deals with the conclusions of this research. With the available global datasets it is possible to model air pollution concentrations to a certain extent. The road network dataset can be used without much pre-processing. The traffic intensity dataset is the most problematic one and should either be obtained locally or be created using assumptions. The model errors are partly explained by the input data. The validation of the input data showed an underestimation of the PM2.5 values in Rotterdam. The small difference between the Root Mean Squared Error and the Mean Absolute Error showed that there were few large errors in the predictions, which implies a fairly consistent underestimation of PM2.5 values in Rotterdam. The model errors are also caused by the model itself. The differences between Bangkok and Mexico City suggest that local calibration would be a suitable way to take city-specific characteristics into account, which is expected to lead to smaller prediction errors. The sensitivity analysis showed that especially the ‘l’ variable has a substantial influence on the model output, which means that it is important to use an accurate and unambiguously mapped road network dataset. The results of the third sub-question show that the personal exposure of the population in Bangkok and Mexico City can be estimated with the regression equation, since the PM2.5 concentration can be estimated for any location in the research area. However, the air pollution map and the distribution of the population introduce some uncertainty.
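
The summary does not describe how the sensitivity analysis was set up. One common approach is a one-at-a-time (OAT) analysis, sketched here in Python as a swapped-in illustration rather than the study's method: each predictor is increased by a fixed percentage while the other is held constant, and the change in predicted PM2.5 is recorded. The coefficients and base values are hypothetical; with a linear equation the ranking depends on both the coefficient and the typical magnitude of each predictor, so the ranking obtained here reflects only the toy numbers.

```python
def lur(i, l):
    """Linear LUR with HYPOTHETICAL coefficients (not the London ESCAPE values)."""
    return 10.0 + 0.001 * i + 0.0005 * l


def one_at_a_time(model, base_inputs, rel_change=0.10):
    """Change in model output when each input alone is raised by `rel_change`."""
    base_output = model(**base_inputs)
    effects = {}
    for name, value in base_inputs.items():
        perturbed = dict(base_inputs)
        perturbed[name] = value * (1.0 + rel_change)
        effects[name] = model(**perturbed) - base_output
    return effects


base = {"i": 200.0, "l": 5000.0}  # hypothetical predictor values for one cell
effects = one_at_a_time(lur, base)
print(effects)  # roughly {'i': 0.02, 'l': 0.25}
```
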