Estimating Post-Earthquake Aid Priority Areas
Summary
In the first days following a disaster, humanitarian decision makers often face a scarcity of
information on the spatial distribution of the event’s impact, and thus on the affected population’s
need for humanitarian aid. By learning from data on past events, Priority Index Models (PIMs) can rapidly
produce an estimate of a disaster’s impact, which can help decision makers to identify aid priority areas.
This enables empirically-based decision support, in contrast to the more subjective models that are
currently used. The main objective of this study is to explore the usability of pre- and post-event open
data to train a model that rapidly estimates post-earthquake aid-neediness for any earthquake-prone area
on earth. As far as is known, machine learning algorithms have not previously been applied to predict aid
priority areas after seismic hazards specifically. To achieve the research objective, the Gorkha
earthquake of 2015 in Nepal was used as a test case. Country- and hazard-specific open data related to
this earthquake were used to predict aid-neediness. Damage to residential buildings was selected as the
most suitable indicator of aid-neediness. Three different statistical models were fitted to the
data: a multivariate linear regression model and two random forest regression models (one predicting
completely damaged houses and the other predicting a combination of completely and partially
damaged houses). Twenty-four variables in four categories (hazard, exposure, physical vulnerability and
socio-economic vulnerability) were identified as predictors of post-earthquake structural damage. All
three models could successfully produce an output on administrative level 4 (VDC) for the 16 most
affected districts. Statistically, the random forest model predicting both partially and completely
damaged houses performed best with an R-squared of 0.63 on an independent test dataset. However,
the random forest model predicting only completely damaged houses is preferable because its output is more
intuitive and extendable. Its R-squared of 0.60 is also not much lower, and two-thirds of the highest
priority areas were identified correctly. The linear model prediction resulted in an R-squared of 0.53.
Additionally, this model’s output gave reason to suspect that the relationships it identified between damage
and ‘school attendance’, ‘toilet presence’ and ‘foundation type’ might not be applicable to other events
or countries. The mean macroseismic intensity and total population were the most important predictors in all models
and are considered to be indispensable model components. For a future event within Nepal, a model
output of similar accuracy can be expected, but the presence of case- and country-specific relationships
in the current model makes a useful estimation for a future event in another country very unlikely.
However, once it has been trained on events in different countries, the model is expected to
produce output that is useful for aid prioritisation decision making. The extent to which the model
can be successfully applied to different countries and cases can be improved by excluding secondary
hazard susceptibility variables, finding an alternative uniform socio-economic vulnerability variable and
using composite building quality variables. Model simplicity and data preparedness are key aspects in
the successful further development of these models.
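
The two evaluation measures quoted above, the R-squared on a test dataset and the share of highest priority areas identified correctly, can be illustrated with a minimal sketch. The function names, the top-k definition of "highest priority areas" and the sample numbers are assumptions made for illustration only; they are not taken from the study, which may define the second measure differently.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def top_priority_hit_rate(observed, predicted, k):
    """Share of the k most damaged areas that also rank in the predicted top k.

    Assumption: 'highest priority areas' are taken to be the k areas with the
    largest damage counts; the study may use a different definition.
    """
    def top_k(values):
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return set(order[:k])

    return len(top_k(observed) & top_k(predicted)) / k


# Made-up damage counts for six hypothetical areas (not data from the study).
observed = [120, 80, 10, 45, 200, 5]
predicted = [100, 90, 20, 30, 150, 60]

print(round(r_squared(observed, predicted), 3))
print(top_priority_hit_rate(observed, predicted, k=3))
```

Reporting a rank-based measure alongside R-squared is useful here because aid prioritisation depends more on ordering the areas correctly than on the exact predicted damage counts.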