Improving image recognition for species identification by modeling ecological context
Summary
Fine-grained image recognition can be used to identify species from images at the (sub)species level. Two
key challenges for improving the accuracy of species identification models are geographical bias and
class imbalance: some species and some areas are overrepresented in the training data. Providing a model
with contextual information such as location coordinates, date, environmental variables and neighboring
species may help to overcome these problems by creating context-aware predictions.
We combined 22 million images of 31 thousand species with information on location and date of observation, habitat variables and neighboring observations to train a new context-aware model. We employed a transformer architecture that enriches the image representation created by a convolutional neural network, using information from 800 nearby species. Transforming image representations using neighboring observations is a novel approach to modeling ecological context. This model was compared with a baseline image-only model and with ablation models, using existing and new metrics that measure how well a model deals with data biases.
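The idea of enriching an image representation with neighboring observations can be illustrated with a minimal sketch. It is not the model described here: it assumes a single attention head without learned projections, toy 4-dimensional embeddings, and a hypothetical function name `enrich`. The image embedding acts as the query, the embeddings of nearby species act as keys and values, and the attended context is added back to the image embedding as a residual.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def enrich(image_vec, context_vecs):
    """Single-head cross-attention with a residual connection.

    Assumption: no learned query/key/value projections; a real
    transformer layer would add those, plus normalization and an MLP.
    """
    if not context_vecs:
        # Without context, the image representation is left unchanged.
        return list(image_vec)
    d = len(image_vec)
    # Scaled dot-product score between the image and each context embedding.
    scores = [
        sum(q * k for q, k in zip(image_vec, ctx)) / math.sqrt(d)
        for ctx in context_vecs
    ]
    weights = softmax(scores)
    # Weighted average of the context embeddings.
    attended = [
        sum(w * ctx[i] for w, ctx in zip(weights, context_vecs))
        for i in range(d)
    ]
    # Residual: the enriched vector is image embedding plus attended context.
    return [x + a for x, a in zip(image_vec, attended)]


# Toy data (hypothetical): one image embedding, three nearby-species embeddings.
image_vec = [1.0, 0.0, 0.0, 0.0]
context_vecs = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
enriched = enrich(image_vec, context_vecs)
```

In this toy setup, context embeddings that resemble the image embedding receive the largest attention weight, so the enriched vector is pulled toward species that are both visually plausible and locally present.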
The new context-aware model showed a significant performance improvement on all metrics. The overall
accuracy improvement was 1.5 percentage points, reducing the error rate by 9.5 percent. Enriching the image
representations using a transformer architecture improved the model for most taxonomic groups. Species
with few observation records benefited more from including contextual ecological information than
species with many observations. Rare species that are only present locally could be correctly identified
because the model had access to contextual information about the local ecology. Areas with few data points benefited more from the new model than areas with many data points. The local accuracy in different areas became more evenly distributed.
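The absolute and relative improvements quoted above are linked by the baseline error rate: the relative error reduction equals the absolute gain divided by the baseline error. The following sketch works out what the two reported figures imply, assuming both refer to the same overall accuracy metric. The function name `implied_baseline` is hypothetical, and because both inputs are rounded, the derived values are approximate and are not figures reported in this summary.

```python
def implied_baseline(abs_gain_pp, rel_error_reduction_pct):
    """Derive baseline error, baseline accuracy and new accuracy (all in percent).

    Uses: rel_error_reduction = abs_gain / baseline_error.
    """
    base_error = abs_gain_pp / (rel_error_reduction_pct / 100.0)
    base_acc = 100.0 - base_error
    new_acc = base_acc + abs_gain_pp
    return base_error, base_acc, new_acc


# Inputs taken from the text: 1.5 percentage points, 9.5 percent error reduction.
base_error, base_acc, new_acc = implied_baseline(1.5, 9.5)
print(f"implied baseline error:    {base_error:.1f}%")
print(f"implied baseline accuracy: {base_acc:.1f}%")
print(f"implied new accuracy:      {new_acc:.1f}%")
```

Under these assumptions the baseline error rate comes out at roughly 16 percent, which shows why a gain of 1.5 percentage points corresponds to a much larger relative reduction in errors.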
In summary, the new model was better able to deal with geographical bias and class imbalance in the data. Image recognition for species identification thus benefits from including contextual ecological information in the model, either as direct input or as a means to transform image representations.