Using Land-use Data to Improve Automatic Classification Accuracy of Machine Learning Models for Classifying Outdoor Sport Activities in GNSS-Tracks
Summary
Getting sufficient physical activity is widely understood to improve general health and
wellbeing. Understanding patterns in sport behaviour and their connection to land-use elements
is vital for promoting physical activity and meeting the global health goals set by the World Health
Organisation. Collecting and analysing data on physical activity can support this understanding.
Nowadays, nearly everyone collects spatial data in the form of GNSS tracks through their smart
devices. This data can be used to detect physical activities. However, raw spatial data lacks context
and requires analysis, which can be time-consuming. To that end, several machine learning
models were trained in this research to automatically classify the sport activities recorded in
GNSS tracks. Pre-labelled GNSS tracks were used to train and test the models. Land-use data
corresponding to the GNSS tracks was also used to determine to what extent it could influence the
models’ classification accuracy. The model trained with the support vector machine algorithm
achieved the highest classification accuracy, at 82.6%. Adding land-use
data to this model significantly increased its classification accuracy (+5.6%). Using land-use data
with the other machine learning algorithms also significantly improved their models’ performance,
although in these models not all land-use features were found to have a positive influence on
performance.
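
As a rough illustration of how land-use data might be attached to a GNSS track before training, the sketch below turns the land-use class of each track point into per-class shares and appends them to a motion feature. It is a minimal sketch under stated assumptions, not the procedure used in this research: the class names, the single mean-speed feature, and the helper names `land_use_shares` and `feature_vector` are all hypothetical, and the lookup of a land-use class for each point is assumed to have been done already.

```python
from collections import Counter

# Hypothetical land-use classes; the summary does not list the ones actually used.
LAND_USE_CLASSES = ("forest", "water", "road", "residential")


def land_use_shares(point_classes):
    """Return the fraction of track points that fall in each land-use class."""
    total = len(point_classes)
    if total == 0:
        # An empty track has no land-use signal at all.
        return [0.0] * len(LAND_USE_CLASSES)
    counts = Counter(point_classes)
    return [counts[c] / total for c in LAND_USE_CLASSES]


def feature_vector(mean_speed_kmh, point_classes, use_land_use=True):
    """Build one feature row for a track, optionally including land-use shares."""
    features = [mean_speed_kmh]
    if use_land_use:
        features.extend(land_use_shares(point_classes))
    return features


# Made-up example track: four points, each already assigned a land-use class.
track_classes = ["forest", "forest", "road", "forest"]
print(feature_vector(9.5, track_classes))                      # with land-use shares
print(feature_vector(9.5, track_classes, use_land_use=False))  # motion feature only
```

Feature rows built with and without the land-use shares could then be passed to the same classifier, so that the two accuracies can be compared in the way the summary describes.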