Identifying and improving quality issues in Google Semantic Location History DDPs for public transport activities
Summary
As more of human life takes place online, digital privacy plays an increasingly important role in society. New laws have been introduced to protect people’s privacy. In response to these laws, companies now give their users the
opportunity to download their personal data as Data Download Packages (DDPs). A recent study used the Google Semantic Location History DDPs to investigate how the COVID-19 pandemic changed travel behaviour.
However, these DDPs suffer from potential quality issues, which affect the inferences made from the data. The aim of this project is to identify these potential quality issues, correct for them with data
imputation where possible, and assess whether doing so changes the resulting inferences. This thesis will focus on errors in public transport activity types found in Google Semantic Location History.
To locate the quality issues, a Python script will check whether different parts of the data meet predefined requirements. The script will count the number of errors and apply data imputation where possible, resulting in a more accurate
data extraction. This, in turn, leads to a better understanding of travel behaviour. While multiple steps are still needed to make the extraction reflect reality as closely as possible, this is a first step towards improving the accuracy of inferences made with Google Semantic Location History data.
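
The check-count-impute approach described above can be illustrated with a minimal sketch. It is not the thesis script. It assumes the JSON layout commonly seen in Semantic Location History exports (`timelineObjects` entries holding an `activitySegment` with `activityType`, `distance` in metres, and a `duration` with ISO 8601 `startTimestamp` and `endTimestamp`). The set of public transport labels, the two requirements (a positive duration and a positive distance), and the median-speed imputation rule are all assumptions chosen for illustration.

```python
from datetime import datetime
from statistics import median

# Assumed public transport labels; the exact set used in the thesis is not stated.
PUBLIC_TRANSPORT = {"IN_BUS", "IN_TRAIN", "IN_SUBWAY", "IN_TRAM"}


def parse_time(value):
    """Parse an ISO 8601 timestamp; return None if it is missing or malformed."""
    if not isinstance(value, str):
        return None
    try:
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def duration_seconds(segment):
    """Return the segment duration in seconds, or None if it is unusable."""
    duration = segment.get("duration") or {}
    start = parse_time(duration.get("startTimestamp"))
    end = parse_time(duration.get("endTimestamp"))
    if start is None or end is None:
        return None
    seconds = (end - start).total_seconds()
    return seconds if seconds > 0 else None


def has_valid_distance(segment):
    """Hypothetical requirement: distance must be a positive number."""
    distance = segment.get("distance")
    return (
        isinstance(distance, (int, float))
        and not isinstance(distance, bool)
        and distance > 0
    )


def check_segments(timeline_objects):
    """Count requirement violations and impute missing distances where possible."""
    segments = [
        obj["activitySegment"]
        for obj in timeline_objects
        if (obj.get("activitySegment") or {}).get("activityType") in PUBLIC_TRANSPORT
    ]
    counts = {"invalid_duration": 0, "missing_distance": 0, "imputed_distance": 0}

    # First pass: collect speeds (m/s) per activity type from fully valid segments.
    speeds = {}
    for seg in segments:
        seconds = duration_seconds(seg)
        if seconds is not None and has_valid_distance(seg):
            speeds.setdefault(seg["activityType"], []).append(seg["distance"] / seconds)

    # Second pass: count errors and impute distance as median speed * duration.
    cleaned = []
    for seg in segments:
        seconds = duration_seconds(seg)
        if seconds is None:
            counts["invalid_duration"] += 1
            continue  # no imputation rule for broken timestamps in this sketch
        distance = seg.get("distance")
        if not has_valid_distance(seg):
            counts["missing_distance"] += 1
            known = speeds.get(seg["activityType"])
            if not known:
                continue  # nothing to impute from, so the segment is dropped
            distance = median(known) * seconds
            counts["imputed_distance"] += 1
        cleaned.append(
            {"activityType": seg["activityType"], "seconds": seconds, "distance": distance}
        )
    return cleaned, counts
```

Calling `check_segments(data["timelineObjects"])` on a loaded monthly JSON file returns the retained segments together with the error counts, so the extraction can be compared with and without imputation. The two-pass design keeps the imputation source (valid segments only) separate from the segments being repaired.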