Forecasting SARS-CoV-2 Virus Load in Sewage Using Autoregression Models for Time Series Data
Wastewater is a reservoir of human excretion and contains virus particles shed by people, so its analysis can provide information on the prevalence of infectious diseases. In the Netherlands, sewage surveillance has been used to monitor the COVID-19 pandemic by detecting and measuring SARS-CoV-2 virus particles in sewage at over 300 sewage treatment plants (STPs) multiple times a week. The result is an extensive data set of multivariate time series. In this thesis, linear regression models are used to model and forecast the virus load time series of each variable (STP). We compare vector autoregressive (VAR) models combined with different variable selection methods, based on K-Nearest Neighbours, on correlations between time series, and on principal component analysis. How much the inclusion of multiple variables improves predictions depends strongly on the number and choice of variables, on the smoothness of the time series, on the length of the training set, and on the time between training and testing. A remarkable result of this research is that models using an intermediate number of variables produced the largest test errors. We also found that the models performed worse for STPs with small catchment areas. With this research, we shed light on how relationships between STPs can be incorporated in multivariate linear time series models, and we introduce three novel variable selection methods for VAR.
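As a rough illustration of the general idea, and not the implementation used in the thesis, the sketch below selects the series most correlated with a target STP and fits a VAR model to that subset by ordinary least squares. The function names (`select_by_correlation`, `fit_var`, `forecast_one_step`), the use of absolute Pearson correlation as the ranking criterion, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def select_by_correlation(series, target, k):
    """Indices of the k columns most correlated (in absolute value) with column `target`.

    `series` has shape (T, n): T time points, n variables (hypothetical STPs).
    """
    corr = np.abs(np.corrcoef(series, rowvar=False)[target])
    corr[target] = -1.0  # exclude the target itself from the ranking
    return np.argsort(corr)[::-1][:k]


def fit_var(data, p):
    """Least-squares fit of a VAR(p) model with intercept.

    Returns a coefficient matrix of shape (1 + n * p, n); row 0 is the intercept,
    followed by the coefficients for lag 1, lag 2, ..., lag p.
    """
    T, _ = data.shape
    # Each design row for time t is [1, y_{t-1}, ..., y_{t-p}].
    lagged = [data[p - lag:T - lag] for lag in range(1, p + 1)]
    X = np.hstack([np.ones((T - p, 1))] + lagged)
    Y = data[p:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


def forecast_one_step(data, coef, p):
    """One-step-ahead forecast from the last p observations in `data`."""
    x = np.concatenate([[1.0]] + [data[-lag] for lag in range(1, p + 1)])
    return x @ coef


# Synthetic stand-in for virus load series (assumption: 6 series, 200 time points).
rng = np.random.default_rng(0)
common = np.cumsum(rng.normal(size=200))
loads = np.column_stack(
    [common + rng.normal(scale=s, size=200) for s in (0.1, 0.2, 0.5, 1.0, 3.0, 6.0)]
)

target = 0
neighbours = select_by_correlation(loads, target, k=2)
subset = loads[:, [target, *neighbours]]  # target is column 0 of the subset
coef = fit_var(subset, p=2)
prediction = forecast_one_step(subset, coef, p=2)[0]  # forecast for the target only
```

Varying `k` in such a sketch is one way to explore how the number of included variables affects test error, which is the kind of comparison the abstract describes.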