
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Kryven, I.V.
dc.contributor.author: Slegers, Fleur
dc.date.accessioned: 2023-03-25T01:00:57Z
dc.date.available: 2023-03-25T01:00:57Z
dc.date.issued: 2023
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/43719
dc.description.abstract: Wastewater is a reservoir of human excretion and contains virus particles shed by people. Its analysis can provide information on the prevalence of infectious diseases. In the Netherlands, sewage surveillance has been used as a tool for monitoring the COVID-19 pandemic by detecting and measuring SARS-CoV-2 virus particles in sewage at over 300 sewage treatment plants (STPs) multiple times a week. The result is an extensive data set of multivariate time series. In this thesis, linear regression models are used to model and forecast the virus load time series for each variable (STP). We compare vector autoregressive (VAR) models enriched with different variable selection methods, based on K-Nearest Neighbours, correlations between time series, and principal component analysis. How much the inclusion of multiple variables improves predictions depends strongly on the number and choice of variables, on the smoothness of the time series, on the length of the training set, and on the time between training and testing. A remarkable result of this research is that using an intermediate number of variables in the models resulted in the largest test errors. We also found that the models performed worse for STPs with small catchment areas. With this research, we shed light on how relationships between STPs can be incorporated in multivariate linear time series models, and we introduce three novel variable selection methods for VAR.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: In this thesis, autoregression models for time series data are used to model and forecast SARS-CoV-2 virus loads in sewage. Standard vector autoregression models are extended with feature selection methods, more specifically K-nearest neighbors, correlation-based methods, and principal component analysis.
dc.title: Forecasting SARS-CoV-2 Virus Load in Sewage Using Autoregression Models for Time Series Data
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Mathematical Sciences
dc.thesis.id: 15273

