Coalescent theory and infectious diseases
Summary
The hereditary information of organisms is carried by DNA molecules. These molecules can be extracted
from a cell, and their genetic code can be read. Within a population, the genetic code can differ between
individuals. These differences are caused by mutations in the genes on the DNA molecules. The number of
differences between two individuals is a measure of how closely related those two individuals are. When
the genetic information of a group of individuals is sampled, we can use these differences to construct a
kind of family tree, a so-called phylogenetic tree.
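As an illustration of "the number of differences between two individuals", the sketch below counts the differing sites between every pair of sampled sequences. It assumes the sequences are already aligned to equal length; the host names and sequences are made up for the example, and the helper `pairwise_differences` is hypothetical, not taken from the thesis.

```python
from itertools import combinations


def pairwise_differences(seqs):
    """Count differing sites for every pair of aligned, equal-length sequences."""
    diffs = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(seqs.items(), 2):
        if len(seq_a) != len(seq_b):
            raise ValueError("sequences must be aligned to equal length")
        # Each site where the two sequences disagree counts as one difference.
        diffs[(name_a, name_b)] = sum(x != y for x, y in zip(seq_a, seq_b))
    return diffs


# Hypothetical sampled sequences from three infected hosts.
samples = {
    "host1": "ACGTACGTAC",
    "host2": "ACGTACGTTC",
    "host3": "ACCTACGATC",
}

differences = pairwise_differences(samples)
for pair, count in differences.items():
    print(pair, count)
```

Fewer differences suggest a more recent common ancestor; tree-building methods start from distances of this kind.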
Coalescent theory links the phylogenetic tree to the population dynamics. It provides a model for the
reproduction of individuals that explains the phylogenetic tree. With this theory, the phylogenetic tree can
be used to estimate the size of the total population. This is the main reason why we want to apply
coalescent theory to describe the spread of an infectious disease through a population.
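A minimal sketch of how a tree can yield a population-size estimate, assuming the classical (Kingman) coalescent for a haploid population of constant size N, time measured in generations, and coalescence times read off the tree without error. Under these assumptions, while k lineages remain, the waiting time T_k until the next merger is exponential with rate C(k,2)/N, and the standard maximum-likelihood estimate is N̂ = Σ_k C(k,2)·T_k / (n − 1). This is a textbook estimator used for illustration, not necessarily one of the estimators derived in the thesis; the function names are hypothetical.

```python
import random


def simulate_intercoalescence_times(n, N, rng):
    """Waiting times T_k (k = n, ..., 2) under the Kingman coalescent for a
    haploid population of size N, in generations: T_k ~ Exp(C(k,2) / N)."""
    times = {}
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2 / N
        times[k] = rng.expovariate(rate)
    return times


def estimate_population_size(times):
    """Maximum-likelihood estimate N_hat = sum_k C(k,2) * T_k / (n - 1),
    where n - 1 equals the number of coalescence events in the tree."""
    total = sum(k * (k - 1) / 2 * t for k, t in times.items())
    return total / len(times)


rng = random.Random(1)
true_N = 1000
sample_size = 20

# Average the estimate over many simulated trees to check it centres on true_N.
estimates = [
    estimate_population_size(simulate_intercoalescence_times(sample_size, true_N, rng))
    for _ in range(2000)
]
mean_estimate = sum(estimates) / len(estimates)
print(f"mean estimate of N over {len(estimates)} trees: {mean_estimate:.1f}")
```

Each term C(k,2)·T_k has mean N, so the estimator is unbiased, but its variance only shrinks like N²/(n − 1): a larger sample helps, which is exactly where the classical assumptions start to strain.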
Problems with the assumptions of the classical coalescent model arise when we apply it to pathogens that
cause an infectious disease. In this setting, one individual is an infected host, and the total population
size, which we want to estimate, is the total number of infected hosts.
Among other things, the classical coalescent theory assumes that the total population is very large and
that the number of sampled individuals is relatively small. In the setting of infectious diseases, these
assumptions can be problematic, because the total number of infected hosts may be relatively small while
a major part of the infected individuals is sampled.
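To see why a large sampled fraction is a problem, the sketch below compares one generation of a haploid Wright–Fisher model with the coalescent approximation. This assumes the Wright–Fisher model as the underlying reproduction model, which the summary does not state. With k sampled lineages each choosing a parent uniformly among N hosts, the exact probability that no two lineages share a parent is ∏_{i=1}^{k−1}(1 − i/N), while the classical coalescent approximates it by 1 − C(k,2)/N. The function names are hypothetical.

```python
from math import comb, prod


def exact_no_coalescence(k, N):
    """Wright-Fisher: probability that k lineages pick k distinct parents."""
    return prod(1 - i / N for i in range(1, k))


def coalescent_no_coalescence(k, N):
    """Classical coalescent approximation: 1 - C(k,2) / N."""
    return 1 - comb(k, 2) / N


N_HOSTS = 1000  # hypothetical number of infected hosts
for k in (2, 5, 50, 500):
    exact = exact_no_coalescence(k, N_HOSTS)
    approx = coalescent_no_coalescence(k, N_HOSTS)
    print(f"k={k:4d}  exact={exact:.6g}  coalescent={approx:.6g}")
```

For a small sample (k = 5) the two agree to about four decimal places, but once half of the hosts are sampled (k = 500) the approximation becomes negative and meaningless. The approximation also ignores that several pairs may merge in the same generation.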
In my thesis, I describe two different extensions of the classical model that deal with relatively large
samples. From these extensions I derive two estimators for the population size and explore their performance.