Computational methods for the detection of structural variation in the human genome
Summary
Structural variations are genomic rearrangements that contribute significantly to evolution and to natural variation between humans, and that are often involved in genetic disorders. Cellular stresses and errors in repair mechanisms can lead to a large variety of structural variation events throughout the genome. Traditional microscopy- and array-based methods are used to detect larger events or copy number variations. Next-generation sequencing (NGS) has, in theory, enabled the detection of all types of structural variants in the human genome with unprecedented accuracy. In practice, a significant challenge lies in developing computational methods that can identify these structural variants from the generated data. In the last several years, many tools have been developed that draw on four categories of information obtainable from sequencing experiments: read pairs, read depth, split reads, and assembled sequences.
In this thesis, I first introduce the topic of structural variation by discussing its impact in various areas, the mechanisms that can lead to its formation, and the types of structural variation that can occur. Subsequently, I describe the array-based and sequencing-based methods that can be used to detect structural variation. Finally, I give an overview of the tools currently available for detecting signatures of structural variants in NGS data, together with their properties, and conclude by discussing the current capabilities of these tools, possible future directions, and expectations.