IDENTIFYING STRUCTURAL VARIATIONS USING AN OPTIMIZED SV PIPELINE
Summary
Structural variants (SVs) are genomic alterations affecting at least 50 base pairs of DNA. The PrediCT
study uses two gene panels to investigate tumour development linked to germline mutations in
cancer predisposition genes. The aim of this project is to optimize an SV pipeline and to determine
whether PrediCT patients carry clinically significant SVs, focussing on deletions, in the genes covered
by these panels.
SV callers identify genomic alterations in the DNA and report them in a VCF file. However, no single
caller currently detects SVs both accurately and comprehensively (Koboldt, 2020; Kuzniar et al, 2020;
Kosugi et al, 2019). Hence, the SV callers Manta and Dysgu are combined to achieve more accurate and
more comprehensive SV detection. Each caller reports a property in its VCF file that is useful for
validating the accuracy of an event. Both callers can also process CRAM files, which significantly
reduces the pipeline’s runtime.
The pipeline is optimized by setting thresholds for Dysgu’s Probability Score and Manta’s Quality
Score properties, based on event verification in IGV (Robinson et al, 2011); these thresholds serve as
filters. Events are additionally filtered on SV length, and high-confidence calls made by only a single
caller are retained. Events of interest are annotated using AnnotSV, annotation software specialized
for SVs, and are then filtered to exon regions. Many of the identified SVs lack clinical significance
because they are common in the population.
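The threshold-based filtering described above can be sketched as follows. This is a minimal illustration only: the threshold values, the Event record, and the exact rule for combining the two callers are hypothetical placeholders and are not the values or logic used in the study.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder thresholds, NOT the values derived in the study.
DYSGU_MIN_PROB = 0.5         # minimum Dysgu probability score when both callers agree
MANTA_MIN_QUAL = 100.0       # minimum Manta quality score when both callers agree
DYSGU_HIGH_CONF_PROB = 0.9   # stricter cut-off for Dysgu-only calls
MANTA_HIGH_CONF_QUAL = 500.0 # stricter cut-off for Manta-only calls
MIN_SV_LENGTH = 50           # SVs are at least 50 base pairs


@dataclass
class Event:
    """One candidate SV; a score is None if that caller did not report it."""
    svtype: str
    length: int
    dysgu_prob: Optional[float] = None
    manta_qual: Optional[float] = None


def keep_event(event: Event) -> bool:
    """Return True if the event passes the length and score filters."""
    if event.length < MIN_SV_LENGTH:
        return False
    by_dysgu = event.dysgu_prob is not None
    by_manta = event.manta_qual is not None
    if by_dysgu and by_manta:
        # Supported by both callers: each score must pass its base threshold.
        return (event.dysgu_prob >= DYSGU_MIN_PROB
                and event.manta_qual >= MANTA_MIN_QUAL)
    if by_dysgu:
        # Single-caller event: keep only high-confidence calls.
        return event.dysgu_prob >= DYSGU_HIGH_CONF_PROB
    if by_manta:
        return event.manta_qual >= MANTA_HIGH_CONF_QUAL
    return False


candidates = [
    Event("DEL", 300, dysgu_prob=0.6, manta_qual=150.0),  # both callers
    Event("DEL", 300, dysgu_prob=0.6),                    # Dysgu only, low score
    Event("DEL", 300, manta_qual=600.0),                  # Manta only, high score
]
kept = [e for e in candidates if keep_event(e)]
```

In this sketch, an event supported by both callers only needs to pass the base thresholds, whereas an event reported by a single caller must pass a stricter cut-off.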
Combining Dysgu and Manta gave better results than using either caller on its own: with
an equivalent number of correctly identified events, fewer false positives are called when combining
the callers.
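One way to express this comparison is through precision, the fraction of called events that are correct. The sketch below uses made-up counts purely for illustration; they are not results from the study.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of called events that are correct: TP / (TP + FP)."""
    total_calls = true_positives + false_positives
    if total_calls == 0:
        raise ValueError("no events were called")
    return true_positives / total_calls


# Made-up counts for illustration only: the same number of true positives,
# but fewer false positives for the combined callers.
single_caller = precision(true_positives=10, false_positives=10)
combined_callers = precision(true_positives=10, false_positives=5)
```

With the number of true positives held equal, any reduction in false positives raises precision.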
In this analysis, the pipeline did not detect any clinically significant SVs beyond those
that were already established.