Consensus Calling and Validation of Single Nucleotide Variant Calling from Nanopore Sequencing with Deep Learning for CyclomicsSeq
Summary
Cell-free DNA (cfDNA) consists of small (145 bp) DNA fragments that reside in the human circulation and other bodily fluids. cfDNA is derived from both healthy and tumour cells; the tumour-derived fraction is called circulating tumour DNA (ctDNA). ctDNA carries genetic and epigenetic characteristics of the tumour genome and is therefore a valuable source of information. Cancer liquid biopsies target these fragments, but because the ctDNA concentration in blood can be very low, sensitive sequencing technologies are needed. CyclomicsSeq is a novel sequencing technique that uses rolling circle amplification to generate multiple copies of a cfDNA fragment in a single long concatemer. These concatemers are sequenced with Nanopore sequencing, and the consensus sequences of the cfDNA fragments can be used to detect tumour variants and to infer the tumour fraction. The error rate of Nanopore sequencing is still high, but consensus calling across the multiple copies of a cfDNA fragment removes random sequencing errors. Systematic sequencing errors, however, tend to recur across copies and therefore remain after consensus calling. Here, we present a deep learning model trained on Nanopore sequencing data. The model performs accurate consensus calling for CyclomicsSeq and can detect tumour variants in ctDNA fragments at low variant allele frequency.
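To make the distinction between random and systematic errors concrete, the following minimal sketch applies a plain per-position majority vote to the copies of one fragment. It is an illustration only and not the deep learning model presented here: the function name `majority_consensus`, the toy sequences, and the assumption that all copies are already aligned to the same length are hypothetical choices made for the example.

```python
from collections import Counter


def majority_consensus(copies):
    """Return the per-position majority base across pre-aligned copies.

    Assumption for this sketch: every copy has the same length, i.e. the
    concatemer has already been split into copies and aligned.
    """
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be pre-aligned to the same length")
    consensus = []
    for column in zip(*copies):
        # Pick the most frequent base observed at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Toy fragment and five noisy copies of it (all hypothetical).
truth = "ACGTACGT"

# Random errors: each error hits a different position in a different copy.
random_err = ["ACGTACGT", "ACGAACGT", "ACGTACGT", "TCGTACGT", "ACGTACCT"]

# Systematic error: the same wrong base at the last position in most copies.
systematic_err = ["ACGTACGA", "ACGTACGA", "ACGTACGA", "ACGTACGT", "ACGTACGA"]

random_consensus = majority_consensus(random_err)
systematic_consensus = majority_consensus(systematic_err)

print(random_consensus == truth)      # random errors are voted out
print(systematic_consensus == truth)  # the systematic error survives the vote
```

In this toy case the vote recovers the true sequence from the copies with scattered errors, while the error that recurs at the same position in most copies is carried into the consensus. A simple vote cannot tell such an error apart from a real variant, which is the gap a model trained on Nanopore data is meant to address.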