dc.description.abstract | In a research hospital such as the Princess Máxima Centre (PMC), research and patient treatment are often based on next-generation sequencing (NGS) data. Quality control of patient data is therefore vital to preserve data integrity. However, several steps in the process from patient to genotype are vulnerable to sample swaps. To detect such swaps, NGSCheckMate was developed, a tool that retrospectively checks whether samples are labelled correctly based on a set of 21K SNPs. Running NGSCheckMate with the original 21K SNP set nevertheless proved computationally inefficient at the PMC, with runtimes for patient samples adding up to ~68 hours. Moreover, data from the PMC biobank sequencing pipeline was found to be incompatible with NGSCheckMate: RNA-Seq could not be integrated with WGS/WXS, even though the samples were obtained from the same biomaterial. By selecting SNPs based on variation in minor allele and coverage across RNA-Seq samples, smaller SNP sets were created that maintained or even improved performance compared to the original 21K set. Total runtime of NGSCheckMate decreased from ~68 to ~2 hours. Furthermore, in combination with pre-processing and additional filtering of low-quality files, RNA-Seq integration was improved. In conclusion, this study presents a range of smaller SNP sets that both decrease the runtime and improve the performance of NGSCheckMate in sample swap detection. | |