gplasCC: a robust plasmid reconstruction tool using short-read data from any bacterial species
Summary
Plasmids play a pivotal role in the spread of antibiotic resistance genes (ARGs). Accurate plasmid reconstruction often requires long-read sequencing, which is more expensive and still more error-prone than short-read sequencing. We present an optimized approach for reconstructing antimicrobial resistance plasmids using only Illumina short-read data. The method combines a robust binary classification tool (plasmidCC) with a plasmid reconstruction tool (gplasCC), which uses features of the assembly graph to reliably bin predicted plasmid contigs into individual plasmids. PlasmidCC predicts plasmid-derived contigs using custom Centrifuge databases. Here, we present eight plasmidCC databases: seven species-specific models (Acinetobacter baumannii, Escherichia coli, Enterococcus faecalis, Enterococcus faecium, Klebsiella pneumoniae, Staphylococcus aureus, and Salmonella enterica) and one species-independent model for less frequently studied bacterial species. We integrate the plasmid classification from plasmidCC into the gplasCC reconstruction workflow to accurately reconstruct ARG plasmids from more than 100 bacterial species.
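To make the two-stage idea (classify contigs, then bin plasmid contigs using the assembly graph) concrete, the sketch below shows a deliberately simplified binning step. It is not gplasCC's algorithm: it assumes the classifier output is a hypothetical dict mapping contig IDs to "plasmid" or "chromosome", and that the assembly graph is given as a hypothetical list of contig-to-contig links. It then groups plasmid-predicted contigs that are directly linked to one another (union-find over connected components). The function name `bin_plasmid_contigs` is illustrative only.

```python
from collections import defaultdict


def bin_plasmid_contigs(predictions, edges):
    """Toy binning of plasmid-predicted contigs (NOT gplasCC's method).

    predictions: hypothetical dict, contig ID -> "plasmid" or "chromosome"
    edges: hypothetical iterable of (contig_a, contig_b) assembly-graph links
    Returns a sorted list of bins, each a sorted list of contig IDs.
    """
    # Only contigs the classifier called "plasmid" take part in binning.
    parent = {c: c for c, label in predictions.items() if label == "plasmid"}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        # Merge bins only when both ends of a link are plasmid-predicted;
        # links through chromosomal contigs are ignored in this toy.
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    bins = defaultdict(list)
    for contig in parent:
        bins[find(contig)].append(contig)
    return sorted(sorted(members) for members in bins.values())


if __name__ == "__main__":
    preds = {"c1": "plasmid", "c2": "plasmid", "c3": "chromosome",
             "c4": "plasmid", "c5": "plasmid"}
    links = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c5")]
    print(bin_plasmid_contigs(preds, links))  # [['c1', 'c2'], ['c4', 'c5']]
```

Direct connectivity is the simplest possible use of graph structure. Real short-read assembly graphs contain repeats (for example, insertion sequences shared by plasmids and chromosome) that link unrelated replicons, which is why a dedicated tool relies on richer graph features than this sketch does.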