PlasmidEC: An ensemble of classifiers that improves plasmidome recall from short-read sequencing data in Escherichia coli
Summary
Over the past decades, pathogenic lineages of Escherichia coli have rapidly acquired antibiotic
resistance. Currently, multidrug-resistant E. coli is the most frequent cause of lethal infections
among resistant bacteria in hospital settings. Antibiotic resistance genes (ARGs) are commonly
spread via plasmids. From a clinical and epidemiological standpoint, analysing the plasmid
content of E. coli is highly relevant. The rise of Illumina whole genome sequencing (WGS) has enabled fast
large-scale analysis of the genomic content of bacteria. However, it is usually not possible to
reconstruct plasmids by genome assembly of short-read sequencing data. Therefore, several
bioinformatic tools have been developed to uncover the total plasmid content in a sample, also
referred to as the plasmidome, by classifying genomic sequences as either chromosome- or
plasmid-derived. We benchmarked four of these binary classifiers (mlplasmids, PlaScope, Platon and
RFPlasmid). They form the basis of plasmidEC, an ensemble classifier that combines the output of
three plasmid classifiers using a majority voting system. The combination of
Platon/PlaScope/RFPlasmid presented the best plasmidome predictions (F1-score = 0.904).
Compared to individual classifiers, plasmidEC achieved increased recall (0.885), especially for contigs
derived from ARG-plasmids (recall = 0.941). Moreover, we used plasmidEC in a plasmidome study of
E. coli ST131 to identify differences between this lineage and other E. coli. Finally, we show
that plasmidEC removes chromosomal contamination in plasmid reconstructions obtained by
MOB-suite.
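The majority voting scheme and the reported metrics can be illustrated with a minimal Python sketch. This is not the plasmidEC implementation: the function names, the input layout (one label per contig per tool) and the contig names are assumptions made for illustration, and the example labels are invented rather than taken from the benchmark.

```python
def majority_vote(labels):
    """Return 'plasmid' if more than half of the labels say so, else 'chromosome'."""
    if len(labels) % 2 == 0:
        raise ValueError("an odd number of classifiers is needed to avoid ties")
    plasmid_votes = sum(1 for label in labels if label == "plasmid")
    return "plasmid" if plasmid_votes > len(labels) / 2 else "chromosome"


def classify_contigs(per_tool):
    """Combine per-tool predictions, given as {tool: {contig: label}}."""
    # Only contigs that every tool classified are voted on (an assumption).
    shared = set.intersection(*(set(preds) for preds in per_tool.values()))
    return {
        contig: majority_vote([preds[contig] for preds in per_tool.values()])
        for contig in sorted(shared)
    }


def recall_precision_f1(predicted, truth):
    """Compute recall, precision and F1 with 'plasmid' as the positive class."""
    tp = sum(1 for c, t in truth.items() if t == "plasmid" and predicted.get(c) == "plasmid")
    fn = sum(1 for c, t in truth.items() if t == "plasmid" and predicted.get(c) != "plasmid")
    fp = sum(1 for c, t in truth.items() if t != "plasmid" and predicted.get(c) == "plasmid")
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


# Hypothetical predictions for three made-up contigs.
example_votes = {
    "Platon": {"contig_1": "plasmid", "contig_2": "chromosome", "contig_3": "plasmid"},
    "PlaScope": {"contig_1": "plasmid", "contig_2": "chromosome", "contig_3": "chromosome"},
    "RFPlasmid": {"contig_1": "chromosome", "contig_2": "plasmid", "contig_3": "chromosome"},
}
example_truth = {"contig_1": "plasmid", "contig_2": "chromosome", "contig_3": "plasmid"}

ensemble = classify_contigs(example_votes)
metrics = recall_precision_f1(ensemble, example_truth)
```

With three binary classifiers a tie cannot occur, which is why the sketch rejects an even number of voters instead of defining a tie-breaking rule.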
Related items
- Random Forests for Plasmid Detection - An Exercise in Model Building and Evaluation
  Gilliquet, Ethel (2022) Plasmids are bacterial genetic elements that are replicated and transferred independently from the chromosome. Because of their independent mechanisms of replication and transfer, the study of plasmids is of special interest ...
- Determining plasmids from short read sequences
  Hein, Y.W.R. (2018) In this thesis we present a novel method to determine the DNA-sequence of plasmids from a De Bruijn Graph. The method uses coverage data as well as estimates, based on the program mlplasmids, how likely a certain contig ...
- gplasCC: a robust plasmid reconstruction tool using short-read data from any bacterial species
  Jordan, Oscar (2024) Plasmids play a pivotal role in the spread of antibiotic resistance genes (ARGs). Accurately reconstructing plasmids often requires long-read sequencing, which is more expensive and still more error-prone than short-read ...