
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Willems, R.J.L.
dc.contributor.author: Vader, Lisa
dc.date.accessioned: 2022-06-15T00:00:43Z
dc.date.available: 2022-06-15T00:00:43Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/41637
dc.description.abstract: Over the past decades, pathogenic lineages of Escherichia coli have rapidly acquired antibiotic resistance. Currently, multidrug-resistant E. coli is the most frequent cause of lethal infections among resistant bacteria in hospital settings. Antibiotic resistance genes (ARGs) are commonly spread via plasmids. From a clinical and epidemiological standpoint, it is therefore highly relevant to analyse the plasmid content of E. coli. The rise of Illumina whole genome sequencing (WGS) has enabled fast, large-scale analysis of the genomic content of bacteria. However, it is usually not possible to reconstruct plasmids by genome assembly of short-read sequencing data. Therefore, several bioinformatic tools have been developed to uncover the total plasmid content of a sample, also referred to as the plasmidome, by classifying genomic sequences as either chromosome- or plasmid-derived. We benchmarked four of these binary classifiers (mlplasmids, PlaScope, Platon and RFPlasmid). They form the basis of plasmidEC, an ensemble classifier that combines the output of three plasmid classifiers using a majority voting system. The combination of Platon, PlaScope and RFPlasmid gave the best plasmidome predictions (F1-score = 0.904). Compared to individual classifiers, plasmidEC achieved increased recall (0.885), especially for contigs derived from ARG-plasmids (recall = 0.941). Moreover, we used plasmidEC in a plasmidome study of E. coli ST131 to identify differences between this lineage and other E. coli. Finally, we show that plasmidEC removes chromosomal contamination from plasmid reconstructions obtained with MOB-suite.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Antibiotic resistance genes (ARGs) are frequently carried on plasmids. We compared the performance of four bioinformatic tools that predict plasmid sequences in E. coli. We developed plasmidEC, an ensemble classifier that combines the predictions of these tools. PlasmidEC achieves improved recall, especially for plasmids that carry ARGs. We also show two applications of this tool: plasmidome analysis of E. coli ST131, and removal of contamination in reconstructed plasmids.
dc.title: PlasmidEC: An ensemble of classifiers that improves plasmidome recall from short-read sequencing data in Escherichia coli
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: plasmids;antibiotic resistance;E. coli;plasmidome;bioinformatics
dc.subject.courseuu: Bioinformatics and Biocomplexity
dc.thesis.id: 4468
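
The majority voting described in the abstract can be illustrated with a short sketch. This is not plasmidEC's actual code: the function name `majority_vote`, the label strings, the contig IDs and the example predictions are all hypothetical, and the sketch assumes each classifier has already been reduced to one binary label per contig.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-contig labels from several binary classifiers.

    `predictions` maps a classifier name to a dict of contig ID -> label,
    where each label is "plasmid" or "chromosome" (hypothetical format).
    Only contigs labelled by every classifier are voted on.
    """
    if len(predictions) % 2 == 0:
        # An odd number of voters guarantees a strict majority for two labels.
        raise ValueError("use an odd number of classifiers to avoid ties")

    per_tool = list(predictions.values())
    shared_contigs = set.intersection(*(set(p) for p in per_tool))

    calls = {}
    for contig in sorted(shared_contigs):
        votes = Counter(p[contig] for p in per_tool)
        calls[contig] = votes.most_common(1)[0][0]
    return calls


# Made-up predictions for two contigs, keyed by the three tools named in the abstract.
example = {
    "Platon": {"contig_1": "plasmid", "contig_2": "chromosome"},
    "PlaScope": {"contig_1": "plasmid", "contig_2": "plasmid"},
    "RFPlasmid": {"contig_1": "chromosome", "contig_2": "chromosome"},
}
calls = majority_vote(example)
```

With three voters, a contig is called plasmid-derived as soon as two tools agree, which is one plausible reason an ensemble can recover plasmid contigs that a single tool misses.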

