dc.rights.license CC-BY-NC-ND
dc.contributor.advisor Schürch, Anita
dc.contributor.author Bayraktar, Doğukan
dc.date.accessioned 2024-12-31T00:01:36Z
dc.date.available 2024-12-31T00:01:36Z
dc.date.issued 2024
dc.identifier.uri https://studenttheses.uu.nl/handle/20.500.12932/48277
dc.description.abstract Antimicrobial resistance (AMR) is a critical global health threat, driven by bacteria evolving to withstand existing treatments. Addressing AMR requires understanding the spread of antimicrobial resistance genes (ARGs) through extensive public microbial sequencing data. However, metadata inconsistencies in public databases such as the NCBI Sequence Read Archive (SRA) hinder this process: because metadata quality is variable, the data itself is difficult to find and reuse. We developed SRA-Data-Collector, a Snakemake pipeline that streamlines the search and retrieval of NCBI SRA data using taxon IDs, run accessions, and study accessions. The tool simplifies finding samples; collects, cleans, and formats metadata; and supports parallel sample downloads. We also adapted a reference-free plasmid reconstruction pipeline for use with the collected Illumina short-read data. To demonstrate the tool’s utility, we used it to download samples of Shigella flexneri and Enterobacter cloacae. We then reconstructed plasmids with the plasmid reconstruction pipeline and clustered them with mge-cluster. Preliminary analyses, in which we annotated the clusters with the metadata from SRA-Data-Collector, showed that plasmid clusters were not influenced by geolocation, source, or collection year. While we have demonstrated how SRA-Data-Collector and automated plasmid reconstruction enable bulk analysis of public sequencing data, it is important to stress that thorough curation and analysis of the clusters produced by mge-cluster is needed, and that annotating clusters with metadata alone is not sufficient. To conclude, our integrated approach using SRA-Data-Collector and associated tools facilitates ARG analysis with public data, supporting global-scale AMR research.
dc.description.sponsorship Utrecht University
dc.language.iso EN
dc.subject Development of a tool for retrieving data and metadata from NCBI SRA. Building of Snakemake pipelines for reconstructing plasmids from NCBI SRA data.
dc.title Reconstructing NCBI SRA plasmids for global analysis of antimicrobial resistance genes
dc.type.content Master Thesis
dc.rights.accessrights Open Access
dc.subject.keywords Antimicrobial resistance;plasmids;metadata;software;snakemake;SRA;AMR;NCBI;SRA-Data-Collector;plasmid reconstruction
dc.subject.courseuu Bioinformatics and Biocomplexity
dc.thesis.id 31981

