Detecting (novel) transposons in metagenomic data using custom Hidden Markov Models
Summary
Transposons, like plasmids, are a type of mobile genetic element (MGE). Transposons are DNA sequences that can relocate to other positions within one genome or across multiple genomes using a transposase gene. Most transposons carry additional passenger genes that are not necessary for relocation, such as antibiotic resistance genes. These passenger genes can spread through bacterial populations via transposons, exacerbating the global antibiotic resistance problem. Metagenomics could help us better understand the mechanics of transposons and their spread through populations. While some bioinformatics tools already exist to detect transposons in sequence data, to our knowledge no tool can detect (novel) transposon sequences in metagenomic data. The method we propose for such a tool is based on profile Hidden Markov Models (pHMMs) built from protein domains of transposons extracted from the MGEdb. We tested the models against the original database they are based on and against a simulated metagenomic dataset created from eight well-annotated genomes and three plasmids. The models detected the vast majority of transposons included in the database, and the results from the metagenomic dataset reveal putative transposons in the data, possibly linked to Tn4652 and Tn7.