dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Kenna, Kevin
dc.contributor.author: Babukhian, Miriam
dc.date.accessioned: 2025-02-27T00:02:10Z
dc.date.available: 2025-02-27T00:02:10Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/48558
dc.description.abstract: 5’ Untranslated Regions (UTRs) play a pivotal role in post-transcriptional regulation of gene expression; however, investigating these non-coding regions poses significant challenges. The advent of deep learning methods can help unveil the complex nature of regulatory elements, including those of the 5’UTR. In this study we propose an explainable deep-learning-based approach to uncover functionally significant features in 5’UTRs from existing long-read RNA-seq data. We trained two models, namely DNABert-2 and a Convolutional Neural Network (CNN), to tackle two distinct classification tasks: discriminating brain-specific 5’UTRs from those of other tissues, and distinguishing general 5’UTRs from randomly selected sequences of the genome. We attempted to interpret the models’ decision-making process by performing in silico mutagenesis (ISM) and visualizing DNABert-2’s attention scores. Our results showed that neither model was able to discriminate between brain-specific 5’UTRs and 5’UTRs from other tissues, while both DNABert-2 and the CNN showed excellent performance in classifying general 5’UTRs against random sequences of the genome. The CNN relied heavily on the GC-content of the sequence to predict class probability, while DNABert-2 was able to pick up more complex cues within the 5’UTR. We were eventually able to partially interpret DNABert-2’s predictions and uncover functionally known 5’UTR motifs; however, further model training and interpretation are needed to support the validity of our findings.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: The aim of this study was to unveil hidden regulatory elements within the 5’ Untranslated Region (UTR). To this end, we proposed an explainable deep-learning-based approach to uncover functionally significant features in 5’UTRs from existing long-read RNA-seq data. Two different models, namely DNABert-2 and a Convolutional Neural Network (CNN), were used in the analysis.
dc.title: Unveiling Cryptic Regulatory Elements in 5’UTRs with DNABert-2: A Comparative Analysis with CNN Models
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: 5’Untranslated Region; DNABert-2; Convolutional Neural Networks; Explainable AI
dc.subject.courseuu: Bioinformatics and Biocomplexity
dc.thesis.id: 29560

