
dc.rights.license CC-BY-NC-ND
dc.contributor.advisor Uebbing, Severin
dc.contributor.author Vivekanandan, Madhupreetha
dc.date.accessioned 2025-08-27T23:01:20Z
dc.date.available 2025-08-27T23:01:20Z
dc.date.issued 2025
dc.identifier.uri https://studenttheses.uu.nl/handle/20.500.12932/50006
dc.description.abstract Enhancers are cis-regulatory elements that play a central role in transcriptional gene regulation and cell-type-specific gene expression. More than 95% of the variants that GWAS associate with common diseases lie in non-coding regions, most of which are thought to be enhancers. Enhancer variants are causally implicated in rheumatoid arthritis, Alzheimer’s disease, diabetes, and cancer. A genome-wide catalog of enhancer elements is therefore crucial for understanding health and disease and for designing targeted therapies. Although markers of enhancer identity such as P300 binding, histone modifications, and sequence motifs have been discovered, no known combination of markers unequivocally identifies enhancers. The problem is made harder by the extreme cell-type specificity of enhancers, exemplified by the mere ~6% overlap between enhancers in the VISTA database. Computational methods are needed to overcome these issues and obtain enhancer catalogs. Over the years, methods have evolved from unsupervised approaches to classical machine learning, then to deep learning, and now to large language model-based approaches, in which models are pre-trained on massive amounts of biological sequence and then fine-tuned for enhancer prediction. Performance has improved gradually, but many challenges remain: high-quality training data of validated enhancers is sparse; cell-type-specific prediction is in its infancy even though enhancers are highly cell-type specific; experimental validation is complicated and forms a major bottleneck; and methods report different metrics and thresholds and use different validation datasets, which prevents proper comparison and hinders methodological improvement. Other reviews of computational methods for enhancer prediction have focused chiefly on specific subsets of these methods or on current developments.
Here, I take a different perspective and review the chronological evolution of enhancer prediction methods. I show that deep learning-based classifiers have improved predictive performance and that newer generative and pre-trained methods hold great promise, but that the field is limited by i) the lack of standardization in reporting model performance, ii) the lack of experimental validation of genome-wide predictions, and iii) a narrow focus on only one or a few cell types, which runs counter to enhancer biology. Finally, I discuss future perspectives informed by the rise of multi-modal foundation models and generative models in the broader ML field.
dc.description.sponsorship Utrecht University
dc.language.iso EN
dc.subject Enhancers are cis-regulatory elements that control gene expression in a cell-type-specific manner. Enhancer variants are linked to diseases such as cancer and diabetes, which makes it important to identify and characterise them. This review explores computational methods for predicting enhancer elements in the genome, tracing their evolution from classical machine learning to deep learning and, finally, to large language models.
dc.title The role of enhancers in determining cell type-specific gene expression and computational methods to infer enhancer activity
dc.type.content Master Thesis
dc.rights.accessrights Open Access
dc.subject.keywords Enhancer, enhancer prediction, machine learning, deep learning, large language models
dc.subject.courseuu Bioinformatics and Biocomplexity
dc.thesis.id 40135

