dc.description.abstract | Enhancers are cis-regulatory elements that play a central role in transcriptional gene regulation and cell-type-specific gene expression. More than 95% of the variants associated with common diseases in genome-wide association studies (GWAS) lie in non-coding regions, most of which are thought to be enhancers. Enhancer variants are causally implicated in rheumatoid arthritis, Alzheimer's disease, diabetes, and cancer. A genome-wide catalog of enhancer elements is therefore crucial for understanding health and disease and for designing targeted therapies. Although markers of enhancer identity such as P300 binding, histone modifications, and sequence motifs have been discovered, no known combination of markers unequivocally identifies enhancers. The problem is made harder by the extreme cell-type specificity of enhancers, exemplified by the mere ~6% overlap between the enhancers of different cell types in the VISTA database. Computational methods are needed to overcome these issues and build enhancer catalogs. Over the years, methods have evolved from unsupervised approaches to classical machine learning, deep learning, and now large language model-based approaches, in which models are pre-trained on massive amounts of biological sequence data and then fine-tuned for enhancer prediction. Performance has gradually improved, but many challenges remain: high-quality training data of validated enhancers are sparse; cell-type-specific prediction is in its infancy even though enhancers are highly cell-type specific; experimental validation is complicated and forms a major bottleneck; and methods report different metrics and thresholds and use different validation datasets, preventing fair comparison and hindering methodological progress. Other reviews of computational enhancer prediction have focused chiefly on specific subsets of these methods or on current developments.
Here, I take a different perspective and review the chronological evolution of enhancer prediction methods. I show that deep learning-based classifiers have improved predictive performance and that newer generative and pre-trained methods hold great promise, but that the field is limited by i) the lack of standardization in reporting model performance, ii) the lack of experimental validation of genome-wide predictions, and iii) a narrow focus on only one or a few cell types, which runs counter to enhancer biology. Finally, I discuss future perspectives informed by the rise of multi-modal foundation models and generative models in the broader ML field. | |