The role of enhancers in determining cell type-specific gene expression and computational methods to infer enhancer activity
Summary
Enhancers are cis-regulatory elements that play a central role in transcriptional regulation and cell type-specific gene expression. More than 95% of the variants that GWAS have associated with common diseases lie in non-coding regions, most of which are thought to be enhancers. Enhancer variants have been causally implicated in rheumatoid arthritis, Alzheimer’s disease, diabetes, and cancer. A genome-wide catalog of enhancer elements is therefore crucial for understanding health and disease and for designing targeted therapies. Although markers of enhancer identity such as P300 binding, histone modifications, and sequence motifs have been discovered, no known combination of markers unequivocally identifies enhancers. The problem is made harder by the extreme cell-type specificity of enhancers, exemplified by the mere ~6% overlap between enhancers in the VISTA database. Computational methods are needed to overcome these issues and build enhancer catalogs. Over the years, these methods have evolved from unsupervised approaches to classical machine learning, then to deep learning, and now to large language model-based approaches, in which models are pre-trained on massive amounts of biological sequence data and then fine-tuned for enhancer prediction. Performance has gradually improved, but many challenges remain: high-quality training data of validated enhancers is sparse; cell type-specific prediction is in its infancy even though enhancers are highly cell type-specific; experimental validation is complicated and forms a major bottleneck; and methods report different metrics and thresholds and validate on different datasets, which prevents proper comparison and hinders methodological improvement. Other reviews of computational methods for enhancer prediction have focused chiefly on specific subsets of these methods or on current developments.
Here, I take a different perspective and review the chronological evolution of enhancer prediction methods. I show that deep learning-based classifiers have improved predictive performance and that newer generative and pre-trained methods hold great promise, but that the field is limited by i) the lack of standardization in reporting model performance, ii) the lack of experimental validation of genome-wide predictions, and iii) a narrow focus on only one or a few cell types, which runs counter to enhancer biology. Finally, I discuss future perspectives informed by the rise of multi-modal foundation models and generative models in the broader machine learning field.