A transfer learning approach to predict gene expression for new cell types
Summary
Deep learning models have been developed to predict gene expression, chromatin accessibility, and other genomic features directly from DNA sequence. These models are used to decipher genomic regulatory functions and to predict the regulatory effect of coding and non-coding variants. Enformer is the state-of-the-art sequence-based deep learning model and predicts a variety of genomic profiles for human and mouse, including chromatin accessibility, transcription factor binding, and histone modifications. Enformer has achieved remarkable results compared to other sequence-based models, but it is limited to predicting the profiles it has been trained on. Currently, the only way to integrate genomic profiles for new cell types into Enformer is to retrain it completely, which is not feasible on every computing cluster given the computational resources required. Here, we explore a transfer learning approach in which we use Enformer as a pretrained model and fine-tune our human head model on new genomic profiles. We increase the resolution of the genomic profiles learned by our approach from tissue-specific to cell-type-specific, as there is currently no method that predicts genomic profiles at the cell-type-specific level solely from DNA sequence. We show that the human head model can be fine-tuned on genomic profiles from new cell types with Enformer as the pretrained model, and that it learns cell-type-specific differences in chromatin accessibility, whereas Enformer predicts tissue-specific chromatin accessibility.
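The general idea of fine-tuning an output head on top of a pretrained model can be illustrated with a minimal, self-contained sketch. It is not the implementation used in this work: `frozen_trunk` is a hypothetical stand-in for the pretrained Enformer, the dimensions are toy values, the data are synthetic, and the plain linear head with a squared-error loss is chosen only for brevity. Keeping the trunk entirely frozen is likewise an assumption of the sketch.

```python
import math
import random

random.seed(0)

EMBED_DIM = 8    # toy embedding width (assumption, not the real model size)
NUM_TRACKS = 3   # hypothetical number of new cell-type tracks
NUM_BINS = 64    # hypothetical number of training examples (sequence bins)

# Frozen "trunk": a fixed random projection that is never updated.
TRUNK_W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(EMBED_DIM)]

def frozen_trunk(x):
    """Hypothetical stand-in for the pretrained model; returns an embedding."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in TRUNK_W]

def head_forward(weights, bias, emb):
    """New linear head: one prediction per new track."""
    return [sum(w * e for w, e in zip(weights[t], emb)) + bias[t]
            for t in range(NUM_TRACKS)]

def mse(weights, bias, embs, targets):
    """Mean squared error over all examples and tracks."""
    total = 0.0
    for emb, y in zip(embs, targets):
        pred = head_forward(weights, bias, emb)
        total += sum((p - yt) ** 2 for p, yt in zip(pred, y))
    return total / (len(embs) * NUM_TRACKS)

def finetune_head(embs, targets, steps=200, lr=0.05):
    """Full-batch gradient descent on the head parameters only."""
    weights = [[0.0] * EMBED_DIM for _ in range(NUM_TRACKS)]
    bias = [0.0] * NUM_TRACKS
    n = len(embs)
    for _ in range(steps):
        grad_w = [[0.0] * EMBED_DIM for _ in range(NUM_TRACKS)]
        grad_b = [0.0] * NUM_TRACKS
        for emb, y in zip(embs, targets):
            pred = head_forward(weights, bias, emb)
            for t in range(NUM_TRACKS):
                err = 2.0 * (pred[t] - y[t]) / n
                grad_b[t] += err
                for d in range(EMBED_DIM):
                    grad_w[t][d] += err * emb[d]
        for t in range(NUM_TRACKS):
            bias[t] -= lr * grad_b[t]
            for d in range(EMBED_DIM):
                weights[t][d] -= lr * grad_w[t][d]
    return weights, bias

# Synthetic data: embeddings are computed once because the trunk is frozen.
inputs = [[random.gauss(0, 1) for _ in range(4)] for _ in range(NUM_BINS)]
embs = [frozen_trunk(x) for x in inputs]
true_w = [[random.gauss(0, 1) for _ in range(EMBED_DIM)]
          for _ in range(NUM_TRACKS)]
targets = [[sum(w * e for w, e in zip(true_w[t], emb))
            for t in range(NUM_TRACKS)] for emb in embs]

trunk_before = [row[:] for row in TRUNK_W]
zero_w = [[0.0] * EMBED_DIM for _ in range(NUM_TRACKS)]
loss_before = mse(zero_w, [0.0] * NUM_TRACKS, embs, targets)
weights, bias = finetune_head(embs, targets)
loss_after = mse(weights, bias, embs, targets)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Because only the head parameters are updated, the embeddings can be computed once and reused at every training step, which is the main reason such an approach needs far less compute than retraining the full model.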