A review on deep learning for regulatory genomics
Summary
Advances in deep learning have revolutionized the omics field, including genomics, epigenomics and transcriptomics. Many deep learning models integrate multiple types of omics data to study genomic regulation and to predict different signals of regulatory activity from DNA sequence. These models differ from each other in many aspects, such as the training data, the model architecture, the training approach, and the interpretation method. In this review, we provide a comprehensive overview of the current state of deep learning in regulatory genomics by examining each of these components. We start by describing the differences in the data used by each model, and then explain the most commonly used architectures and the different training approaches these models take. We also provide a concise overview of the available model interpretation methods, with their advantages and disadvantages. Furthermore, we describe three main applications of these models: motif discovery, non-coding variant effect prediction and synthetic construct design. Finally, we conclude with a discussion of the current limitations of these models. This survey is intended to serve as a guide for omics researchers, giving them an overview of the current landscape of deep learning methods in genomics and helping them direct new efforts toward addressing these limitations.