Causal Inference on Multimodal Single-Cell Data
Summary
The flow of genetic information within a cell is a complex interaction between DNA, RNA, and proteins. Understanding this interplay amounts to understanding how the genetic code manifests itself in cellular function, and is therefore a critical part of understanding the mechanisms of life itself. Recent technological advances have made it possible to measure more than one of these three modalities simultaneously in a single cell, paving the way for machine learning methods to uncover these interactions. Algorithms for causal discovery build a graphical model that represents the causal relationships between the variables in a system. However, these methods are usually ill-suited to the large number of variables and the cyclical causal relationships inherent to this problem. Here, a two-tiered approach is proposed that first assigns the variables to partially overlapping clusters and subsequently applies an adapted interpretation of the Fast Causal Inference (FCI) algorithm, one that can handle cyclical causality, to each individual cluster. We investigate whether this approach can construct a collection of causal graphs that reasonably models the observed data by checking the consistency of the graphs across overlapping clusters. This consistency holds for causal graphs based on the same batch of experimental data, but not across data from distinct experiments. We therefore conclude that the two-tiered approach opens up promising avenues for inferring cyclical causality in high-dimensional processes but, in its current form, is not yet able to expose the intricacies of intracellular dynamics. Overall, this work can be viewed as a stepping stone towards the ultimate goal of combining all cluster-specific graphs into a single high-dimensional cyclical causal graph that describes the flow of genetic information in a cell.
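To make the idea of consistency among overlapping clusters concrete, the following minimal Python sketch compares two cluster-specific graphs on the variables they share. It is an illustration under simplifying assumptions, not the measure used in this work: each graph is reduced to its set of adjacencies (edge marks and orientations are ignored), and the score is simply the fraction of shared variable pairs on which both graphs agree about the presence or absence of an edge. The function name, the variable names (g1, p1, and so on), and the example graphs are all hypothetical.

```python
from itertools import combinations


def overlap_consistency(nodes_a, edges_a, nodes_b, edges_b):
    """Fraction of shared variable pairs on which two cluster graphs agree.

    nodes_*: set of variable names in the cluster.
    edges_*: set of frozenset({u, v}) adjacencies in the cluster's graph.
    Assumption: only adjacency is compared; edge marks are ignored.
    Returns None when the clusters share fewer than two variables.
    """
    shared = sorted(nodes_a & nodes_b)
    pairs = list(combinations(shared, 2))
    if not pairs:
        return None
    agreements = sum(
        (frozenset(pair) in edges_a) == (frozenset(pair) in edges_b)
        for pair in pairs
    )
    return agreements / len(pairs)


# Hypothetical example: two clusters that overlap in g2, g3, and p1.
nodes_a = {"g1", "g2", "g3", "p1"}
edges_a = {frozenset(e) for e in [("g1", "g2"), ("g2", "g3"), ("g3", "p1")]}

nodes_b = {"g2", "g3", "p1", "p2"}
edges_b = {frozenset(e) for e in [("g2", "g3"), ("g2", "p1"), ("p1", "p2")]}

score = overlap_consistency(nodes_a, edges_a, nodes_b, edges_b)
print(f"adjacency agreement on shared pairs: {score:.2f}")
```

Note that a score below 1 does not by itself indicate an error: two graphs learned over different variable sets may legitimately differ on shared pairs, because a variable that is observed in one cluster is latent in the other. Any real consistency criterion would need to account for this.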