To date, studies on non-coding regions of the genome, specifically in cancer, have been limited.
This is mainly due to the complex nature of putative functional elements in these regions. In
parallel with the ENCODE project, interest in these regions has increased: researchers are
beginning to study causal non-coding variants in cancer [1,2]. As various omics approaches
become more popular and cost-effective, more and more data is becoming available.
The complexity of integrating and analysing information from these approaches increases with every
added omics layer or dimension (e.g. time series, treatments). When studying the effects of
structural variants in non-coding regions in cancer, this complexity increases further because of
cancer-specific factors (e.g. heterogeneous samples, rapid evolution) and structural-variant-specific
factors (e.g. the multiple types of structural variants and their different consequences).
Current methods for integrating and analysing these layers and dimensions have two significant
design limitations: limited scalability and limited generality (i.e. the ability to add more
levels or dimensions). Moreover, they offer no way to get an overview of a dataset without first
filtering, dividing or restructuring the data. Integrating complex datasets is needed to better
understand the complex biology of cancer [3], but these limitations restrict such integration.
Enter the Semantic Web and its Resource Description Framework (RDF): a simple and flexible
framework for describing anything about anything. Because RDF expresses every statement as a
subject-predicate-object triple, any type of data can be translated into this universal language,
which makes integrating large datasets of different levels and dimensions possible and far more
feasible. Once researchers have converted their local data to RDF, they can readily connect and
combine it with public repositories, which makes their analyses even more powerful. RDF data can
be retrieved and manipulated with the SPARQL Protocol and RDF Query Language (SPARQL), whose
queries are readable by both humans and computers. Users can then visualise the SPARQL results
as a whole or filter them further.
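
To make the triple model and its integration benefit concrete, here is a minimal Python sketch. It is illustrative only and assumes a toy representation: triples are plain tuples of compact names (the `ex:` prefix and every identifier, such as `ex:sv1` and `ex:enhancer1`, are hypothetical), two small datasets are merged by set union, and a two-pattern query is answered in the spirit of a SPARQL basic graph pattern. It is not an RDF store or a SPARQL engine; in practice an existing RDF library and triple store would take this role.

```python
# Minimal sketch, NOT a real RDF store or SPARQL engine: triples are plain
# tuples of hypothetical compact names, and a query is a list of patterns.
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical local data: one structural variant found in one sample.
local: Set[Triple] = {
    ("ex:sv1", "rdf:type", "ex:Deletion"),
    ("ex:sv1", "ex:overlaps", "ex:enhancer1"),
    ("ex:sv1", "ex:foundIn", "ex:sample1"),
}

# Hypothetical public annotation that reuses the identifier ex:enhancer1.
public: Set[Triple] = {
    ("ex:enhancer1", "rdf:type", "ex:Enhancer"),
    ("ex:enhancer1", "ex:regulates", "ex:geneA"),
}

# Both datasets share one data model, so integrating them is a set union.
graph: Set[Triple] = local | public


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None.

    Terms starting with '?' are variables, as in SPARQL.
    """
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(triples: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every binding that satisfies all patterns (a nested-loop join)."""
    store = list(triples)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in store
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# Roughly: SELECT ?sv ?gene WHERE { ?sv ex:overlaps ?el . ?el ex:regulates ?gene }
results = query(graph, [("?sv", "ex:overlaps", "?el"), ("?el", "ex:regulates", "?gene")])
for row in results:
    print(row["?sv"], "->", row["?gene"])
```

Because the local and public triples share the identifier `ex:enhancer1`, the query joins across both datasets without any schema mapping, which is the property that makes RDF-based integration attractive.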
Here, we propose using Semantic Web technologies and visual analytics to reduce the
complexity of integrating and visualising multi-level and multi-dimensional biological data. These
methods will help further elucidate the complex biology of, for example, cancer. First, we will
create the framework needed to build the currently missing tools for converting the most widely
used NGS formats to RDF. Next, we will create visualisations of the biological RDF data, based on
visual analytics, and use them to perform integration-focussed analyses that were previously
impossible: analyses of the consequences of structural variation in the non-coding regions of
cancer genomes.
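
As a rough illustration of what converting an NGS format to RDF involves, the sketch below turns BED records (tab-separated chromosome, 0-based start, end and an optional name) into N-Triples statements. The `http://example.org/` namespace, the predicate names and the line-index-based region identifiers are hypothetical placeholders, not the tools proposed here; a real converter would reuse established ontologies and cover more columns and formats such as VCF.

```python
# Minimal sketch: BED records -> RDF in N-Triples syntax.
# ASSUMPTION: the namespace and predicate names below are hypothetical
# placeholders chosen for illustration, not an established vocabulary.
from typing import Iterator, List

BASE = "http://example.org/"  # hypothetical namespace
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(local_name: str) -> str:
    """Wrap a name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def int_literal(value: int) -> str:
    """Typed integer literal, e.g. "10"^^<...#integer>."""
    return f'"{value}"^^<{XSD_INTEGER}>'


def string_literal(text: str) -> str:
    """Plain string literal with backslashes and double quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def bed_to_ntriples(lines: List[str]) -> Iterator[str]:
    """Yield one N-Triples statement per attribute of each BED record."""
    for n, line in enumerate(lines):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        region = iri(f"region/{n}")  # hypothetical: one IRI per input line
        yield f"{region} {iri('chromosome')} {iri('chr/' + chrom)} ."
        yield f"{region} {iri('start')} {int_literal(start)} ."  # 0-based, as in BED
        yield f"{region} {iri('end')} {int_literal(end)} ."
        if len(fields) > 3:  # optional fourth column: feature name
            yield f"{region} {iri('name')} {string_literal(fields[3])} ."


if __name__ == "__main__":
    example = ["track name=example\n", "chr1\t1000\t2000\tsv_1\n"]
    for statement in bed_to_ntriples(example):
        print(statement)
```

Emitting N-Triples keeps the converter streaming and dependency-free: each output line is a complete statement, so files can be concatenated or loaded into a triple store alongside public RDF data.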