Everything should be linked: linking and visualising data for dynamic multidimensional biological data interpretation.

Weide, R.H.W.E. van der

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	de Ligt, J.
dc.contributor.advisor	Cuppen, E.
dc.contributor.advisor	Wessels, L.
dc.contributor.author	Weide, R.H.W.E. van der
dc.date.accessioned	2015-07-16T17:00:38Z
dc.date.available	2015-07-16T17:00:38Z
dc.date.issued	2015
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/20375
dc.description.abstract	To date, studies on non-coding regions of the genome, specifically in cancer, have been limited, mainly because of the complex nature of putative functional elements in these regions. In parallel with the ENCODE project, interest in these regions has increased: researchers are beginning to study causal non-coding variations in cancer [1, 2]. As omics approaches become more popular and cost-effective, more and more data is becoming available. The complexity of integrating and analysing information from these approaches increases with every added omics layer or dimension (e.g. time series, treatments). When studying the effects of structural variants in non-coding regions in cancer, this complexity is further increased by cancer-specific factors (e.g. heterogeneous samples, rapid evolution) and by structural variant-specific factors (the multiple types and consequences of different structural variants). The current methods for integrating and analysing these layers and dimensions have two significant limitations in their design: scalability and generality (i.e. the possibility to add more levels or dimensions). Moreover, there is no option to overview a dataset without filtering, dividing or restructuring the data. The integration of complex datasets is needed to better understand the complex biology of cancer [3], but is restricted by these limitations. Enter the Semantic Web and its Resource Description Framework (RDF), a simple and flexible framework for describing anything about anything. Since every type of data can be translated to this universal language, integration of large datasets of different levels and dimensions becomes possible and far more feasible. Once researchers have converted their local data to RDF, they can easily connect and combine it with public repositories, which makes analyses even more powerful.
With the SPARQL Protocol and RDF Query Language (SPARQL), queries that retrieve and manipulate RDF data are easily readable by both humans and computers. The user can subsequently visualise the SPARQL results as a whole or filter them further. Here, we propose the use of Semantic Web technologies and visual analytics to decrease the complexity of integrating and visualising multi-level and multi-dimensional biological data. These methods will enable further elucidation of the complex biology of, for example, cancer. Firstly, we will create the framework needed to design the missing tools for converting the most-used NGS formats to RDF. Next, visualisations (based on visual analytics) of the biological RDF data will be created, which will be used to perform previously impossible integration-focussed analyses of the consequences of structural variation in the non-coding regions of cancer genomes.
dc.description.sponsorship	Utrecht University
dc.format.extent	2188901
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Everything should be linked: linking and visualising data for dynamic multidimensional biological data interpretation.
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	structural variation, multi-level data integration, next-generation sequencing, cancer, visual analytics
dc.subject.courseuu	Cancer, Stem Cells and Developmental Biology

Files in this item

Name: Proposal.pdf
Size: 2.087Mb
Format: PDF

This item appears in the following Collection(s)

Theses