dc.rights.license | CC-BY-NC-ND | |
dc.contributor.advisor | Spruit, M.R. | |
dc.contributor.author | Lefebvre, A.E. | |
dc.date.accessioned | 2015-09-07T17:00:49Z | |
dc.date.available | 2015-09-07T17:00:49Z | |
dc.date.issued | 2015 | |
dc.identifier.uri | https://studenttheses.uu.nl/handle/20.500.12932/27888 | |
dc.description.abstract | Analysis of next-generation sequencing data is widely recognized as a bottleneck on the way to understanding the human genome and personalizing treatments. Studies argue for more integrative and interactive data analytics solutions that would largely automate and accelerate scientific discovery. At the same time, concerns are raised about the communication of computational experiments (CE) and their components: code, data and algorithms. These concerns are voiced mainly by proponents of Reproducible Research (RR).
This research attempts to integrate these interactive knowledge discovery practices with RR, while investigating how the RR constraints of sharing and reusing data and code apply in real settings. To achieve this, a prototype implementing the four steps of the HCI-KDD process (integration, preprocessing, mining and visualization/interaction) was developed and tested by biologists and bioinformaticians. The prototype is built around web resources to enable sharing and reuse of the components produced during the KDD process.
Feedback from the prototype evaluation, gathered through three focus groups and one survey, is summarized in a context-enriched HCI-KDD process. This design proposition, named RRO-KDD (Reproducible Resource-Oriented Knowledge discovery from data), merges HCI-KDD activities with the design choices of the prototype. The goal is to improve reusability and sharing of previous work while taking into account how data is actually used and processed in biomedical research.
Our results suggest that there is room for improvement in applications that enable data analytics for biologists working with bioinformaticians. Sharing code and data as-is is not the best way to encourage reuse of previous work, because it provides insufficient contextual information to make retrieval convenient for both types of users involved in CEs. Web resources for visualization, data objects and research objects have the potential to be combined to address the needs for both interactivity and reusability better than the current practices suggested by Reproducible Research. | |
dc.description.sponsorship | Utrecht University | |
dc.format.extent | 4775230 | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en_US | |
dc.title | Reproducible Research and Interactive Data Mining in Bioinformatics | |
dc.type.content | Master Thesis | |
dc.rights.accessrights | Open Access | |
dc.subject.keywords | Reproducible Research, Knowledge discovery, bioinformatics, biomedical science | |
dc.subject.courseuu | Business Informatics | |