dc.description.abstract | Companies, institutions and researchers around the world are collecting enormous sets of high-dimensional data at breakneck speed. However, our understanding of the collected data is not nearly keeping up. One of the main approaches to understanding these datasets has been to reduce the data to a low-dimensional representation, called a projection, that can subsequently be visualised.
Visible patterns in these projections indicate that there are relationships between the dimensions of the high-dimensional data. However, they do not tell us what those relationships are.
Several efforts have previously been made to explain the patterns in the projection in terms of their original dimensions. However, these techniques tend either to fall short of adequately explaining the patterns or to scale poorly to a higher number of dimensions. Therefore, this thesis aims to answer the question of how to adequately explain these patterns in projections of high-dimensional data, while simultaneously scaling better than previous techniques in the number of data dimensions.
We extend the variance-based explanations of previous work with a value-based explanation that gives insight into not only why the patterns are there, but also what they represent. Furthermore, we introduce a user-driven exploration mechanism that provides significantly more detailed explanations of regions in the projection. In addition, these explanations are augmented by a number of supporting tools. We integrate all of the above elements into a visualisation solution for exploring high-dimensional data projections.
We assess the visualisation system in an evaluation study in which a mix of 23 experts and non-experts analysed several datasets of increasing dimensionality (12, 31, 58) using the proposed solution and gave their opinion on the usefulness of each element of the visualisation solution.
Participants rated each element of the visualisation system highly in terms of its usefulness. In addition, with minimal training and by an overwhelming majority, participants correctly answered a series of twelve control questions meant to test whether they understood how to read the explanations generated by the visualisation system. On a series of nine more complex analysis questions, where participants had to use the system themselves, the majority gave answers that strongly aligned with our analysis. This indicates that use of the system yields consistent insights about the data with only minor training or expertise required.
Overall, the evaluation study indicates that our visualisation solution is capable of providing detailed and consistent explanations of patterns in data projections, even as the dimensionality of the data increases. | |