Distributed and Incremental Clustering using Shared Nearest Neighbours

Plattel, C.J.

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Feelders, A.J.
dc.contributor.author	Plattel, C.J.
dc.date.accessioned	2014-08-11T17:00:52Z
dc.date.available	2014-08-11T17:00:52Z
dc.date.issued	2014
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/17603
dc.description.abstract	Cluster analysis is a large field within data mining. It tries to capture the structure of a data set by grouping similar data points into clusters. Increasingly large and high-dimensional data sets present a problem for modern clustering algorithms, especially data sets containing nominal and sparse vector features. In this thesis I explore how cluster analysis can best be applied to these kinds of data sets. Data from the Xenon web page crawler, which includes numeric, nominal and bag-of-words data, is used to test the algorithms. I study different algorithms and propose a new algorithm called DBSNN, which is based on Shared Nearest Neighbours and local densities. This algorithm is implemented and evaluated experimentally. To deal with the size of the data set, the algorithm is made to work on a distributed system. The algorithm is robust against outliers, can detect noise, and finds clusters regardless of differences in shape, size or density. In addition, the algorithm is extended to work on incremental data, meaning it can detect new clusters that represent upcoming trends in the data set.
dc.description.sponsorship	Utrecht University
dc.format.extent	2825604
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Distributed and Incremental Clustering using Shared Nearest Neighbours
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	Clustering, Algorithms, Incremental, Distributed, Shared Nearest Neighbour
dc.subject.courseuu	Computing Science

Files in this item

Name: thesis.pdf
Size: 2.694Mb
Format: PDF

This item appears in the following Collection(s)

Theses
