dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Feelders, A.J.
dc.contributor.author: Plattel, C.J.
dc.date.accessioned: 2014-08-11T17:00:52Z
dc.date.available: 2014-08-11T17:00:52Z
dc.date.issued: 2014
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/17603
dc.description.abstract: Cluster analysis is a large field within data mining. It tries to capture the structure of a data set by grouping similar data points into clusters. Increasingly large and high-dimensional data sets present a problem for modern clustering algorithms, especially data sets containing nominal and sparse vector features. In this thesis I explore how cluster analysis can best be applied to these kinds of data sets. Data from the Xenon web page crawler, which includes numeric, nominal and bag-of-words data, is used to test the algorithms. We study different algorithms and propose a new algorithm called DBSNN, which is based on Shared Nearest Neighbours and local densities. This algorithm is implemented and evaluated experimentally. To deal with the size of the data set, the algorithm is made to work on a distributed system. The algorithm is robust against outliers, can detect noise, and finds clusters regardless of differences in shape, size or density. In addition, the algorithm is extended to work on incremental data, meaning it can detect new clusters that represent emerging trends in the data set.
dc.description.sponsorship: Utrecht University
dc.format.extent: 2825604
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.title: Distributed and Incremental Clustering using Shared Nearest Neighbours
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Clustering, Algorithms, Incremental, Distributed, Shared Nearest Neighbour
dc.subject.courseuu: Computing Science
