View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item

        Distributed and Incremental Clustering using Shared Nearest Neighbours

        View/Open
        thesis.pdf (2.694Mb)
        Publication date
        2014
        Author
        Plattel, C.J.
        Summary
        Cluster analysis is a large field within data mining. It tries to capture the structure of a data set by grouping similar data points into clusters. Increasingly large and high-dimensional data sets present a problem for modern clustering algorithms, especially data sets containing nominal and sparse vector features. In this thesis I explore how cluster analysis can best be applied to these kinds of data sets. Data from the Xenon web page crawler, which includes numeric, nominal and bag-of-words data, is used to test the algorithms. We study different algorithms and propose a new algorithm called DBSNN, which is based on Shared Nearest Neighbours and local densities. This algorithm is implemented and evaluated experimentally. To deal with the size of the data set, the algorithm is made to work on a distributed system. The algorithm is robust against outliers, can detect noise, and finds clusters regardless of differences in shape, size or density. In addition, the algorithm is extended to work on incremental data, meaning it can detect new clusters that represent upcoming trends in the data set.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/17603
        Collections
        • Theses