View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item
        JavaScript is disabled for your browser. Some features of this site may not work without it.

        Browse

        All of UU Student Theses RepositoryBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

        Scalable Frequent Pattern Mining using Relational Databases

        Thumbnail
        View/Open
        Thesis.pdf (1.300Mb)
        Publication date
        2014
        Author
        Schuiling, N.H.
        Metadata
        Show full item record
        Summary
        Frequent pattern mining is a computationally expensive preprocessing step for association rule mining and involves major challenges when data sets are large. Many sophisticated algorithms have been proposed since its introduction, however most of them don't address the real difficulty: scalability. In this thesis we study how to leverage RDBMS integration to improve the scalability of frequent pattern mining. We propose a hybrid framework based on FP-growth, a popular frequent pattern mining algorithm. For FP-tree construction we develop an SQL-based approach and for mining patterns we develop an external application that interfaces with the RDBMS. Subsequently, we adapt this framework to achieve high scalability, by sharding the transaction database and distributing computation over multiple computing instances. Experimental results with MonetDB5 show that this framework is capable of handling large data sets using a scale out strategy. It achieves near-linear speedup when computing instances are added. Finally, we develop an FP-growth based top-k frequent pattern mining method and integrate this method in our framework. To mine the top-k frequent patterns efficiently, we change the FP-growth mining algorithm to mine patterns in support descending order using a level-wise search strategy instead of the depth-first-search strategy of the original algorithm. Experimental results show that this method has surprisingly good performance on a large data set.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/18326
        Collections
        • Theses
        Utrecht university logo