Show simple item record

dc.rights.licenseCC-BY-NC-ND
dc.contributor.advisorPhilippi, H.
dc.contributor.authorSchuiling, N.H.
dc.date.accessioned2014-09-16T17:00:54Z
dc.date.available2014-09-16T17:00:54Z
dc.date.issued2014
dc.identifier.urihttps://studenttheses.uu.nl/handle/20.500.12932/18326
dc.description.abstractFrequent pattern mining is a computationally expensive preprocessing step for association rule mining and involves major challenges when data sets are large. Many sophisticated algorithms have been proposed since its introduction, however most of them don't address the real difficulty: scalability. In this thesis we study how to leverage RDBMS integration to improve the scalability of frequent pattern mining. We propose a hybrid framework based on FP-growth, a popular frequent pattern mining algorithm. For FP-tree construction we develop an SQL-based approach and for mining patterns we develop an external application that interfaces with the RDBMS. Subsequently, we adapt this framework to achieve high scalability, by sharding the transaction database and distributing computation over multiple computing instances. Experimental results with MonetDB5 show that this framework is capable of handling large data sets using a scale out strategy. It achieves near-linear speedup when computing instances are added. Finally, we develop an FP-growth based top-k frequent pattern mining method and integrate this method in our framework. To mine the top-k frequent patterns efficiently, we change the FP-growth mining algorithm to mine patterns in support descending order using a level-wise search strategy instead of the depth-first-search strategy of the original algorithm. Experimental results show that this method has surprisingly good performance on a large data set.
dc.description.sponsorshipUtrecht University
dc.format.extent1363464
dc.format.mimetypeapplication/pdf
dc.language.isoen
dc.titleScalable Frequent Pattern Mining using Relational Databases
dc.type.contentMaster Thesis
dc.rights.accessrightsOpen Access
dc.subject.keywordsfrequent pattern mining;data mining;fp-growth;databases;query processing;top-k;distributed algorithms
dc.subject.courseuuComputing Science


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record