View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item
        JavaScript is disabled for your browser. Some features of this site may not work without it.

        Browse

        All of UU Student Theses RepositoryBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

        Privacy preservation on categorical datasets through Minimal Infrequent Itemset suppression

        Thumbnail
        View/Open
        Thesis-Joris_ten_Tusscher-13_06_24.pdf (1.156Mb)
        Publication date
        2024
        Author
        Tusscher, Joris ten
        Metadata
        Show full item record
        Summary
        Minimal Infrequent Itemsets (MIIs) are infrequent itemsets that have no infrequent proper subsets. Different authors have explored the idea of detecting anomalies or privacy leaks in datasets by mining and analyzing MIIs. Because MIIs have been researched repeatedly in such contexts, it seems fitting to try and use MIIs for data sanitization. In this work, we explore this idea and develop and analyze different MII-based sanitization algorithms with various privacy guarantees like k-anonymity and (e, d)-differential privacy. Experimental results show that these algorithms yield sanitized datasets with good utility for different privacy parameters, datasets, and utility measures. These algorithms are meant for rectangular categorical m × n datasets, which is a common type of dataset in the world of privacy preservation. Because of our heavy use of MIIs in our algorithms, this begs the question how many MIIs a rectangular dataset can contain. We prove that a rectangular m × n dataset can contain at most m*(n choose ⌊n/2⌋) = ϴ(m2^(n−log(n)/2)) MIIs, and we prove that this bound is tight for all n and all infrequency thresholds θ as long as m is sufficiently large. This means that mining MIIs on a rectangular dataset requires exponential time and space with respect to n.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/46904
        Collections
        • Theses
        Utrecht university logo