Show simple item record

dc.rights.licenseCC-BY-NC-ND
dc.contributor.advisorSiebes, Arno
dc.contributor.authorTusscher, Joris ten
dc.date.accessioned2024-07-24T23:07:13Z
dc.date.available2024-07-24T23:07:13Z
dc.date.issued2024
dc.identifier.urihttps://studenttheses.uu.nl/handle/20.500.12932/46904
dc.description.abstractMinimal Infrequent Itemsets (MIIs) are infrequent itemsets that have no infrequent proper subsets. Different authors have explored the idea of detecting anomalies or privacy leaks in datasets by mining and analyzing MIIs. Because MIIs have been researched repeatedly in such contexts, it seems fitting to try and use MIIs for data sanitization. In this work, we explore this idea and develop and analyze different MII-based sanitization algorithms with various privacy guarantees like k-anonymity and (e, d)-differential privacy. Experimental results show that these algorithms yield sanitized datasets with good utility for different privacy parameters, datasets, and utility measures. These algorithms are meant for rectangular categorical m × n datasets, which is a common type of dataset in the world of privacy preservation. Because of our heavy use of MIIs in our algorithms, this begs the question how many MIIs a rectangular dataset can contain. We prove that a rectangular m × n dataset can contain at most m*(n choose ⌊n/2⌋) = ϴ(m2^(n−log(n)/2)) MIIs, and we prove that this bound is tight for all n and all infrequency thresholds θ as long as m is sufficiently large. This means that mining MIIs on a rectangular dataset requires exponential time and space with respect to n.
dc.description.sponsorshipUtrecht University
dc.language.isoEN
dc.subjectPrivacy preservation on categorical datasets through Minimal Infrequent Itemset suppression
dc.titlePrivacy preservation on categorical datasets through Minimal Infrequent Itemset suppression
dc.type.contentMaster Thesis
dc.rights.accessrightsOpen Access
dc.subject.keywordsprivacy; itemset; minimal infrequent itemset; differential privacy; k-anonymity; sampling; categorical data
dc.subject.courseuuComputing Science
dc.thesis.id34807


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record