Privacy preservation on categorical datasets through Minimal Infrequent Itemset suppression

Tusscher, Joris ten

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Siebes, Arno
dc.contributor.author	Tusscher, Joris ten
dc.date.accessioned	2024-07-24T23:07:13Z
dc.date.available	2024-07-24T23:07:13Z
dc.date.issued	2024
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/46904
dc.description.abstract	Minimal Infrequent Itemsets (MIIs) are infrequent itemsets that have no infrequent proper subsets. Several authors have explored detecting anomalies or privacy leaks in datasets by mining and analyzing MIIs. Because MIIs have been studied repeatedly in such contexts, it seems fitting to use them for data sanitization as well. In this work, we explore this idea and develop and analyze several MII-based sanitization algorithms with privacy guarantees such as k-anonymity and (ε, δ)-differential privacy. Experimental results show that these algorithms yield sanitized datasets with good utility across different privacy parameters, datasets, and utility measures. The algorithms are designed for rectangular categorical m × n datasets, a common type of dataset in privacy preservation. Because our algorithms rely heavily on MIIs, this raises the question of how many MIIs a rectangular dataset can contain. We prove that a rectangular m × n dataset can contain at most m · (n choose ⌊n/2⌋) = Θ(m · 2^(n − log(n)/2)) MIIs, and that this bound is tight for all n and all infrequency thresholds θ, provided m is sufficiently large. Consequently, mining MIIs on a rectangular dataset requires exponential time and space with respect to n in the worst case.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Privacy preservation on categorical datasets through Minimal Infrequent Itemset suppression
dc.title	Privacy preservation on categorical datasets through Minimal Infrequent Itemset suppression
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	privacy; itemset; minimal infrequent itemset; differential privacy; k-anonymity; sampling; categorical data
dc.subject.courseuu	Computing Science
dc.thesis.id	34807
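
The MII definition and the upper bound stated in the abstract can be illustrated with a small brute-force sketch. This is not the thesis's algorithm; it only enumerates itemsets exhaustively to make the definition concrete. It assumes that an item is a (column index, value) pair, that only itemsets occurring in at least one row are considered, and that an itemset is infrequent when its support is strictly below the threshold. The function names `mine_miis` and `mii_upper_bound` are hypothetical.

```python
from itertools import combinations
from math import comb


def mine_miis(rows, theta):
    """Brute-force MII enumeration for a small rectangular categorical dataset.

    Assumptions (not taken from the thesis): an item is a (column, value)
    pair, and an itemset is infrequent when its support is below theta.
    """
    # Count the support of every itemset that occurs in at least one row.
    # The empty itemset is included so that singletons have a subset to check.
    support = {}
    for row in rows:
        items = list(enumerate(row))
        for size in range(len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                support[key] = support.get(key, 0) + 1

    miis = set()
    for itemset, count in support.items():
        if count >= theta:
            continue  # frequent, so not an MII
        # Support is anti-monotone, so it is enough to check the subsets
        # obtained by dropping a single item.
        if all(support[itemset - {item}] >= theta for item in itemset):
            miis.add(itemset)
    return miis


def mii_upper_bound(m, n):
    """Upper bound m * (n choose floor(n/2)) quoted in the abstract."""
    return m * comb(n, n // 2)


if __name__ == "__main__":
    data = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
    found = mine_miis(data, theta=2)
    print(sorted(sorted(itemset) for itemset in found))
    print(len(found), "<=", mii_upper_bound(len(data), len(data[0])))
```

Each row contributes 2^n itemsets to the enumeration, so this sketch is only practical for very small n, which is in line with the exponential worst case described in the abstract.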

Files in this item

Name: Thesis-Joris_ten_Tusscher-13_0 ...
Size: 1.156Mb
Format: PDF