dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Krempl, G.M.
dc.contributor.author: Salomons, Daniël
dc.date.accessioned: 2023-12-07T00:00:59Z
dc.date.available: 2023-12-07T00:00:59Z
dc.date.issued: 2023
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/45610
dc.description.abstract: Autoencoders are small networks consisting of an encoder-decoder pair that learn to compress data into a latent representation of smaller dimension. This thesis aims to outline the benefits and drawbacks of using latent representations as a utility-preserving data pseudonymisation method for machine learning. We consult existing anonymisation literature and EU legislation, followed by experiments on latent representation decoding, data utility and other latent representation properties. We found that, unless the original data is leaked along with its latent representation, it is difficult for an adversary to generate a well-performing reconstruction of the encoded dataset. The method is more effective if the latent representation is randomly permuted, and this permutation is not easily reversed by a clustering algorithm. A latent representation preserves its data utility well for classification algorithms, even when permuted. Our experiments indicate that a dataset can be represented by multiple well-performing latent representations, making it difficult for an adversary to discern which dataset was originally encoded. Autoencoders are quick to train, which makes them a fast way to pseudonymise data whilst retaining data utility for classification algorithms. Because this is a pseudonymisation method, the data holder can still obtain a reconstruction of the data. However, latent representations would likely not be considered anonymised data under the GDPR. Furthermore, regression algorithms perform worse than classification algorithms on latent representations. Finally, despite the popularity of mean squared error, we find that this loss function does not maximise data utility in latent representations.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: We investigate under which circumstances an autoencoder's latent representation legally and experimentally suffices as a data pseudonymisation method.
dc.title: Utilising Autoencoder Latent Representations to Pseudonymise Data whilst Retaining Data Utility
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Data; anonymisation; pseudonymisation; GDPR; privacy-utility trade-off; autoencoder; latent representation
dc.subject.courseuu: Computing Science
dc.thesis.id: 26340

