Utilising Autoencoder Latent Representations to Pseudonymise Data whilst Retaining Data Utility

Salomons, Daniël

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Krempl, G.M.
dc.contributor.author	Salomons, Daniël
dc.date.accessioned	2023-12-07T00:00:59Z
dc.date.available	2023-12-07T00:00:59Z
dc.date.issued	2023
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/45610
dc.description.abstract	Autoencoders are small encoder-decoder network pairs that learn to compress data into a lower-dimensional latent representation. This thesis outlines the benefits and drawbacks of using latent representations as a utility-preserving data pseudonymisation method for machine learning. We review existing anonymisation literature and EU legislation, then run experiments on latent representation decoding, data utility and other latent representation properties. We find that, unless the original data leaks together with its latent representation, it is difficult for an adversary to generate a well-performing reconstruction of the encoded dataset. The protection is stronger when the latent representation is randomly permuted, and this permutation is not easily reversed by a clustering algorithm. A latent representation preserves its data utility well for classification algorithms, even when permuted. Our experiments indicate that a dataset can be represented by multiple well-performing latent representations, making it difficult for an adversary to discern which dataset was originally encoded. Autoencoders are quick to train, which makes this a fast way to pseudonymise data whilst retaining data utility for classification algorithms. As this is a pseudonymisation method, the data holder can still obtain a reconstruction of the data. However, latent representations would likely not be considered anonymised data under the GDPR. Furthermore, regression algorithms perform worse than classification algorithms on latent representations. Finally, despite the popularity of mean squared error, we find that this loss function does not maximise data utility in latent representations.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	We investigate under which circumstances an autoencoder's latent representation legally and experimentally suffices as a data pseudonymisation method.
dc.title	Utilising Autoencoder Latent Representations to Pseudonymise Data whilst Retaining Data Utility
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	Data; anonymisation; pseudonymisation; GDPR; privacy-utility trade-off; autoencoder; latent representation
dc.subject.courseuu	Computing Science
dc.thesis.id	26340

Files in this item

Name: Thesis_Data_Pseudonymisation.pdf
Size: 2.685Mb
Format: PDF

This item appears in the following Collection(s)

Theses
