
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Qahtan, Hakim
dc.contributor.author: Wypych, Aleksandra
dc.date.accessioned: 2022-08-13T00:00:43Z
dc.date.available: 2022-08-13T00:00:43Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/42270
dc.description.abstract: The development of machine learning algorithms has greatly influenced decision-making at various levels. However, these algorithms tend to incorporate biases. Racial profiling in legal and financial systems is among the best-known examples of inequality stemming from algorithmic decisions. Previous research has shown that imbalanced data is one of the causes of racial bias. This research focuses on generating synthetic data using Generative Adversarial Networks (GANs) to reduce bias. Building on GANs, this thesis proposes the Intag framework, which contains a modified version of Pate-GAN for synthetic data generation. The main modification from the original Pate-GAN is that the hard privacy constraint is dropped. Other changes include modifying the network architecture so that the number of hidden layers depends on the dimension of the input data. Moreover, the framework incorporates undersampling techniques to improve the quality of the synthetic data samples. The framework's performance is evaluated in terms of machine learning utility, that is, by assessing the quality of the synthetic data generated by different methods. The modified Pate-GAN is shown to perform best among the methods compared. Furthermore, the framework improves the values of statistical parity and disparate impact, the two fairness measures used in this study. We conclude that the proposed modification to Pate-GAN, and the framework in general, can be used for synthetic data generation. Moreover, it could serve as an aid for generating data that improves fairness when a dataset is imbalanced.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This research focuses on generating synthetic data using Generative Adversarial Networks (GANs) to reduce the various forms of bias found in existing datasets. This thesis develops a framework that contains a modified version of Pate-GAN for synthetic data generation. Moreover, the framework incorporates undersampling techniques to improve the quality of the synthetic data samples.
dc.title: Improving the Quality of Synthetic Data Generation with Application in Algorithmic Fairness
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 8437
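The abstract reports improvements in statistical parity and disparate impact without defining them. For a binary classifier and a binary protected attribute, both are commonly defined from the positive-prediction rate of each group: statistical parity difference is the unprivileged rate minus the privileged rate (0 means parity), and disparate impact is their ratio (1 means parity). The sketch below assumes these common definitions; the function names, the 0/1 group encoding, and the toy data are hypothetical and are not taken from the thesis or its implementation.

```python
from typing import Sequence


def selection_rate(y_pred: Sequence[int], group: Sequence[int], value: int) -> float:
    """Fraction of positive predictions (y_pred == 1) among samples with group == value."""
    members = [p for p, g in zip(y_pred, group) if g == value]
    if not members:
        raise ValueError(f"no samples with group == {value}")
    return sum(1 for p in members if p == 1) / len(members)


def statistical_parity_difference(y_pred, group, unprivileged=0, privileged=1) -> float:
    """P(Y_hat=1 | unprivileged) - P(Y_hat=1 | privileged); 0 indicates parity."""
    return selection_rate(y_pred, group, unprivileged) - selection_rate(y_pred, group, privileged)


def disparate_impact(y_pred, group, unprivileged=0, privileged=1) -> float:
    """P(Y_hat=1 | unprivileged) / P(Y_hat=1 | privileged); 1 indicates parity."""
    privileged_rate = selection_rate(y_pred, group, privileged)
    if privileged_rate == 0:
        raise ZeroDivisionError("privileged group has no positive predictions")
    return selection_rate(y_pred, group, unprivileged) / privileged_rate


# Hypothetical toy predictions: group 0 is treated as unprivileged, group 1 as privileged.
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(y_pred, group)
di = disparate_impact(y_pred, group)
print(spd)           # -0.5  (0.25 - 0.75)
print(round(di, 3))  # 0.333 (0.25 / 0.75)
```

Under this reading, "improving" the two measures means moving the statistical parity difference toward 0 and the disparate impact toward 1, for example by training on data in which the under-represented group is better balanced.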

