        Improving the Quality of Synthetic Data Generation with Application in Algorithmic Fairness

        View/Open
        Master Thesis - Aleksandra Wypych 6624154 Improving the Quality of Synthetic Data Generation with Application in Algorithmic Fairness version 2 (1).pdf (2.387Mb)
        Publication date
        2022
        Author
        Wypych, Aleksandra
        Summary
        The development of machine learning algorithms has greatly influenced decision-making at various levels. However, these algorithms tend to incorporate biases. Racial profiling in legal and financial systems is the best-known example of inequality stemming from algorithmic decisions. Previous research has shown that one of the reasons for racial bias is imbalanced data. This research focuses on generating synthetic data using Generative Adversarial Networks (GANs) to reduce bias. Inspired by GANs, this paper proposes the Intag framework. This framework contains a modified version of Pate-GAN for synthetic data generation. The main modification from the original Pate-GAN is that the hard privacy constraint is dropped. Other changes include altering the architecture of the network so that the number of hidden layers depends on the dimension of the input data. Moreover, the framework incorporates undersampling techniques to ensure that the synthetic data samples are of the highest quality. The framework's performance is evaluated on the basis of machine learning utility by checking the quality of the synthetic data generated by different methods. It is shown that the modified Pate-GAN achieves the best results. Furthermore, the framework improves the values of statistical parity and disparate impact, the two measures of fairness used in this study. We conclude that our proposed modification to Pate-GAN, and the framework in general, can be used for synthetic data generation. Moreover, it could be used as an aid for data generation to improve fairness in the case of an imbalanced dataset.
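
        The two fairness measures named in the summary can be illustrated with a minimal sketch. It assumes the commonly used definitions (statistical parity as the difference in positive-prediction rates between an unprivileged and a privileged group, disparate impact as the ratio of those rates); the thesis itself may define or orient them differently. All function names, group labels, and the toy data below are hypothetical and are not taken from the thesis.

```python
from typing import Hashable, Sequence


def positive_rate(y_pred: Sequence[int], group: Sequence[Hashable], value: Hashable) -> float:
    """Share of positive (1) predictions among members of one group."""
    selected = [p for p, g in zip(y_pred, group) if g == value]
    if not selected:
        raise ValueError(f"no samples found for group {value!r}")
    return sum(selected) / len(selected)


def statistical_parity_difference(y_pred, group, unprivileged, privileged) -> float:
    """Assumed definition: P(pred=1 | unprivileged) - P(pred=1 | privileged).

    A value of 0 means both groups receive positive predictions at the same rate.
    """
    return positive_rate(y_pred, group, unprivileged) - positive_rate(y_pred, group, privileged)


def disparate_impact(y_pred, group, unprivileged, privileged) -> float:
    """Assumed definition: P(pred=1 | unprivileged) / P(pred=1 | privileged).

    A value of 1 means both groups receive positive predictions at the same rate.
    """
    privileged_rate = positive_rate(y_pred, group, privileged)
    if privileged_rate == 0:
        raise ZeroDivisionError("privileged group has no positive predictions")
    return positive_rate(y_pred, group, unprivileged) / privileged_rate


# Hypothetical toy predictions for two groups, "a" (unprivileged) and "b" (privileged).
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

spd = statistical_parity_difference(y_pred, group, unprivileged="a", privileged="b")
di = disparate_impact(y_pred, group, unprivileged="a", privileged="b")
print(f"statistical parity difference: {spd:.3f}")
print(f"disparate impact: {di:.3f}")
```

        Under these assumed definitions, a dataset or model is closer to parity as the difference approaches 0 and the ratio approaches 1, which is the sense in which a framework could be said to improve both values.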
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/42270
        Collections
        • Theses