How realistic is my synthetic data? A qualitative approach.

Tzikas, Rigas

Publication date

2022

Summary

Missing values are one of the most common challenges in data analytics. Many techniques have therefore been proposed to fill them in, a task known as "data imputation". Recent studies on synthetic data generation demonstrate that Generative Adversarial Networks (GANs) can address this problem as follows: for each example in the original data, generate a synthetic example that keeps the existing values and supplies values for the features where they are missing. However, to confirm whether GANs provide significant improvements over traditional data imputation techniques, we need a way to measure the quality of the generated examples. That quality can be assessed by determining how realistic the synthetic data is compared to the original examples. In this project, we develop a tool for measuring the quality of synthetic data. We then compare the quality of data generated by GANs with that of data produced by other synthetic data generation techniques.
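
The imputation step described in the summary (keep every observed value, take only the missing ones from a generated example) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `fill_missing`, the use of NaN as the missing-value marker, and the hard-coded `generated` rows (standing in for the output of a trained generator) are all assumptions made for illustration.

```python
import math


def fill_missing(original_row, generated_row):
    """Return a row that keeps observed values and fills gaps from a generated row.

    Hypothetical helper: missing values are assumed to be encoded as NaN.
    """
    if len(original_row) != len(generated_row):
        raise ValueError("rows must have the same number of features")
    return [
        gen if math.isnan(obs) else obs  # only missing entries are replaced
        for obs, gen in zip(original_row, generated_row)
    ]


# Original examples with missing entries (NaN).
original = [
    [1.0, math.nan, 3.0],
    [math.nan, 5.0, 6.0],
]

# Placeholder for generator output; a real GAN would produce these rows.
generated = [
    [0.9, 2.1, 3.2],
    [4.2, 4.8, 6.1],
]

imputed = [fill_missing(o, g) for o, g in zip(original, generated)]
```

In this sketch the generated values at observed positions (for example 0.9 and 3.2 in the first row) are discarded, so the imputed rows differ from the originals only where a value was missing.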

URI

https://studenttheses.uu.nl/handle/20.500.12932/43132

Collections

Theses