How realistic is my synthetic data? A qualitative approach.
Summary
Missing values are one of the most common challenges in data analytics tasks.
For that reason, many techniques have been proposed to fill in missing values,
a process known as "data imputation". Recent studies on synthetic data generation
demonstrate that Generative Adversarial Networks (GANs) can solve this problem
effectively as follows: for each example in the original data, generate a synthetic
example that keeps the existing (observed) values and supplies values for the
features that are missing. However, to determine whether GANs provide significant
improvements over traditional data imputation techniques, we need a way to measure
the quality of the generated examples. This quality can be measured by determining
how realistic the synthetic data is compared to the original examples. In this
project, we develop a tool for measuring the quality of synthetic data, and we use
it to compare data generated by GANs with data produced by other synthetic data
generation techniques.
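The imputation scheme described above (generate one synthetic example per original example, keep the observed values, and take generated values only where data is missing) can be sketched as follows. This is a minimal illustration, not the project's implementation. It assumes missing values are encoded as NaN in a NumPy array. The names `impute_with_generator` and `column_mean_generator` are hypothetical. The latter is a trivial stand-in for a trained GAN generator.

```python
import numpy as np


def impute_with_generator(x, generate):
    """Fill the missing entries of x (marked as NaN) using a generator's output.

    x        : 2-D float array, NaN where a value is missing.
    generate : hypothetical callable taking (x_filled, mask) and returning an
               array of the same shape; stands in for a trained GAN generator.
    """
    mask = ~np.isnan(x)                    # True where the value was observed
    x_filled = np.where(mask, x, 0.0)      # placeholder zeros in missing cells
    synthetic = generate(x_filled, mask)   # one synthetic row per original row
    # Keep the observed values; use generated values only where data was missing.
    return np.where(mask, x, synthetic)


def column_mean_generator(x_filled, mask):
    # Hypothetical stand-in for a GAN generator: predicts each column's
    # observed mean. A real generator would also take random noise as input.
    counts = mask.sum(axis=0)
    means = (x_filled * mask).sum(axis=0) / np.maximum(counts, 1)
    return np.broadcast_to(means, x_filled.shape)


data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan],
                 [np.nan, 7.0, 9.0]])
imputed = impute_with_generator(data, column_mean_generator)
print(imputed)
```

Only the cells that were NaN change. In this toy example, the missing entry in the first column becomes 2.5, which is the mean of the observed values 1.0 and 4.0. Because the observed values are kept fixed, each synthetic row is a candidate imputation of its original row. Judging how realistic the generated values are is the separate question this project addresses.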