Provable Privacy for Database Generation: an Information Theoretic Approach
Summary
Many methods exist to avoid disclosing sensitive information when releasing a
database. However these methods either cannot guarantee that the information
of individuals is secure or are aimed at specific use cases. In this paper we
develop a method which is both provably private and retains the overall form of
the original database. To achieve this we derive a privacy measure, epsilon-dependence.
Intuitively, epsilon-dependence requires that the input and output databases are nearly
independent. We show that epsilon-dependence can be seen as an information theoretic
refinement of differential privacy. We then adapt the KRIMP algorithm to
generate databases while satisfying epsilon-dependence. We show through experiments
that the generated databases are comparable to the original databases when
performing machine learning or itemset mining tasks. The results are especially
good on larger databases.