Johnson-Lindenstrauss Transforms
Summary
The original Johnson-Lindenstrauss (JL) theorem states that a random, data-independent matrix can be used to embed high-dimensional data into a space of much lower dimension, such that the Euclidean distances between all pairs of points are preserved up to a small error term with high probability.
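Concretely, the guarantee takes the following standard form (the notation is introduced here only for this summary): for any accuracy $\varepsilon \in (0,1)$ and any set $X$ of $n$ points in $\mathbb{R}^d$, a suitable random linear map $f \colon \mathbb{R}^d \to \mathbb{R}^m$ with $m = O(\varepsilon^{-2} \log n)$ satisfies, with high probability,
\[
(1-\varepsilon)\,\lVert x - y \rVert_2^2 \;\le\; \lVert f(x) - f(y) \rVert_2^2 \;\le\; (1+\varepsilon)\,\lVert x - y \rVert_2^2 \qquad \text{for all } x, y \in X.
\]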
In this thesis, a version of the JL theorem will be proven which states that multiplication by a random matrix consisting of sub-Gaussian random variables can be used to embed the data as described above. The proof relies on some cornerstone results of high-dimensional probability theory. This thesis will therefore start with a thorough analysis of properties of, and concentration inequalities for, (squared) sub-Gaussian random variables, such as Hoeffding’s and Bernstein’s inequalities.
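For reference, a standard way to define this class (following the usual convention in high-dimensional probability) is via the tail bound
\[
\mathbb{P}\bigl(|X| \ge t\bigr) \le 2\exp\!\left(-t^2/K^2\right) \qquad \text{for all } t \ge 0,
\]
for some constant $K > 0$. Hoeffding’s inequality then controls sums of such variables, while Bernstein’s inequality controls sums of their squares, which are sub-exponential rather than sub-Gaussian.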
After proving the JL theorem using these results, we will focus on the Fast Johnson-Lindenstrauss Transform (FJLT). This result, discovered by Ailon and Chazelle in 2006, states that by using a fast preconditioning of the data, a sparse random matrix can be used to obtain the same distance preservation as with the original JL theorem. The benefit of the allowed sparsity is a significant gain in time complexity.
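For orientation, the embedding matrix in the construction of Ailon and Chazelle factors as
\[
\Phi = P H D,
\]
where $D$ is a diagonal matrix with independent random $\pm 1$ entries, $H$ is the normalized Walsh-Hadamard matrix, and $P$ is a sparse random projection matrix. The preconditioning $HD$ spreads out the mass of each data point and can be applied in $O(d \log d)$ time via the fast Walsh-Hadamard transform, which is where the gain in time complexity comes from.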
Furthermore, this result will also be shown to hold for the ℓ1 distances of the data. The proof will again have concentration inequalities at its core, but will also rely on convex optimization theory.