Synthetic network generation for financial data
Summary
This thesis presents a novel approach to generating synthetic transaction networks. The
research develops a graph-based generative model that replicates characteristics observed in
real-world financial networks. The model is motivated by the need to preserve data privacy.
It generates networks that exhibit power-law degree distributions, neither assortative nor
disassortative mixing, exponential weight distributions, and community structures
similar to those found in actual financial transaction data.
The methodology involves a clustering analysis of a real transaction dataset to identify
node types, which are then integrated into the generative model. Parameters for node
generation and edge densification, together with a probability matrix governing type-based
connections, control the network’s structural properties. The model is validated against the
same real network dataset, provided by Rabobank, by comparing metrics and structural properties.
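
To make the roles of these parameters concrete, the following is a minimal sketch of a growth process of this kind, not the thesis's actual algorithm. Everything in it is an assumption made for illustration: the function and parameter names, the two node types "retail" and "business", the rule that chooses between adding a node and densifying, the degree-proportional target choice, and the exponential edge weights.

```python
import random


def generate_network(n_steps, type_probs, conn_matrix,
                     p_new_node=0.3, weight_scale=100.0, seed=0):
    """Grow a directed, weighted, typed network (illustrative sketch only).

    type_probs:   hypothetical mapping node type -> probability of that type
    conn_matrix:  hypothetical nested mapping source type -> target type ->
                  relative connection probability
    p_new_node:   probability that a step adds a node instead of densifying
    weight_scale: mean of the exponential edge-weight distribution
    """
    rng = random.Random(seed)
    types = list(type_probs)
    type_weights = [type_probs[t] for t in types]
    node_type = {}   # node id -> type
    degree = {}      # node id -> total degree
    edges = {}       # (source, target) -> accumulated weight

    def add_node():
        node = len(node_type)
        node_type[node] = rng.choices(types, weights=type_weights)[0]
        degree[node] = 0
        return node

    def pick_target(source):
        # Favour high-degree nodes, scaled by the type-pair probability.
        candidates = [v for v in node_type if v != source]
        weights = [(degree[v] + 1) * conn_matrix[node_type[source]][node_type[v]]
                   for v in candidates]
        if sum(weights) == 0:
            return None  # no admissible target for this source type
        return rng.choices(candidates, weights=weights)[0]

    add_node()
    add_node()  # two seed nodes so the first step has a possible target
    for _ in range(n_steps):
        if rng.random() < p_new_node:
            source = add_node()                   # node generation step
        else:
            source = rng.choice(list(node_type))  # edge densification step
        target = pick_target(source)
        if target is not None:
            # Exponentially distributed transaction weight with the given mean.
            weight = rng.expovariate(1.0 / weight_scale)
            edges[(source, target)] = edges.get((source, target), 0.0) + weight
            degree[source] += 1
            degree[target] += 1
    return node_type, edges


# Hypothetical two-type configuration.
type_probs = {"retail": 0.8, "business": 0.2}
conn_matrix = {"retail": {"retail": 0.2, "business": 1.0},
               "business": {"retail": 0.5, "business": 0.8}}
node_type, edges = generate_network(500, type_probs, conn_matrix)
```

In this sketch the candidate scan makes each step linear in the number of nodes, which is acceptable for small demonstrations but would need a more efficient sampling structure for runs of the length reported below.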
Experimental results show that the model can produce stable synthetic networks over
200,000 iterations, with degree distributions, edge densities, and community structures
comparable to those of the real dataset. However, limitations remain: validation relies on a
sampled and aggregated dataset, which restricts the model’s ability to capture the full
complexity of real financial networks, and the model’s exponential weight distribution
diverges from the real dataset’s power-law weight distribution.
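
As an illustration of the kind of comparison described above, the sketch below computes three summary metrics from an edge list: directed edge density, an assortativity coefficient, and a rough power-law exponent for the degree sequence. It is a hedged example, not the thesis's evaluation code. The function names are hypothetical, assortativity is assumed here to mean degree assortativity on the undirected view of the graph, and the exponent uses a commonly used discrete maximum-likelihood approximation that is only rough for small `x_min`.

```python
import math


def density(n_nodes, edges):
    """Directed edge density: number of edges divided by n * (n - 1)."""
    return len(edges) / (n_nodes * (n_nodes - 1))


def degree_assortativity(edges):
    """Pearson correlation of the degrees at both ends of each edge.

    Values near 0 indicate neither assortative nor disassortative mixing.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        # Count each edge in both directions so the measure is symmetric.
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys contain the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var > 0 else 0.0


def powerlaw_exponent(degrees, x_min=1):
    """Approximate discrete MLE: alpha = 1 + n / sum(ln(x / (x_min - 0.5)))."""
    tail = [d for d in degrees if d >= x_min]
    return 1.0 + len(tail) / sum(math.log(d / (x_min - 0.5)) for d in tail)


# A star graph is perfectly disassortative: the hub only touches leaves.
star = [(0, 1), (0, 2), (0, 3)]
star_density = density(4, star)             # 3 / (4 * 3) = 0.25
star_mixing = degree_assortativity(star)    # -1.0
```

Applying the same functions to both the real and the synthetic edge lists gives directly comparable numbers; how close they need to be is a judgment that this sketch does not address.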
This research contributes a publicly available tool that can serve as a starting point
for generating synthetic financial transaction networks, for example to train machine
learning models that detect criminal financial activity. Future research directions
include improving weight distribution modeling, exploring algorithms for generating power-law
distributions, and extending the model to include interbank networks and temporal dynamics.