Synthetic network generation for financial data

Schutte, Karen

Publication date

2024

Summary

This thesis presents a novel approach to generating synthetic transaction networks. The research focuses on developing a graph-based generative model capable of replicating characteristics observed in real-world financial networks. The model is motivated by the need to preserve data privacy. It generates networks that exhibit power-law degree distributions, neither assortative nor disassortative mixing, exponential weight distributions, and community structures similar to those found in actual financial transaction data.

The methodology involves a clustering analysis of a real transaction dataset to identify node types, which are then integrated into the generative model. Parameters for node generation and edge densification, together with a probability matrix governing type-based connections, are established to control the network's structural properties. The model is validated against the same real network dataset, provided by Rabobank, by comparing network metrics and structural properties.

Experimental results show that the model can produce stable synthetic networks over 200,000 iterations, with generated networks exhibiting degree distributions, edge densities, and community structures comparable to those of the real dataset. There are two main limitations. First, validation relies on a sampled and aggregated dataset, which restricts the model's ability to capture the full complexity of real financial networks. Second, the model's exponential weight distribution diverges from the power-law weight distribution of the real dataset.

This research contributes a publicly available tool that can serve as a starting point for generating synthetic financial transaction networks, facilitating applications such as training machine learning models to detect criminal financial activity. Future research directions include improving weight distribution modeling, exploring algorithms for power-law distributions, and extending the model to include interbank networks and temporal dynamics.
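
The ingredients named in the summary (typed nodes, edge densification, a type-based connection probability matrix, and exponentially distributed weights) can be illustrated with a minimal sketch. This is not the thesis's model or its published tool. The function name `generate_network`, the parameters `type_probs`, `connect_matrix`, `edges_per_step`, and `weight_scale`, and the use of degree-proportional preferential attachment as the source of heavy-tailed degrees are all assumptions made here for illustration.

```python
import random


def generate_network(n_nodes, type_probs, connect_matrix,
                     edges_per_step=2, weight_scale=100.0, seed=0):
    """Grow a small directed, weighted, typed network (illustrative only).

    type_probs:     hypothetical mapping type -> share of nodes of that type
    connect_matrix: hypothetical mapping src_type -> dst_type -> affinity
    """
    rng = random.Random(seed)
    types = list(type_probs)
    type_weights = [type_probs[t] for t in types]
    node_type, degree, edges = {}, {}, {}

    for new in range(n_nodes):
        # Node generation: draw the new node's type from the assumed type mix.
        node_type[new] = rng.choices(types, weights=type_weights)[0]
        degree[new] = 0

        # Edge densification: each new node attempts a fixed number of edges.
        for _ in range(edges_per_step if new > 0 else 0):
            candidates = range(new)
            # Assumed rule: preferential attachment (degree + 1), scaled by
            # the type-pair affinity taken from the connection matrix.
            scores = [
                (degree[c] + 1) * connect_matrix[node_type[new]][node_type[c]]
                for c in candidates
            ]
            if sum(scores) == 0:
                break  # no existing node of an allowed type
            target = rng.choices(candidates, weights=scores)[0]

            # Exponential edge weight with mean weight_scale.
            weight = rng.expovariate(1.0 / weight_scale)
            if (new, target) in edges:
                edges[(new, target)] += weight  # aggregate repeated edges
            else:
                edges[(new, target)] = weight
                degree[new] += 1
                degree[target] += 1

    return node_type, edges


if __name__ == "__main__":
    # Hypothetical two-type setup; the type names and numbers are made up.
    matrix = {
        "retail": {"retail": 0.1, "business": 0.9},
        "business": {"retail": 0.5, "business": 0.5},
    }
    node_type, edges = generate_network(
        500, {"retail": 0.8, "business": 0.2}, matrix
    )
    print(len(node_type), "nodes,", len(edges), "edges")
```

In a sketch like this, the connection matrix shapes which node types transact with each other, while the attachment rule and the weight sampler determine the degree and weight distributions separately. That separation is why a different weight sampler could be substituted without touching the rest of the growth loop.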

URI

https://studenttheses.uu.nl/handle/20.500.12932/46862

Collections

Theses