Leveraging the Transferability of Structural Graph Features for GNN Pre-training
Summary
Graph Neural Networks (GNNs) have shown remarkable success in
modeling relational data across various domains. However, training GNNs
from scratch for each new task or dataset remains computationally expensive
and typically requires large amounts of labeled data, which are often scarce.
This thesis explores strategies for pre-training GNNs, focusing on how common
topological features can be used during pre-training to enhance transferability,
and how features that first become available in the downstream task can be
leveraged to improve a model's performance. The central hypothesis is that
common, easily obtainable topological features, such as node degree, PageRank,
eigenvector centrality, and clustering coefficient, can be used to build
generalizable latent representations. These latent representations can then be
combined with new features obtained in the downstream task to further improve
performance.
We investigate methods for encoding these common features during pre-training
and for combining them with downstream features, aiming to improve performance
on downstream tasks in domains where data is limited.
The work proposes two frameworks for topology-based pre-training and evaluates
their effectiveness on downstream tasks. Our findings demonstrate that using
topological graph features during pre-training increases a model's performance
on the downstream task. Moreover, our experiments show that adding topological
features to a model substantially improves its performance.
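To make the notion of "easily obtainable topological features" concrete, the
minimal sketch below shows one way such features could be computed and stacked
into a per-node feature matrix. It assumes NetworkX and PyTorch are available;
the function and variable names are illustrative only and are not taken from
the thesis's implementation.

    import networkx as nx
    import torch

    def topological_node_features(G: nx.Graph) -> torch.Tensor:
        # Stack degree, PageRank, eigenvector centrality, and clustering
        # coefficient into a [num_nodes, 4] feature matrix.
        degree = dict(G.degree())
        pagerank = nx.pagerank(G)
        eigenvector = nx.eigenvector_centrality(G, max_iter=1000)
        clustering = nx.clustering(G)
        rows = [[degree[n], pagerank[n], eigenvector[n], clustering[n]]
                for n in G.nodes()]
        return torch.tensor(rows, dtype=torch.float)

    # Example: features for a synthetic graph, usable as GNN node input.
    G = nx.barabasi_albert_graph(n=100, m=3)
    x = topological_node_features(G)  # shape [100, 4]

In practice, such raw features would typically be standardized before being
fed to a GNN, since quantities like degree and PageRank live on very
different scales.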