Graph-Based Exploration of Packet Capture Data

Barócsai, Victoria

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Qahtan, Hakim
dc.contributor.author	Barócsai, Victoria
dc.date.accessioned	2025-08-21T00:03:21Z
dc.date.available	2025-08-21T00:03:21Z
dc.date.issued	2025
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/49845
dc.description.abstract	Gaining insights into workload behavior across hybrid cloud environments is difficult when visibility into internal systems is restricted. This study investigates whether communication patterns and deployment structure can be explored from packet capture (PCAP) data alone using a fully unsupervised, graph-based approach. Raw PCAP data from the CICIDS2017 dataset was aggregated into confirmed bidirectional flows and transformed into workload communication graphs, where nodes represent workloads and edges capture verified traffic exchanges. Node-level features were extracted by aggregating flow-level behavior and computing structural characteristics, including session volatility, TTL variability, external communication ratio, and structural role score. Additional topological context was derived through Louvain community detection and component-type labeling; both were included as node-level features. Structural roles were mined using recursive feature expansion (ReFeX), followed by non-negative matrix factorization (NMF). The resulting node-role matrix W was used for behavioral scoring and clustering, while the role-feature matrix H, which encodes interpretable structural patterns, was examined separately to support the interpretation of topological traits. The resulting clusters exhibited distinct communication patterns, including stable services, orchestration nodes, data exporters, and volatile edge workloads. For example, short-lived, externally focused traffic was indicative of serverless or aggregator functions, while persistent, internally scoped patterns corresponded to virtual machines or long-running services. These clusters reflected structural formations such as high-degree hubs, dense clusters, and chain-like components. Several limitations and strengths were identified. While ground-truth labels were unavailable, cluster characteristics aligned with recognizable workload types. Internal validation using the DBCV score yielded a value of 0.5695. Key limitations include the static scope of the dataset, exclusion of application-layer semantics, and the qualitative nature of interpretation. Nonetheless, the proposed pipeline offers an interpretable method to analyze workload behavior from PCAP data, providing insight in
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	A Graph-Based Model for Analyzing Workload Communication Patterns, System Roles, and Cloud Deployment Artifacts in Hybrid Environments
dc.title	Graph-Based Exploration of Packet Capture Data
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	PCAP data analysis; workload behavior; deployment artifacts; graph networks; systems thinking
dc.subject.courseuu	Applied Data Science
dc.thesis.id	52094
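
The first stage described in the abstract (aggregating traffic into confirmed bidirectional flows, then building a workload communication graph) can be illustrated with a minimal sketch. It is not the thesis implementation. The packet records, the function names (`confirmed_edges`, `build_graph`, `external_ratio`), the host-pair notion of "confirmed", and the definition of external communication ratio as the share of neighbours outside private address space are all assumptions made for illustration. PCAP parsing is omitted.

```python
from collections import defaultdict
from ipaddress import ip_address

# Hypothetical packet records as (source IP, destination IP) pairs.
packets = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.2", "10.0.0.1"),
    ("10.0.0.1", "8.8.8.8"),
    ("8.8.8.8", "10.0.0.1"),
    ("10.0.0.3", "10.0.0.1"),  # one-way only, so never confirmed
]


def confirmed_edges(packets):
    """Return undirected edges for host pairs with traffic seen in both directions.

    Assumption: "confirmed bidirectional" is approximated at host-pair level.
    The value of each edge is the total packet count over both directions.
    """
    directed = defaultdict(int)
    for src, dst in packets:
        directed[(src, dst)] += 1

    edges = {}
    for (src, dst), count in directed.items():
        if src != dst and (dst, src) in directed:
            key = tuple(sorted((src, dst)))
            edges[key] = edges.get(key, 0) + count
    return edges


def build_graph(edges):
    """Build an adjacency map: node -> set of neighbouring nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def external_ratio(node, adjacency):
    """Share of a node's neighbours outside private address space (assumed definition)."""
    neighbours = adjacency[node]
    if not neighbours:
        return 0.0
    external = sum(1 for n in neighbours if not ip_address(n).is_private)
    return external / len(neighbours)


graph = build_graph(confirmed_edges(packets))
for node in sorted(graph):
    print(node, sorted(graph[node]), external_ratio(node, graph))
```

In this toy input, `10.0.0.3` never appears in the graph because its traffic is not answered. Per-node values such as the ratio above would then be candidates for the node-level feature table that later stages (ReFeX, NMF, clustering) operate on.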

Files in this item

Name:: thesis-barocsai-6854893.pdf
Size:: 3.044Mb
Format:: PDF

This item appears in the following Collection(s)

Theses
