
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Qahtan, Hakim
dc.contributor.author: Barócsai, Victoria
dc.date.accessioned: 2025-08-21T00:03:21Z
dc.date.available: 2025-08-21T00:03:21Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/49845
dc.description.abstract: Gaining insights into workload behavior across hybrid cloud environments is difficult when visibility into internal systems is restricted. This study investigates whether communication patterns and deployment structure can be explored from packet capture (PCAP) data alone using a fully unsupervised, graph-based approach. Raw PCAP data from the CICIDS2017 dataset was aggregated into confirmed bidirectional flows and transformed into workload communication graphs, where nodes represent workloads and edges capture verified traffic exchanges. Node-level features were extracted by aggregating flow-level behavior and computing structural characteristics, including session volatility, TTL variability, external communication ratio, and structural role score. Additional topological context was derived through Louvain community detection and component-type labeling; both were included as node-level features. Structural roles were mined using recursive feature expansion (ReFeX), followed by non-negative matrix factorization (NMF). The resulting node-role matrix W was used for behavioral scoring and clustering, while the role-feature matrix H, which encodes interpretable structural patterns, was examined separately to support the interpretation of topological traits. The resulting clusters exhibited distinct communication patterns, including stable services, orchestration nodes, data exporters, and volatile edge workloads. For example, short-lived, externally focused traffic was indicative of serverless or aggregator functions, while persistent, internally scoped patterns corresponded to virtual machines or long-running services. These clusters reflected structural formations such as high-degree hubs, dense clusters, and chain-like components. Several limitations and strengths were identified. While ground-truth labels were unavailable, cluster characteristics aligned with recognizable workload types. Internal validation using the DBCV score yielded a value of 0.5695. Key limitations include the static scope of the dataset, exclusion of application-layer semantics, and the qualitative nature of interpretation. Nonetheless, the proposed pipeline offers an interpretable method to analyze workload behavior from PCAP data, providing insight in
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: A Graph-Based Model for Analyzing Workload Communication Patterns, System Roles, and Cloud Deployment Artifacts in Hybrid Environments
dc.title: Graph-Based Exploration of Packet Capture Data
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: PCAP data analysis; workload behavior; deployment artifacts; graph networks; systems thinking
dc.subject.courseuu: Applied Data Science
dc.thesis.id: 52094

