        •   Utrecht University Student Theses Repository Home
        Graph-Based Exploration of Packet Capture Data

        thesis-barocsai-6854893.pdf (3.044Mb)
        Publication date
        2025
        Author
        Barócsai, Victoria
        Summary
        Gaining insights into workload behavior across hybrid cloud environments is difficult when visibility into internal systems is restricted. This study investigates whether communication patterns and deployment structure can be explored from packet capture (PCAP) data alone using a fully unsupervised, graph-based approach. Raw PCAP data from the CICIDS2017 dataset was aggregated into confirmed bidirectional flows and transformed into workload communication graphs, where nodes represent workloads and edges capture verified traffic exchanges. Node-level features were extracted by aggregating flow-level behavior and computing structural characteristics, including session volatility, TTL variability, external communication ratio, and structural role score. Additional topological context was derived through Louvain community detection and component-type labeling; both were included as node-level features. Structural roles were mined using recursive feature expansion (ReFeX), followed by non-negative matrix factorization (NMF). The resulting node-role matrix W was used for behavioral scoring and clustering, while the role-feature matrix H, which encodes interpretable structural patterns, was examined separately to support the interpretation of topological traits. The resulting clusters exhibited distinct communication patterns, including stable services, orchestration nodes, data exporters, and volatile edge workloads. For example, short-lived, externally focused traffic was indicative of serverless or aggregator functions, while persistent, internally scoped patterns corresponded to virtual machines or long-running services. These clusters reflected structural formations such as high-degree hubs, dense clusters, and chain-like components. Several limitations and strengths were identified. While ground-truth labels were unavailable, cluster characteristics aligned with recognizable workload types. Internal validation using the DBCV score yielded a value of 0.5695. Key limitations include the static scope of the dataset, exclusion of application-layer semantics, and the qualitative nature of interpretation. Nonetheless, the proposed pipeline offers an interpretable method to analyze workload behavior from PCAP data, providing insight in
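
        The first stage described in the summary, turning traffic into a graph of confirmed bidirectional exchanges with node-level features, might be sketched as follows. This is an illustrative sketch only, not the thesis's implementation. The packet records are made-up tuples rather than parsed PCAP data. "Confirmed" is assumed here to mean that traffic was seen in both directions. The external communication ratio is assumed to be the share of a node's confirmed peers with non-private addresses. The function names `confirmed_edges` and `external_ratio` are hypothetical.

```python
from collections import defaultdict
from ipaddress import ip_address

# Hypothetical packet records: (src_ip, dst_ip, n_bytes). PCAP parsing is omitted.
packets = [
    ("10.0.0.1", "10.0.0.2", 120),
    ("10.0.0.2", "10.0.0.1", 300),
    ("10.0.0.1", "8.8.8.8", 60),
    ("8.8.8.8", "10.0.0.1", 80),
    ("10.0.0.3", "10.0.0.1", 40),  # never answered, so not confirmed
]


def confirmed_edges(packets):
    """Map each unordered node pair with traffic in BOTH directions to its total bytes."""
    directed = defaultdict(int)
    for src, dst, size in packets:
        if src == dst:  # ignore self-traffic in this sketch
            continue
        directed[(src, dst)] += size
    edges = {}
    for (src, dst), size in directed.items():
        if (dst, src) in directed:  # assumption: "confirmed" = reply observed
            key = frozenset((src, dst))
            edges[key] = edges.get(key, 0) + size
    return edges


def external_ratio(edges):
    """Assumed definition: fraction of a node's confirmed peers that are non-private."""
    peers = defaultdict(set)
    for pair in edges:
        a, b = tuple(pair)
        peers[a].add(b)
        peers[b].add(a)
    return {
        node: sum(not ip_address(p).is_private for p in ps) / len(ps)
        for node, ps in peers.items()
    }


edges = confirmed_edges(packets)
ratios = external_ratio(edges)
```

        In this toy input, the unanswered packet from 10.0.0.3 produces no edge, so that address never becomes a node. Later stages named in the summary (Louvain community detection, ReFeX, NMF, clustering) would operate on a feature table built from dictionaries like `ratios`.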
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/49845
        Collections
        • Theses