dc.description.abstract | Gaining insights into workload behavior across hybrid cloud environments is difficult when
visibility into internal systems is restricted. This study investigates whether communication
patterns and deployment structure can be explored from packet capture (PCAP)
data alone using a fully unsupervised, graph-based approach. Raw PCAP data from the
CICIDS2017 dataset was aggregated into confirmed bidirectional flows and transformed
into workload communication graphs, where nodes represent workloads and edges capture
verified traffic exchanges.
Node-level features were extracted by aggregating flow-level behavior and computing
structural characteristics, including session volatility, TTL variability, external communication
ratio, and structural role score. Additional topological context was derived through
Louvain community detection and component-type labeling; both were included as node-level
features. Structural roles were mined using recursive feature expansion (ReFeX),
followed by non-negative matrix factorization (NMF). The resulting node-role matrix W
was used for behavioral scoring and clustering, while the role-feature matrix H, which
encodes interpretable structural patterns, was examined separately to support the interpretation
of topological traits.
The resulting clusters exhibited distinct communication patterns, including stable
services, orchestration nodes, data exporters, and volatile edge workloads. For example,
short-lived, externally focused traffic was indicative of serverless or aggregator functions,
while persistent, internally scoped patterns corresponded to virtual machines or long-running
services. These clusters reflected structural formations such as high-degree hubs,
dense clusters, and chain-like components.
Several limitations and strengths were identified. While ground-truth labels were unavailable,
cluster characteristics aligned with recognizable workload types. Internal validation
using the DBCV score yielded a value of 0.5695. Key limitations include the static
scope of the dataset, exclusion of application-layer semantics, and the qualitative nature
of interpretation. Nonetheless, the proposed pipeline offers an interpretable method
to analyze workload behavior from PCAP data, providing insight in | |
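
The first stage described in the abstract, keeping only confirmed bidirectional exchanges as graph edges and deriving a node-level external communication ratio, might be sketched as follows. This is a minimal illustration and not the study's implementation: the packet tuples, the function names `build_graph` and `external_ratio`, and the choice to treat any address that Python's `ipaddress` module reports as non-private as "external" are all assumptions made for the example.

```python
from collections import defaultdict
from ipaddress import ip_address

# Hypothetical packet records (src_ip, dst_ip, byte_count); a real pipeline
# would parse these from PCAP files instead.
packets = [
    ("10.0.0.1", "10.0.0.2", 120),
    ("10.0.0.2", "10.0.0.1", 300),
    ("10.0.0.1", "8.8.8.8", 60),
    ("8.8.8.8", "10.0.0.1", 90),
    ("10.0.0.3", "10.0.0.1", 40),  # never answered, so not confirmed
]

def build_graph(packets):
    """Return undirected edges {(a, b): bytes} seen in BOTH directions."""
    directed = defaultdict(int)
    for src, dst, nbytes in packets:
        directed[(src, dst)] += nbytes
    edges = {}
    for (src, dst), nbytes in directed.items():
        if (dst, src) in directed:  # traffic confirmed in the reverse direction
            key = tuple(sorted((src, dst)))
            edges[key] = edges.get(key, 0) + nbytes
    return edges

def external_ratio(edges):
    """Per node: share of confirmed bytes exchanged with non-private peers.

    Assumption: 'external' means the peer address is not private according
    to ipaddress.ip_address(...).is_private.
    """
    total = defaultdict(int)
    external = defaultdict(int)
    for (a, b), nbytes in edges.items():
        for node, peer in ((a, b), (b, a)):
            total[node] += nbytes
            if not ip_address(peer).is_private:
                external[node] += nbytes
    return {node: external[node] / total[node] for node in total}

edges = build_graph(packets)
ratios = external_ratio(edges)
```

In this toy input, the unanswered packet from `10.0.0.3` produces no edge, so that host does not appear as a node. Features such as session volatility or TTL variability would need per-flow timestamps and header fields that this sketch omits.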