dc.rights.license: CC-BY-NC-ND
dc.contributor: Dr. Ing. H. G. Krempl; Dr. I. R. Karnstedt-Hulpus; F. Miedema MSc; J. Ranzijn MSc
dc.contributor.advisor: Krempl, G.M.
dc.contributor.author: Plantinga, Joël
dc.date.accessioned: 2024-04-30T00:02:02Z
dc.date.available: 2024-04-30T00:02:02Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/46336
dc.description.abstract: Gathered data sources hold increasing significance for law enforcement agencies seeking evidence for criminal investigations. Network traffic stands out as one such data source, offering detailed insights into individuals’ behaviors and interests. Given the potential size of intercepted network traffic, clustering methods are required to find what activities were performed in the data; such methods do not yet exist. Unlike text data, internet sessions lack inherent coherence and often include noise from unrelated traffic. Traditional unsupervised methods may yield suboptimal results due to this noise, while supervised methods are infeasible due to the absence of labeled data. To address these challenges, we propose an adaptation of the ‘Clustering By Intent’ (CBI) technique to network traffic, named Embedded Clustering By Intent (ECBI). ECBI uses Word2vec embeddings to generate queries for identifying synonymous features in internet traffic, enabling clustering based on actual activities. The proposed method is validated using specifically sampled internet traffic from the Dutch National Police. In an evaluation consisting of expert interviews and technical experiments, ECBI is compared against CBI and Embedded Topic Modelling. Notably, ECBI clusters contain sessions from a significantly larger number of entities, indicating a focus on activities rather than on the specifics of an individual entity. Furthermore, we show that Word2vec can be applied effectively to network traffic.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: A thesis project on the creation of a method to cluster network traffic data.
dc.title: Embedded Clustering by Intent: an Active Network Traffic Clustering method
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Active learning, Network Traffic, Clustering by Intent, Word2vec
dc.subject.courseuu: Business Informatics
dc.thesis.id: 30435

