dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Schraagen, dr. M.P.
dc.contributor.advisor: Deoskar, dr. T.
dc.contributor.advisor: Mense, dr. J.P.
dc.contributor.author: Maagendans, L.A.Y.
dc.date.accessioned: 2021-01-21T19:00:14Z
dc.date.available: 2021-01-21T19:00:14Z
dc.date.issued: 2021
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/38636
dc.description.abstract: This work presents research on the pair-wise impostor finding problem: "Given a pair of user accounts (and, optionally, messages sent by either account), can one reliably determine whether the pair is controlled by the same actual author?" The specific domain in which this thesis aims to solve the pair-wise problem is social networks in which short conversational texts are sent between nodes. Two approaches to this problem are evaluated. The first combines stylometric authorship attribution methods with the Doppelgänger Finder. Three stylometric authorship attribution methods (Cosine Delta, SVM and CNN) and various stylometric feature sets are compared on the authorship attribution task for short conversational text. The CNN model achieves the highest scores on all datasets used, and for all three methods, character 3-grams and word 1-grams prove to capture the most characteristic information. The second approach is a direct, network analysis-based method, which is an original contribution of this thesis. This method is evaluated on the Opsahl Facebook-like Social Network dataset, whose edges have been injected with tweets from the Sentiment140 Twitter dataset. The network analysis-based method does not attain notable results when it uses only network features or only stylometric features. With stylometric and network features combined, however, it reaches a weighted F1-score of 0.7 on the pair-wise problem. To compare the two approaches, the SVM model is applied to the injected subset of the Sentiment140 Twitter dataset on the authorship attribution task, and the Doppelgänger Finder is then applied to the SVM model's predictions to answer the pair-wise problem. The resulting scores are no higher than 0.5, which is unremarkable for a binary problem and also lower than the 0.7 attained by the network analysis-based method.
dc.description.sponsorship: Utrecht University
dc.format.extent: 903412 bytes
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.title: Impostor Finding Using Stylometry and Network Analysis
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Impostor finding, authorship attribution, network analysis, stylometry
dc.subject.courseuu: Artificial Intelligence
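The abstract names Cosine Delta over character 3-grams as one of the compared attribution methods. Cosine Delta is commonly described as a variant of Burrows's Delta: feature frequencies are z-scored against the candidate set, and the cosine distance between z-score vectors replaces Burrows's Manhattan distance. The sketch below is illustrative only and is not the thesis's code. It assumes one concatenated training text per author, a hypothetical `top_k=300` feature cut-off, and population standard deviation; the record does not state how the thesis configured any of these.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Relative frequencies of overlapping character n-grams in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()} if total else {}


def cosine_delta_attribute(train, query, n=3, top_k=300):
    """Return the candidate author closest to `query` under Cosine Delta.

    train: dict mapping author -> that author's concatenated training text
    (an assumption of this sketch). top_k is a hypothetical feature cut-off.
    """
    profiles = {a: char_ngrams(t, n) for a, t in train.items()}

    # Feature set: the top_k n-grams with the highest summed relative frequency.
    totals = Counter()
    for freqs in profiles.values():
        totals.update(freqs)
    features = [g for g, _ in totals.most_common(top_k)]

    # Mean and population standard deviation of each feature across authors.
    m = len(profiles)
    means, stds = {}, {}
    for f in features:
        vals = [p.get(f, 0.0) for p in profiles.values()]
        mu = sum(vals) / m
        means[f] = mu
        stds[f] = math.sqrt(sum((v - mu) ** 2 for v in vals) / m)
    # Zero-variance features cannot be z-scored, so they are dropped.
    features = [f for f in features if stds[f] > 0]

    def zscores(freqs):
        return [(freqs.get(f, 0.0) - means[f]) / stds[f] for f in features]

    def cosine_distance(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(y * y for y in v))
        return 1.0 - dot / (nu * nv) if nu and nv else 1.0

    q = zscores(char_ngrams(query, n))
    return min(profiles, key=lambda a: cosine_distance(q, zscores(profiles[a])))


if __name__ == "__main__":
    # Toy, made-up "authors" with disjoint character sets.
    train = {
        "alice": "aaaa bbbb aaaa bbbb",
        "bob": "xxxx yyyy xxxx yyyy",
        "carol": "1111 2222 1111 2222",
    }
    print(cosine_delta_attribute(train, "aaaa bbbb"))  # alice
```

This covers only the nearest-profile attribution step. In the first approach described above, attribution outputs are then passed to the Doppelgänger Finder, which this sketch does not implement.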

