dc.rights.license | CC-BY-NC-ND | |
dc.contributor.advisor | Paperno, Denis | |
dc.contributor.author | Kondyurin, Ivan | |
dc.date.accessioned | 2022-08-11T00:00:49Z | |
dc.date.available | 2022-08-11T00:00:49Z | |
dc.date.issued | 2022 | |
dc.identifier.uri | https://studenttheses.uu.nl/handle/20.500.12932/42258 | |
dc.description.abstract | Authorship attribution attempts to establish the author of a particular text. In this work, we examine the capabilities of transformer-based models in the subtype of the attribution task referred to as authorship verification, which involves determining whether two texts were written by the same author. A few prior works have applied fine-tuned Transformer models in this field. Such an approach is motivated by their excellent performance and adaptability (fine-tuning can be performed on texts of different sizes and genres, and different pre-trained model checkpoints enable switching between languages). However, these models are not as transparent as traditional methods, in which features that quantify style (stylometric features) are selected to maximize the distance between texts. To tackle this problem, we first implement a model for authorship verification based on the BERT architecture and then investigate how its predictions are made by applying an adapted LIME explainer and proposing an attention-based relevant-feature extraction procedure. We then compare the two approaches and analyze their explainability from a causal perspective, using input ablation and alteration to verify that they can retrieve the features that strongly influence the model's predictions. Finally, we describe and classify the extracted features from a linguistic perspective. | |
dc.description.sponsorship | Utrecht University | |
dc.language.iso | EN | |
dc.subject | This project examines the capabilities of Transformer models in the task of authorship verification, which involves determining whether two texts were written by the same author. We then explore the degree of their explainability by applying two approaches: an adapted LIME explainer and a proposed attention-based relevant-feature extraction procedure. We then compare these techniques, analyze their explainability from a causal perspective, and ground them in stylometric theory. | |
dc.title | Explainability of Transformers for Authorship Attribution | |
dc.type.content | Master Thesis | |
dc.rights.accessrights | Open Access | |
dc.subject.keywords | Authorship attribution; stylometry; transformers; BERT; attention; LIME | |
dc.subject.courseuu | Artificial Intelligence | |
dc.thesis.id | 8280 | |