
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Gatt, A.
dc.contributor.author: Lockhorst, Sjors
dc.date.accessioned: 2024-07-24T23:06:53Z
dc.date.available: 2024-07-24T23:06:53Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/46899
dc.description.abstract: Large Language Models (LLMs) have greatly improved the diversity and quality of machine-generated text, so much so that humans score at chance when distinguishing human-written texts from LLM-generated texts. Associated risks include accelerating phishing, disinformation, fraudulent product reviews, academic dishonesty, and spam. Detecting LLM-generated text could prove crucial in mitigating these risks. Many detectors have been proposed; however, past work has mainly focused on building detectors within one domain, on the output of one LLM. The most performant detector appears to be a fine-tuned masked Language Model (LM) with a classification head, but these detectors struggle with several issues, such as a lack of interpretability, difficulty generalizing to unseen domains, and a lack of robustness to adversarial attacks. This study sheds light on the performance and robustness of various LLM-generated text detectors across 10 different domains, and investigates whether robustness can be improved through data augmentation. We provide interpretable baselines for each domain, as well as a comparison between a fine-tuned LM trained on all domain data and an in-domain fine-tuned LM. We first show that a fine-tuned LM detector trained on multiple domains indeed has trouble generalizing to an unseen domain. We then show that the performance of various detectors varies between domains: in some domains a detector trained on all domains performs better, while in others fine-tuning within the domain is better. We then attack detectors in different domains with a character-level attack and a paraphrasing attack, and show that the robustness of the models varies with the domain. We finally show that our fine-tuned LM detector trained on student-written essays can be made robust to character-level attacks through data augmentation, most effectively by adding paraphrases to the training data.
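
The record contains no implementation, but the abstract names a character-level attack as one of the two adversarial attacks studied. As a minimal sketch of what such an attack might look like, the Python snippet below perturbs a small fraction of character positions with adjacent swaps and Latin-to-Cyrillic homoglyph substitutions; the perturbation rate and the substitution table are illustrative assumptions, not details taken from the thesis.

    import random

    def char_level_attack(text: str, rate: float = 0.05, seed: int = 0) -> str:
        """Perturb roughly `rate` of character positions with an adjacent swap
        or a homoglyph substitution, keeping the text readable to humans."""
        rng = random.Random(seed)
        # Illustrative Latin -> Cyrillic look-alike table (an assumption,
        # not the substitution table used in the thesis).
        homoglyphs = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441"}
        chars = list(text)
        i = 0
        while i < len(chars) - 1:
            if rng.random() < rate:
                if chars[i] in homoglyphs and rng.random() < 0.5:
                    chars[i] = homoglyphs[chars[i]]  # visually similar substitution
                else:
                    # swap with the next character, then skip past it
                    chars[i], chars[i + 1] = chars[i + 1], chars[i]
                    i += 1
            i += 1
        return "".join(chars)

    print(char_level_attack("Large Language Models have greatly improved machine-generated text."))

Such perturbations are nearly invisible to a human reader but change the token sequence a fine-tuned LM detector sees, which is consistent with the abstract's finding that augmenting the training data (most effectively with paraphrases) hardens the detector against this attack.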
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: The evaluation of the performance of LLM-written text detectors across domains and under adversarial attacks. It evaluates how good different models are at detecting whether a text was written by an LLM or by a human. It also tries to assess how easily one can 'fool' such detection models through various adversarial attacks.
dc.title: Performance of LLM-written text detectors across domains and under adversarial attack
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 34825

