
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Deoskar, Tejaswini
dc.contributor.author: Dragar, Frenk
dc.date.accessioned: 2025-01-10T00:01:14Z
dc.date.available: 2025-01-10T00:01:14Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/48360
dc.description.abstract: The rapid advancement of generative artificial intelligence (AI), especially large language models (LLMs), has produced unprecedented text-generation capabilities, creating an urgent need for methods that can identify AI-generated text and prevent misuse. Techniques such as watermarking, which mark text or images as AI-generated, are being explored in the field but remain in their infancy and are especially challenging for textual output. This thesis focuses on model fingerprinting, i.e. methods that embed fingerprints into a deep generative model so the model can be identified via prompting, and which can also be used to authenticate the origin of AI-generated text. We propose a fine-tuning-based method to embed learnable fingerprints within LLMs, enabling black-box model authentication without requiring access to model parameters. We evaluate it against several desirable properties of fingerprints, such as preservation of generated-text quality and robustness against attacks. Our experiments show that model quality is maintained, even under quantization, but that fingerprints are susceptible to removal via further fine-tuning and can be exposed through data leakage. Additionally, we experiment with combining model fingerprints with common watermarking methods that embed signatures into the generated text, and evaluate which watermarking paradigms can be used in combination with model fingerprinting. Our motivation is to provide first insights into combining the strengths of both techniques for broader application to AI regulation, trustworthiness, detection, and authentication.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.title: Learnable Fingerprints for Large Language Models
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: large language model; fingerprinting; watermarking; artificial intelligence; safety; machine learning; fine-tuning; llm
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 42071

