
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Deoskar, Tejaswini
dc.contributor.author: Dragar, Frenk
dc.date.accessioned: 2025-01-10T00:01:14Z
dc.date.available: 2025-01-10T00:01:14Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/48360
dc.description.abstract: The rapid advancement of generative artificial intelligence (AI), especially large language models (LLMs), has produced unprecedented text-generation capabilities, creating an urgent need for methods that can identify AI-generated text and prevent misuse. Techniques such as watermarking, which mark text or images as AI-generated, are being explored in the field but remain in their infancy and are especially challenging for textual output. This thesis focuses on model fingerprinting, i.e. methods that embed fingerprints into a deep generative model so the model can be identified via prompting, and which can also be used to authenticate the origin of AI-generated text. We propose a fine-tuning-based method to embed learnable fingerprints within LLMs, enabling black-box model authentication without requiring access to model parameters. We evaluate it against several desirable properties of fingerprints, such as preservation of generated-text quality and robustness against attacks. Our experiments show that model quality is maintained, even under quantization, but that fingerprints are susceptible to removal via further fine-tuning and can be exposed through data leakage. Additionally, we experiment with combining model fingerprints with common watermarking methods that embed signatures into the generated text, and evaluate which watermarking paradigms can be used in combination with model fingerprinting. Our motivation is to provide first insights into combining the strengths of both techniques for broader application to AI regulation, trustworthiness, detection, and authentication.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.title: Learnable Fingerprints for Large Language Models
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: large language model; fingerprinting; watermarking; artificial intelligence; safety; machine learning; fine-tuning; llm
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 42071

