Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Gatt, A.
dc.contributor.author: Quantmeyer, Vincent
dc.date.accessioned: 2024-03-15T00:02:02Z
dc.date.available: 2024-03-15T00:02:02Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/46171
dc.description.abstract: Various benchmarks have measured linguistic capabilities of vision-and-language (VL) models, but do not provide insights into how models implement these capabilities. This thesis translates model interpretability techniques developed for large language models to the multimodal space in order to investigate the mechanisms involved in CLIP’s processing of negation. In the text encoder, specific negator-selective attention heads are found that seem crucial in controlling the movement of negation-related information through the model. Early evidence suggests that these heads are dataset-independent. In the image encoder, MLPs seem more relevant than attention, particularly in early layers, but further research is needed to elucidate these processes. As for CLIP’s imperfect ability to process negation correctly, multiple dataset features are identified that partly explain its performance, suggesting that benchmark performance isn’t a direct indicator of linguistic understanding. Future research directions are discussed that refine our understanding of the discovered mechanisms and test their generalisability on other datasets and models.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This thesis translates model interpretability techniques developed for large language models to the multimodal space to investigate the mechanisms involved in CLIP’s processing of negation.
dc.title: How does CLIP process negation? A multimodal interpretability study
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 29198