Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Mosteiro Romero, Pablo
dc.contributor.author: Saakashvili, Eduard
dc.date.accessioned: 2024-04-08T23:02:24Z
dc.date.available: 2024-04-08T23:02:24Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/46270
dc.description.abstract: Vision-language models (VLMs) are increasingly successful, but questions remain about the extent and nature of their grounding in the visual modality. Many prior approaches to this question focus on either performance-based measures of grounding (what can a model do?) or comparisons between a model's internal representations and a normative human baseline (is a model doing things in a humanlike way?). This study tests whether the results of these two approaches are correlated with one another in the context of a benchmark specifically designed to measure grounding. I design a human experimental environment to extract human saliency maps for a subset of the VALSE grounding benchmark. I also generate attribution maps for four VLMs for the same stimuli. My analysis creates a "humanlikeness" similarity metric for visual model attribution maps, and finds that model attribution maps are detectably "humanlike" on average. However, the degree of attribution humanlikeness does not correlate with model performance on the VALSE benchmark, either between or within models. The utility of this attribution-based humanlikeness metric as a complement to performance-based benchmarks remains unclear.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This study uses the SHAP XAI method in conjunction with human subject data gathering to compare attribution between humans and AI models on image-caption challenges. The study asks whether the degree of human-model similarity correlates with model performance.
dc.title: Do more 'humanlike' vision-language models perform better on grounding challenges? An attribution-based study on the VALSE image-caption alignment benchmark
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: XAI, Shapley, VLM, language models, transformers, explainability
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 29897



