        Utrecht University Student Theses Repository
        Do more ‘humanlike’ vision-language models perform better on grounding challenges? An attribution-based study on the VALSE image-caption alignment benchmark

        View/Open
        Saakashvili_MSc_thesis.pdf (18.98Mb)
        Publication date
        2024
        Author
        Saakashvili, Eduard
        Summary
        Vision-language models (VLMs) are increasingly successful, but questions remain about the extent and nature of their grounding in the visual modality. Many prior approaches to this question focus either on performance-based measures of grounding (what can a model do?) or on comparisons between a model’s internal representations and a normative human baseline (is a model doing things in a humanlike way?). This study tests whether the results of these two approaches correlate with one another on a benchmark specifically designed to measure grounding. I design an experimental environment to collect human saliency maps for a subset of the VALSE grounding benchmark, and I generate attribution maps from four VLMs for the same stimuli. My analysis defines a "humanlikeness" similarity metric for visual model attribution maps and finds that model attribution maps are detectably "humanlike" on average. However, the degree of attribution humanlikeness does not correlate with model performance on the VALSE benchmark, either between or within models. The utility of this attribution-based humanlikeness metric as a complement to performance-based benchmarks therefore remains unclear.
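
As a rough illustration of the kind of comparison the summary describes, the sketch below scores how closely a model attribution map tracks a human saliency map over the same image grid. The summary does not say which similarity measure the thesis uses, so the Spearman rank correlation here is an assumption made for illustration, and the names `ranks`, `pearson`, `spearman`, and `humanlikeness` are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def humanlikeness(model_map, human_map):
    """Hypothetical score: rank correlation of two same-shaped 2D maps.

    ASSUMPTION: the thesis may define its metric differently.
    """
    flat_model = [v for row in model_map for v in row]
    flat_human = [v for row in human_map for v in row]
    if len(flat_model) != len(flat_human):
        raise ValueError("maps must have the same number of cells")
    return spearman(flat_model, flat_human)


# Toy 2x2 maps (made-up values, not data from the thesis).
model_attribution = [[0.1, 0.9], [0.2, 0.4]]
human_saliency = [[1, 8], [2, 5]]
score = humanlikeness(model_attribution, human_saliency)
```

Under the same assumption, the second question in the summary (whether humanlikeness tracks performance) could be probed by passing a list of per-item humanlikeness scores and a list of per-item performance scores to `spearman`.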
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/46270
        Collections
        • Theses