
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Hortensius, Ruud
dc.contributor.author: Kern, Alexander
dc.date.accessioned: 2022-01-27T00:00:29Z
dc.date.available: 2022-01-27T00:00:29Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/402
dc.description.abstract: The military intelligence domain is one of many fields investigating deep learning methods to automate various processes, especially for the task of recognizing specific entities in large sets of images. Current state-of-the-art methods cannot be easily applied in the military domain since they require large sets of labelled images, which are challenging to acquire for the domain-specific classes. Recently, research has investigated the possibility of learning visual features with natural language supervision by using image captioning as a pre-training task for visual backbones. This study investigates the possibility of pre-training with domain-specific image captions to learn domain-specific visual features. We pre-train convolutional neural networks from scratch, using a military-specific image-caption dataset (Janes Captions) collected for this study. The effect of different image captioning pre-training tasks on the learning of the visual features was also evaluated. Although these models did not outperform the current state-of-the-art methods, they outperformed models pre-trained on similar amounts of generic image captions. Ultimately, natural language supervision for pre-training visual models is a promising concept that, if applied correctly, could solve the problems of current state-of-the-art methods, especially for application in specific domains.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Pre-training a visual model for a downstream domain-specific image classification task with a vision-language task, image captioning on a domain-specific dataset.
dc.title: Domain-Specific Visual Representation Learning Using Natural Language Supervision
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: representation learning, deep-learning, pre-training, image-captioning, image classification
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 1968

