Domain-Specific Visual Representation Learning Using Natural Language Supervision

Kern, Alexander

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Hortensius, Ruud
dc.contributor.author	Kern, Alexander
dc.date.accessioned	2022-01-27T00:00:29Z
dc.date.available	2022-01-27T00:00:29Z
dc.date.issued	2022
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/402
dc.description.abstract	The military intelligence domain is one of many fields investigating deep learning methods to automate various processes, especially the task of recognizing specific entities in large sets of images. Current state-of-the-art methods cannot easily be applied in the military domain because they require large sets of labelled images, which are challenging to acquire for the domain-specific classes. Recent research has investigated learning visual features with natural language supervision by using image captioning as a pre-training task for visual backbones. This study investigates whether pre-training on domain-specific image captions can teach a model domain-specific visual features. We pre-train convolutional neural networks from scratch, using a military-specific image-caption dataset (Janes Captions) collected for this study. We also evaluate how different image captioning pre-training tasks affect the learning of the visual features. Although these models did not outperform the current state-of-the-art methods, they outperformed models pre-trained on similar amounts of generic image captions. Ultimately, natural language supervision for pre-training visual models is a promising concept that, if applied correctly, could solve the problems of current state-of-the-art methods, especially for application in specific domains.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Pre-training a visual model for a downstream domain-specific image classification task with a vision-language task, image captioning on a domain-specific dataset.
dc.title	Domain-Specific Visual Representation Learning Using Natural Language Supervision
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	representation learning, deep-learning, pre-training, image-captioning, image classification
dc.subject.courseuu	Artificial Intelligence
dc.thesis.id	1968

Files in this item

Name: Domain-Specific Visual Represe ...
Size: 1.853Mb
Format: PDF

This item appears in the following Collection(s)

Theses