
        Domain-Specific Visual Representation Learning Using Natural Language Supervision

        View/Open
        Domain-Specific Visual Representation Learning Using Natural Language Supervision. Kern A.S.Y. 2022.pdf (1.853Mb)
        Publication date
        2022
        Author
        Kern, Alexander
        Summary
        The military intelligence domain is one of many fields investigating deep learning methods to automate various processes, especially the task of recognizing specific entities in large sets of images. Current state-of-the-art methods cannot be easily applied in the military domain, since they require large sets of labelled images, which are challenging to acquire for domain-specific classes. Recently, research has investigated the possibility of learning visual features with natural language supervision by using image captioning as a pre-training task for visual backbones. This study investigates the possibility of pre-training with domain-specific image captions to learn domain-specific visual features. We pre-train convolutional neural networks from scratch, using a military-specific image-caption dataset (Janes Captions) collected for this study. We also evaluated the effect of different image-captioning pre-training tasks on the learned visual features. Although these models did not outperform the current state-of-the-art methods, they outperformed models pre-trained on similar amounts of generic image captions. Ultimately, natural language supervision for pre-training visual models is a promising concept that, if applied correctly, could solve the problems of current state-of-the-art methods, especially for application in specific domains.
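
        To make the pre-training setup concrete, below is a minimal sketch (not the thesis code) of captioning-based natural language supervision: a convolutional backbone trained from scratch is optimized to predict caption tokens, and only the backbone is kept for the downstream recognition task. The dataset, vocabulary size, decoder depth, and other hyperparameters here are illustrative assumptions, not the settings used in the study.

```python
# Sketch of image-captioning pre-training for a visual backbone (PyTorch).
# All hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CaptioningPretrainer(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512):
        super().__init__()
        backbone = resnet50(weights=None)      # trained from scratch, no ImageNet weights
        backbone.fc = nn.Identity()            # keep the 2048-d pooled visual features
        self.backbone = backbone
        self.proj = nn.Linear(2048, d_model)
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, images, caption_tokens):
        # The visual features act as the memory the caption decoder attends to.
        memory = self.proj(self.backbone(images)).unsqueeze(1)   # (B, 1, d_model)
        tgt = self.embed(caption_tokens)                         # (B, T, d_model)
        T = tgt.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)                                    # (B, T, vocab_size)

model = CaptioningPretrainer()
images = torch.randn(4, 3, 224, 224)                 # dummy image-caption batch
tokens = torch.randint(0, 10000, (4, 20))
logits = model(images, tokens[:, :-1])               # predict the next caption token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
loss.backward()
# After pre-training, model.backbone supplies the (domain-specific) visual features
# that are transferred to the downstream entity-recognition task.
```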
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/402
        Collections
        • Theses