        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item


        Cue Classification in Preterm Infants Using Vision-Language Models

        Msc_Thesis___Yannick_den_Boer___02_07_2025.pdf (3.414Mb)
        Publication date
        2025
        Author
        Boer, Yannick den
        Summary
        Premature infants in the Neonatal Intensive Care Unit (NICU) require continuous monitoring to ensure responsive and personalized care. Behavioral cues such as facial expressions and hand movements offer non-invasive insights into an infant’s physiological state but are currently assessed manually. This thesis explores the feasibility of using Vision-Language Models (VLMs) to automate the classification of these cues from video recordings. Motivated by the challenges of data scarcity in clinical settings, the study leverages the pretrained knowledge embedded in VLMs through a zero-shot, text-based classification pipeline built on predefined linguistic prompts. Using multi-label annotations adapted from a small dataset of five preterm infants recorded at UMC Utrecht, three experiments are conducted: (1) an embedding space analysis to assess semantic clustering, (2) a prompt-guided, text-based classification method that determines whether each label is present, and (3) a prompt ensembling strategy to enhance robustness. Results indicate that while model performance remains below clinical applicability thresholds, VLMs can discern visually salient cues, especially when guided by well-designed prompts. These findings demonstrate the potential of VLMs for behavioral cue recognition in data-scarce settings and suggest directions to further enhance performance through soft prompting, multi-label classification, and domain adaptation.
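
        The zero-shot, prompt-based idea in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the summary does not state the scoring rule, so cosine similarity with a fixed threshold is an assumption here, as is ensembling by averaging normalized prompt embeddings (a common convention for CLIP-style models). The label names, the 3-dimensional toy vectors, and the functions `ensemble_embedding` and `classify_multilabel` are hypothetical; in practice the vectors would come from a VLM's image and text encoders.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def ensemble_embedding(prompt_embeddings):
    """Average the unit-normalized embeddings of several prompts for one label.

    Assumption: ensembling is done by averaging in embedding space.
    """
    unit = [normalize(e) for e in prompt_embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)


def classify_multilabel(frame_embedding, label_prompt_embeddings, threshold=0.5):
    """Mark a label as present when its similarity reaches the threshold.

    Each label is scored independently, so zero, one, or several labels
    may be reported for the same frame (multi-label setting).
    The threshold value is an arbitrary placeholder.
    """
    scores = {
        label: cosine(frame_embedding, ensemble_embedding(embeddings))
        for label, embeddings in label_prompt_embeddings.items()
    }
    present = sorted(label for label, s in scores.items() if s >= threshold)
    return present, scores


# Toy stand-ins for encoder outputs (hypothetical values, not real data).
frame = [0.9, 0.1, 0.0]
prompts = {
    "hand to face": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "yawning": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
}

present, scores = classify_multilabel(frame, prompts)
print(present)
```

        With these toy vectors only the first label clears the threshold. Scoring each label separately, rather than taking a softmax over all labels, is what lets several cues be flagged for the same frame.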
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/49902
        Collections
        • Theses