dc.rights.license | CC-BY-NC-ND | |
dc.contributor.advisor | Poppe, Ronald | |
dc.contributor.author | Boer, Yannick den | |
dc.date.accessioned | 2025-08-21T00:06:37Z | |
dc.date.available | 2025-08-21T00:06:37Z | |
dc.date.issued | 2025 | |
dc.identifier.uri | https://studenttheses.uu.nl/handle/20.500.12932/49902 | |
dc.description.abstract | Premature infants in the Neonatal Intensive Care Unit (NICU) require continuous monitoring to ensure responsive and personalized care. Behavioral cues such as facial expressions and hand movements offer non-invasive insights into an infant’s physiological states but are currently assessed manually. This thesis explores the feasibility of using Vision-Language Models (VLMs) to automate the classification of these cues from video recordings.
Motivated by the challenges of data scarcity in clinical settings, the study leverages the pretrained knowledge embedded in VLMs through a zero-shot, text-based classification pipeline based on predefined linguistic prompts. Using multi-label annotations adapted from a small dataset of five preterm infants recorded at UMC Utrecht, three experiments are conducted: (1) an embedding space analysis to assess semantic clustering, (2) a prompt-guided text classification method evaluated to determine label presence, and (3) a prompt ensembling strategy to enhance robustness.
Results indicate that while model performance remains below clinical applicability thresholds, VLMs can discern visually salient cues, especially when guided by well-designed prompts. These findings demonstrate the potential of VLMs for behavioral cue recognition in data-scarce settings and suggest directions to further enhance performance through soft prompting, multi-label classification, and domain adaptation. | |
dc.description.sponsorship | Utrecht University | |
dc.language.iso | EN | |
dc.subject | The thesis explores the feasibility of using VLMs to classify behavioral cues in preterm infants. Several experiments are performed, including embedding visualization, text-based classification, and prompt ensembling. The research is motivated by the need for non-invasive monitoring of infant states such as sleep, hunger, and pain. | |
dc.title | Cue Classification in Preterm Infants Using Vision-Language Models | |
dc.type.content | Master Thesis | |
dc.rights.accessrights | Open Access | |
dc.subject.keywords | VLM; Classification; Preterm Infants; Computer Vision; Qwen; Prompting | |
dc.subject.courseuu | Artificial Intelligence | |
dc.thesis.id | 51996 | |