dc.rights.license | CC-BY-NC-ND | |
dc.contributor.advisor | Poppe, Ronald | |
dc.contributor.author | Boer, Yannick den | |
dc.date.accessioned | 2025-08-21T00:06:37Z | |
dc.date.available | 2025-08-21T00:06:37Z | |
dc.date.issued | 2025 | |
dc.identifier.uri | https://studenttheses.uu.nl/handle/20.500.12932/49902 | |
dc.description.abstract | Premature infants in the Neonatal Intensive Care Unit (NICU) require continuous monitoring to ensure responsive and personalized care. Behavioral cues such as facial expressions and hand movements offer non-invasive insights into an infant’s physiological states but are currently assessed manually. This thesis explores the feasibility of using Vision-Language Models (VLMs) to automate the classification of these cues from video recordings.
Motivated by the challenges of data scarcity in clinical settings, the study leverages the pretrained knowledge embedded in VLMs through a zero-shot, text-based classification pipeline based on predefined linguistic prompts. Using multi-label annotations adapted from a small dataset of five preterm infants recorded at UMC Utrecht, three experiments are conducted: (1) an embedding space analysis to assess semantic clustering, (2) a prompt-guided text classification method evaluated to determine label presence, and (3) a prompt ensembling strategy to enhance robustness.
Results indicate that while model performance remains below clinical applicability thresholds, VLMs can discern visually salient cues, especially when guided by well-designed prompts. These findings demonstrate the potential of VLMs for behavioral cue recognition in data-scarce settings and suggest directions to further enhance performance through soft prompting, multi-label classification, and domain adaptation. | |
dc.description.sponsorship | Utrecht University | |
dc.language.iso | EN | |
dc.subject | The thesis explores the feasibility of using VLMs to classify behavioral cues in preterm infants. Several experiments are performed, including embedding visualization, text-based classification, and prompt ensembling. The research is motivated by the need for non-invasive monitoring of infant states such as sleep, hunger, and pain. | |
dc.title | Cue Classification in Preterm Infants Using Vision-Language Models | |
dc.type.content | Master Thesis | |
dc.rights.accessrights | Open Access | |
dc.subject.keywords | VLM; Classification; Preterm Infants; Computer Vision; Qwen; Prompting | |
dc.subject.courseuu | Artificial Intelligence | |
dc.thesis.id | 51996 | |