
dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Önal Ertugrul, I.
dc.contributor.author	Zenios, Sotiris
dc.date.accessioned	2025-08-28T00:01:42Z
dc.date.available	2025-08-28T00:01:42Z
dc.date.issued	2025
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/50028
dc.description.abstract	Early, unobtrusive detection of anxiety in children can facilitate timelier and more equitable access to mental-health interventions. This thesis developed and evaluated a deep-learning pipeline that predicts anxiety-related behaviours from recordings of parent–child interactions collected in the YOUth Cohort Study. The dataset comprised 100 dyads of nine-year-old children filmed in conflict and cooperative tasks. After automatic diarisation, four synchronised child-centred streams were extracted: facial expressions, body posture, speech acoustics, and transcribed language. Strong unimodal baselines were first established with VideoMAE and FMAE-IAT for vision, a CNN-LSTM pipeline for audio, and RobBERT for text, with the linguistic channel achieving the best single-modality score (F1 = 0.62). Building on these results, a Pairwise Cross-Modal Attention Network was introduced to learn explicit interactions between modalities. This architecture raised overall performance to F1 = 0.64, outperformed classic fusion techniques, and remained resilient when individual streams were noisy or absent. Ablation analyses showed that body-pose embeddings operate as a pivotal hub queried by the other modalities, while filtering out parental speech proved crucial for stable audio contributions. Beyond delivering a systematic benchmark for anxiety detection on the YOUth material, the findings reaffirm the diagnostic value of language yet demonstrate that carefully designed cross-modal attention can uncover complementary visual and acoustic cues that text alone misses. Although the current performance is not yet clinically sufficient, the research charts a clear path toward scalable, context-aware pre-screening tools for childhood anxiety and lays the groundwork for future extensions, including the use of VLLMs, alternative audio backbones, and dyadic modelling of parental behaviour.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Leveraging multi modal deep-learning to predict anxiety symptoms from parent-child interactions
dc.title	Leveraging multi modal deep-learning to predict anxiety symptoms from parent-child interactions
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.courseuu	Artificial Intelligence
dc.thesis.id	52829
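
The abstract describes a Pairwise Cross-Modal Attention Network that fuses facial, postural, acoustic and linguistic embeddings. The thesis code is not part of this record, so the following is only a minimal, hypothetical PyTorch sketch of the general idea: every modality queries every other modality through multi-head cross-attention, and the pooled pairwise outputs feed a binary classifier. The modality names, embedding dimension (256), head count and classification head are illustrative assumptions, not details taken from the thesis.

    # Hypothetical sketch of pairwise cross-modal attention fusion (not the thesis code).
    # Assumes each modality has already been encoded into a (batch, seq_len, dim) tensor,
    # e.g. by VideoMAE / FMAE-IAT for vision, a CNN-LSTM for audio, and RobBERT for text.
    import torch
    import torch.nn as nn

    class PairwiseCrossModalFusion(nn.Module):
        def __init__(self, dim=256, n_heads=4, modalities=("face", "pose", "audio", "text")):
            super().__init__()
            self.modalities = modalities
            # One cross-attention block per ordered (query, key/value) modality pair.
            self.cross_attn = nn.ModuleDict({
                f"{q}_from_{k}": nn.MultiheadAttention(dim, n_heads, batch_first=True)
                for q in modalities for k in modalities if q != k
            })
            # Illustrative binary head: anxiety-related behaviour vs. not.
            n_pairs = len(modalities) * (len(modalities) - 1)
            self.classifier = nn.Sequential(
                nn.Linear(n_pairs * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
            )

        def forward(self, feats):
            # feats[m]: (batch, seq_len_m, dim) embeddings from each unimodal encoder.
            pooled = []
            for q in self.modalities:
                for k in self.modalities:
                    if q == k:
                        continue
                    out, _ = self.cross_attn[f"{q}_from_{k}"](feats[q], feats[k], feats[k])
                    pooled.append(out.mean(dim=1))  # average over the query sequence
            return self.classifier(torch.cat(pooled, dim=-1)).squeeze(-1)

    # Toy usage with random features standing in for the real encoders.
    if __name__ == "__main__":
        feats = {m: torch.randn(2, 10, 256) for m in ("face", "pose", "audio", "text")}
        logits = PairwiseCrossModalFusion()(feats)
        print(logits.shape)  # torch.Size([2])

In a design of this kind, the attention weights of the pose-queried blocks would be one place to inspect the "body-pose hub" effect mentioned in the abstract, and dropping a modality's entry from the input dictionary is a simple way to probe robustness to missing streams.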

