Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Poppe, Ronald
dc.contributor.author: Alexandrou, Andreas
dc.date.accessioned: 2025-08-21T00:06:48Z
dc.date.available: 2025-08-21T00:06:48Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/49905
dc.description.abstract: Understanding early parent–infant interactions, particularly those involving physical touch, is vital for assessing developmental progress and emotional bonding. Recent AI-based systems have shown promising results in automating physical contact detection; however, most rely on single-view input and are highly susceptible to occlusion, which limits their effectiveness in real-world settings. This thesis addressed that gap by extending the Image2Contact framework to incorporate multi-view input for improved contact prediction during free-play interactions. Several fusion approaches were implemented, including early feature concatenation, statistical decision fusion, logit-level fusion with fully connected heads, and attention-based feature fusion. All models were trained and evaluated on the YOUth PCI dataset, which comprises real-world, multi-camera recordings of parent–infant interactions. The results showed that multi-view models consistently outperformed single-view baselines, though the improvements were often modest. To understand this limited gain, the analysis examined the influence of pose confidence, contact density, occlusion, and body-region specificity on model behaviour. It revealed that prediction success was strongly driven by physical contact density and the frequency of body-part involvement: dense-contact scenes yielded better performance, while sparse-contact frames remained challenging. One important finding was that the models' tendency to over-predict stemmed from a mismatch between the evaluation metric and the loss function used during training, which led them to overestimate contact in order to avoid the heavy penalty on false negatives. Although both single-view and multi-view models were affected, multi-view architectures handled low-contact frames better by leveraging cross-view spatial cues to make more precise predictions. Region-aware loss functions and threshold calibration improved single-view performance, in some cases matching multi-view results; the multi-view models, however, saw limited benefit from these adjustments. These findings shed light on the factors that most influence performance and advance our understanding of the limitations and potential of single-view and multi-view systems for contact signature prediction in parent–infant interactions. Finally, this study suggests that future systems should prioritize data quality, employ region-aware, class-balanced loss functions aligned with the evaluation metric, and incorporate more sophisticated architectural designs to achieve more precise predictions.
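For readers unfamiliar with the fusion strategies named in the abstract, the following minimal PyTorch sketch illustrates attention-based feature fusion across camera views. All names and shapes are illustrative assumptions (a 512-dimensional per-view feature and a 21-region body segmentation per person), not the thesis's actual Image2Contact code.

```python
# Illustrative sketch of attention-based multi-view feature fusion.
# Hypothetical names and shapes; not the thesis's implementation.
import torch
import torch.nn as nn

class AttentionViewFusion(nn.Module):
    """Fuse per-view feature vectors with learned attention weights."""
    def __init__(self, feat_dim: int, num_region_pairs: int):
        super().__init__()
        # One scalar relevance score per view, computed from its features.
        self.score = nn.Linear(feat_dim, 1)
        # Shared head: one contact logit per (parent, infant) region pair.
        self.head = nn.Linear(feat_dim, num_region_pairs)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, feat_dim), one row per camera.
        scores = self.score(view_feats)            # (batch, num_views, 1)
        weights = torch.softmax(scores, dim=1)     # attention over views
        fused = (weights * view_feats).sum(dim=1)  # (batch, feat_dim)
        return self.head(fused)                    # contact-signature logits

# Example: 4 cameras, 512-d backbone features, 21x21 region-pair signature.
model = AttentionViewFusion(feat_dim=512, num_region_pairs=21 * 21)
logits = model(torch.randn(8, 4, 512))
print(logits.shape)  # torch.Size([8, 441])
```

The abstract also mentions threshold calibration as a remedy for over-prediction. One common way to realise this, sketched below under the same illustrative assumptions, is a per-region search on held-out data for the decision threshold that maximises F1, rather than a fixed 0.5 cut-off:

```python
# Per-region threshold calibration on a validation split (illustrative).
import numpy as np

def calibrate_thresholds(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    # probs, labels: (num_frames, num_region_pairs)
    candidates = np.linspace(0.05, 0.95, 19)
    best = np.full(probs.shape[1], 0.5)
    for r in range(probs.shape[1]):
        f1s = []
        for t in candidates:
            pred = probs[:, r] >= t
            tp = np.sum(pred & (labels[:, r] == 1))
            fp = np.sum(pred & (labels[:, r] == 0))
            fn = np.sum(~pred & (labels[:, r] == 1))
            f1s.append(2 * tp / max(2 * tp + fp + fn, 1))
        best[r] = candidates[int(np.argmax(f1s))]
    return best
```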
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.title: Enhancing Human Contact Signatures Estimation Through Multi-View Integration in Complex Parent-Infant Free Play Interactions
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 52001

