View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item

        Enhancing Human Contact Signatures Estimation Through Multi-View Integration in Complex Parent-Infant Free Play Interactions

        Final Thesis Andreas Alexandrou.pdf (7.076 MB)
        Publication date
        2025
        Author
        Alexandrou, Andreas
        Summary
Understanding early parent–infant interactions, particularly those involving physical touch, is vital for assessing developmental progress and emotional bonding. Recent AI-based systems have shown promising results in automating physical contact detection; however, most rely on single-view input and are therefore highly susceptible to occlusion, which limits their effectiveness in real-world settings. This thesis aimed to address that gap by extending the Image2Contact framework to incorporate multi-view input for improved contact prediction during free-play interactions. Several fusion approaches were implemented: early feature concatenation, statistical decision fusion, logit-level fusion with fully connected heads, and attention-based feature fusion. All models were trained and evaluated on the YOUth PCI dataset, which contains real-world, multi-camera recordings of parent–infant interactions. The results showed that multi-view models consistently outperformed single-view baselines, although the improvements were often modest. To understand this limited gain, the analysis examined how pose confidence, contact density, occlusion, and body-region specificity influence model behaviour. It revealed that prediction success was strongly driven by physical contact density and by how frequently each body part is involved in contact. Dense-contact scenes yielded better performance, while sparse-contact frames remained challenging. One important finding was that the models' tendency to over-predict contact was caused by an inconsistency between the evaluation metric and the loss function used during training. This mismatch led the models to overestimate contact in order to avoid the harsh penalties for missed contacts (false negatives). Although both single-view and multi-view models were affected, multi-view architectures handled low-contact frames better, leveraging cross-view spatial cues to make more precise predictions.
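The four fusion families named above differ mainly in where the camera views are combined: at the feature level, at the probability level, at the logit level, or through learned per-view weights. The abstract does not give layer sizes or the exact Image2Contact heads, so the following is only a toy, pure-Python sketch of these generic techniques, not the thesis implementation. It assumes each view yields a small feature vector (or a vector of per-region logits) for one frame and that the output is one contact probability per body region; all function names, weights, and shapes are hypothetical, and a real system would use trained weights on tensors in a deep-learning framework.

```python
import math
from typing import List, Sequence

# Hypothetical toy setting: V camera views, each giving a feature vector (or a
# vector of per-region logits) for one frame; the output is one contact
# probability per body region. Weights here are fixed numbers, not trained.

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights: Matrix, bias: Sequence[float], x: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def early_concat_fusion(view_features: Matrix, weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Early fusion: concatenate all view features, then apply one shared head."""
    concatenated = [v for view in view_features for v in view]
    return [sigmoid(z) for z in linear(weights, bias, concatenated)]


def decision_fusion(view_probs: Matrix, mode: str = "mean") -> List[float]:
    """Statistical (late) fusion: combine per-view probabilities region by region."""
    per_region = list(zip(*view_probs))
    if mode == "mean":
        return [sum(col) / len(col) for col in per_region]
    if mode == "max":
        return [max(col) for col in per_region]
    raise ValueError(f"unknown mode: {mode}")


def logit_fusion(view_logits: Matrix, weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Logit-level fusion: concatenate per-view logits and map them through an FC head."""
    concatenated = [z for view in view_logits for z in view]
    return [sigmoid(z) for z in linear(weights, bias, concatenated)]


def attention_fusion(view_features: Matrix, query: Sequence[float]) -> List[float]:
    """Attention-style fusion: softmax over dot-product scores, then a weighted sum of views."""
    scores = [sum(q * f for q, f in zip(query, view)) for view in view_features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(view_features[0])
    return [sum(a * view[d] for a, view in zip(attn, view_features)) for d in range(dim)]


if __name__ == "__main__":
    # Two views, one frame, two body regions (hypothetical numbers).
    print(decision_fusion([[0.9, 0.2], [0.5, 0.4]], mode="max"))  # [0.9, 0.4]
    print(attention_fusion([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0]))  # [0.5, 0.5]
```

Decision fusion needs no extra parameters, whereas the concatenation, logit, and attention variants add learnable weights that can exploit cross-view cues; which works best is an empirical question the thesis addresses on its own data.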
Region-aware loss functions and threshold calibration improved single-view performance, in some cases matching multi-view results; multi-view models, however, saw limited benefit from these adjustments. These findings shed light on the factors that most influence performance and advance our understanding of the limitations and potential of single-view and multi-view systems for contact signature prediction in parent–infant interactions. Furthermore, this study suggests that future systems should prioritize data quality, employ region-aware, class-balanced loss functions aligned with the evaluation metrics, and incorporate more sophisticated architectural designs to achieve more precise predictions.
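The abstract does not state which loss or evaluation metric was used, so the following is only a minimal sketch, under stated assumptions, of how a class-balanced, region-aware loss and per-region threshold calibration can fit together. It assumes per-frame binary contact labels for each body region, up-weights positives in each region by the ratio of negatives to positives (the same idea as the pos_weight argument of PyTorch's BCEWithLogitsLoss), and takes F1 as the metric to calibrate against. All function names are hypothetical, and this is not the thesis implementation.

```python
import math
from typing import List, Sequence

# Hypothetical setup: rows are frames, columns are body regions,
# labels are 0/1 contact annotations, probs are model outputs in [0, 1].


def class_balanced_pos_weights(labels: List[List[int]]) -> List[float]:
    """Per-region positive weight = #negatives / #positives (1.0 if a region has no positives)."""
    weights = []
    for r in range(len(labels[0])):
        pos = sum(row[r] for row in labels)
        neg = len(labels) - pos
        weights.append(neg / pos if pos > 0 else 1.0)
    return weights


def weighted_bce(probs: Sequence[float], targets: Sequence[int],
                 pos_weights: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over regions for one frame, positives up-weighted per region."""
    total = 0.0
    for p, y, w in zip(probs, targets, pos_weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def f1_score(preds: Sequence[int], targets: Sequence[int]) -> float:
    """Binary F1; returns 0.0 when there are no true positives."""
    tp = sum(1 for p, t in zip(preds, targets) if p and t)
    fp = sum(1 for p, t in zip(preds, targets) if p and not t)
    fn = sum(1 for p, t in zip(preds, targets) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def calibrate_thresholds(probs: List[List[float]], labels: List[List[int]],
                         grid: Sequence[float]) -> List[float]:
    """For each region, pick the grid threshold that maximizes F1 on validation data."""
    thresholds = []
    for r in range(len(labels[0])):
        col_p = [row[r] for row in probs]
        col_y = [row[r] for row in labels]
        # max() keeps the first (lowest) threshold among ties, given an ascending grid.
        best = max(grid, key=lambda t: f1_score([int(p >= t) for p in col_p], col_y))
        thresholds.append(best)
    return thresholds


if __name__ == "__main__":
    val_labels = [[1, 0], [0, 0], [1, 1], [0, 0]]
    val_probs = [[0.9, 0.2], [0.4, 0.1], [0.8, 0.35], [0.3, 0.3]]
    print(class_balanced_pos_weights(val_labels))  # [1.0, 3.0]
    print(calibrate_thresholds(val_probs, val_labels, grid=[0.25, 0.5, 0.75]))  # [0.5, 0.25]
```

In this toy validation set, the rarely positive second region receives a threshold of 0.25 rather than the default 0.5. Tuning thresholds against the same metric that is reported is one simple way to reduce a loss/metric mismatch; it should be done on held-out validation data rather than on the test set.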
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/49905
        Collections
        • Theses