Automated detection of positive and non-positive shyness in infants from videos
Summary
Shyness in infancy has often been approached as a unidimensional, negative construct linked to social withdrawal. However, recent research offers a more nuanced perspective: shyness can also be positive and socially adaptive. This study investigates the automatic detection of positive and non-positive shyness in infants using video analysis. To that end, we use a dataset of 12- and 15-month-old infants recorded in social interaction settings and annotated for shyness expressions. We employ state-of-the-art video representation models, VideoMAE and VideoMamba, to classify shyness expressions. Furthermore, we investigate how different regions of interest (head, body, whole scene), infant age, fine-tuning, and the incorporation of vision-language models (VideoLLaVA) affect model performance. Our findings highlight that (1) attention to specific regions improves classification performance, (2) age impacts model efficacy, and (3) multimodal embeddings can slightly enhance shyness detection. This work contributes a novel framework for understanding the dynamics of infant emotion through automated analysis and lays the groundwork for future applications in developmental research and human-robot interaction.
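To make the classification setup concrete, the sketch below shows how a VideoMAE-based binary classifier for positive vs. non-positive shyness could be assembled with the HuggingFace transformers library. The checkpoint name, label names, region-of-interest cropping, and 16-frame sampling are illustrative assumptions, not details taken from the paper; the classification head here is freshly initialized and would be fine-tuned on the annotated clips.

```python
# Minimal sketch: binary shyness classification with VideoMAE via HuggingFace
# transformers. Checkpoint, labels, and frame sampling are assumptions for
# illustration, not the authors' exact pipeline.
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

# Assumed public checkpoint; the paper does not specify one here.
CHECKPOINT = "MCG-NJU/videomae-base"

processor = VideoMAEImageProcessor.from_pretrained(CHECKPOINT)
model = VideoMAEForVideoClassification.from_pretrained(
    CHECKPOINT,
    num_labels=2,  # positive vs. non-positive shyness
    id2label={0: "non_positive_shyness", 1: "positive_shyness"},
    label2id={"non_positive_shyness": 0, "positive_shyness": 1},
)

# Stand-in for a real clip: 16 RGB frames, e.g. cropped to a region of
# interest such as the infant's head, as HxWx3 uint8 arrays.
frames = [
    np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(16)
]

# Processor resizes/normalizes frames into pixel_values of shape
# (batch=1, num_frames=16, channels=3, 224, 224).
inputs = processor(frames, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2)

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
```

In this setup, swapping the cropped input region (head, body, or whole scene) only changes the frames fed to the processor, which is one way the region-of-interest comparison described above could be operationalized.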