dc.description.abstract | Infant hand gestures such as pointing, showing, and giving are known predictors of early language development. However, automated recognition of infant gestures remains an underexplored task, largely due to the scarcity of annotated infant datasets and the distinctive variability of infant motor behavior. This thesis investigates the feasibility of Deep Learning (DL) models for infant hand gesture recognition using video data from the YOUth Cohort Study. Several model architectures were evaluated, including a two-stream CNN coupled with an SVM classifier, 3D CNNs, and transformer-based approaches, each either trained from scratch or pretrained on general and gesture-specific datasets. Performance was assessed for both binary (gesture vs. no gesture) and multiclass (seven classes) classification. In both cases, the best performance was achieved by end-to-end training on temporal features, with macro-average F1 scores of 73.06% and 40.98%, respectively. Furthermore, the study explored the relationship between gesture frequencies and Peabody Picture Vocabulary Test (PPVT) scores using linear regression and Random Forests (RF). Child-related metadata, such as maternal education level, were also incorporated as predictor variables. On our limited dataset, no predictor of language development was found other than the child's age at the time of PPVT testing. While the classification results appear promising, research on automated hand gesture recognition for early language development is still in its infancy. | |