Towards Increasing Robustness Against Occlusions for Preterm Infant Pose Estimation in Videos
Summary
Preterm birth rates are rising globally, and the causes and implications are not yet fully understood. It is clear, however, that preterm infants are more likely than full-term infants to develop a wide range of developmental disorders. Because of their vulnerability, these infants require extensive monitoring and supervision, which is often performed in Neonatal Intensive Care Units (NICUs). Current techniques for monitoring infant activity are obtrusive: they require needles and electrodes, which can be painful or uncomfortable for preterm infants. Recently, there has been a surge in unobtrusive monitoring techniques, in particular video-based approaches. These approaches rely on behavioral signals in video, which can be captured by estimating the pose and motion of the infants. Current state-of-the-art (SOTA) systems estimate infant pose and motion with models trained predominantly on adults, which leads to significantly worse performance in downstream tasks. Additionally, infant data is often extremely hard to collect and of low quality, with poor lighting, severe perspective distortion and occlusions; this scarcity makes it hard to train deep-learning models, which are known to be data-hungry. To address the lack of data, this research created the Synthetic (and real) Preterm Infant Sequences (SPIS) dataset, using SMIL, a vertex-based statistical volumetric model, and SMPLify, a 2D-to-3D lifting approach, to generate augmented sequences of synthetic infants. This dataset was then used to train the Preterm Infant Pose Estimator (PIPE) model for infant motion modeling. To tackle occlusions in the preterm infant domain, an occlusion augmentation module for Temporal Convolutional Networks (TCNs) was also developed. Finally, an ablation study was performed to evaluate whether PIPE is suitable for medical applications.
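The summary names an occlusion augmentation module for Temporal Convolutional Networks without describing its mechanics. As an illustration only, one common way to simulate occlusions for a TCN that consumes 2D keypoint sequences is to hide a single joint over a contiguous run of frames, as a detector would when that joint is covered. The sketch below assumes that strategy; the function `occlude_keypoints`, its parameters `p` and `max_span`, and the use of (0.0, 0.0) as the "missing joint" value are all hypothetical and are not taken from PIPE.

```python
import random
from typing import List, Optional, Tuple

Frame = List[Tuple[float, float]]  # one (x, y) pair per joint


def occlude_keypoints(seq: List[Frame], p: float = 0.5, max_span: int = 10,
                      rng: Optional[random.Random] = None) -> List[Frame]:
    """Hypothetical temporal occlusion augmentation for a 2D keypoint sequence.

    With probability p, pick one joint and a contiguous window of 1..max_span
    frames, and replace that joint's coordinates with (0.0, 0.0) in every frame
    of the window, mimicking a detector that loses an occluded joint.
    Returns a new sequence; the input sequence is not modified.
    """
    rng = rng or random.Random()
    out = [list(frame) for frame in seq]  # copy each frame so seq stays intact
    if not out or rng.random() >= p:
        return out  # no occlusion applied this time
    num_frames, num_joints = len(out), len(out[0])
    joint = rng.randrange(num_joints)
    span = rng.randint(1, min(max_span, num_frames))
    start = rng.randrange(num_frames - span + 1)
    for t in range(start, start + span):
        out[t][joint] = (0.0, 0.0)
    return out
```

For example, `occlude_keypoints(seq, p=1.0, max_span=5)` on a 20-frame, 17-joint sequence hides one joint for between 1 and 5 consecutive frames. Applying it on the fly during training, rather than once offline, exposes the network to a different occlusion pattern in each epoch.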
This work found that the occlusion augmentation technique was not sufficient for the preterm infant domain. The results also indicated that fine-tuning the pose estimation and temporal convolutional networks on preterm infant motion improved the performance of the PIPE architecture significantly, suggesting that further work on pose estimation in the preterm infant domain is required. Overall, the PIPE architecture did not achieve results sufficient for use in subsequent tasks in the preterm infant domain.