Synthesising 2D images of adult-child interaction for human pose estimation

Dekker, Martijn

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Salah, Albert
dc.contributor.author	Dekker, Martijn
dc.date.accessioned	2022-09-09T02:03:52Z
dc.date.available	2022-09-09T02:03:52Z
dc.date.issued	2022
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/42581
dc.description.abstract	In the last few years, human pose estimators have become better at predicting the poses of people, especially adults. They still struggle with occlusions, and their performance on children has improved less, due to a lack of child-specific pose data and because children's bodies have different proportions. Adult-child interaction is even more difficult, as it involves many occlusions and people of vastly different body sizes. This is unfortunate, because pose estimation on such interactions could be very useful: human pose estimation can be applied in many areas, e.g., human-computer interaction, healthcare, or the behavioural sciences. In this research, I try to improve pose estimators’ performance on adult-child interaction by synthesising data of such interaction. Authors of other studies have tried to solve the lack of pose data by synthesising pose data from motion capture data. I took a different approach: synthesising data by adjusting the poses of 3D human models in Unity. The adult and child models are posed such that they interact. In total, I created 40 different interaction scenes, which I used to create 40,571 2D images of the interaction. During synthesis, I varied the aesthetics to create a diverse set of images. Unity automatically added precise annotations, including 2D and 3D keypoint locations, so that I was able to use these images to finetune four state-of-the-art human pose estimators (HigherHRNet-W32, HigherHRNet-W48, HRNet-W48 and Stacked Hourglass). To finetune the models, I combined a subset of the synthesised images (27,042 images) with COCO training data and trained on the combination. I evaluated their performance on a Youth images test set, for which I annotated 520 challenging images of adult-child interaction. These images are challenging due to occlusions, self-occlusions, people blending in with the background, and keypoints falling outside the camera bounds.
I found that the models’ AP improved by 1.81, 0.72, 1.14 and 1.55 respectively after finetuning, while the AR improvements were mostly larger: 2.52, 2.17, 1.35 and 1.25 respectively. These improvements show that motion capture data is not necessary for synthesising images that can improve pose estimators' performance on adult-child interaction data.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Synthesising images of adult-child interaction that can be used to finetune existing human pose estimation models, improving their performance on a test set of frames from the Youth dataset (videos in which a parent and child interact).
dc.title	Synthesising 2D images of adult-child interaction for human pose estimation
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.courseuu	Artificial Intelligence
dc.thesis.id	9696

Files in this item

Name: Martijn_Dekker_Master_Thesis_F ...
Size: 26.53 MB
Format: PDF
