dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Salah, Albert
dc.contributor.author: Dekker, Martijn
dc.date.accessioned: 2022-09-09T02:03:52Z
dc.date.available: 2022-09-09T02:03:52Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/42581
dc.description.abstract: In the last few years, human pose estimators have become better at predicting people’s poses, especially those of adults. They still struggle with occlusions, however, and performance on children has improved less, both because child-specific pose data is scarce and because children’s bodies have different proportions. Adult-child interaction is harder still, as it involves many occlusions and people of very different body sizes. This is unfortunate, as accurate pose estimation of such interactions could be very useful: human pose estimation can be applied in many areas, e.g., in human-computer interaction, healthcare, or the behavioural sciences. In this research, I try to improve pose estimators’ performance on adult-child interaction by synthesising data of such interaction. Authors of other studies have tried to address the lack of pose data by synthesising it from motion capture data. I took a different approach, synthesising data by adjusting the poses of 3D human models in Unity. The adult and child models are posed such that they interact. In total, I created 40 different interaction scenes, which I used to create 40,571 2D images of the interaction. During synthesis, I diversified the aesthetics to obtain a varied set of images. Unity automatically added precise annotations, including 2D and 3D keypoint locations, so that I was able to use these images to finetune four state-of-the-art human pose estimators (HigherHRNet-W32, HigherHRNet-W48, HRNet-W48 and Stacked Hourglass). To finetune the models, I combined a subset of the synthesised images (27,042 images) with COCO training data and trained on the combination. I evaluated their performance on a Youth images test set, for which I annotated 520 challenging images of adult-child interaction. These images are challenging due to occlusions, self-occlusions, people blending in with the background, and keypoints falling outside the camera bounds.
I found that the models’ AP improved by 1.81, 0.72, 1.14 and 1.55, respectively, after finetuning, while the AR improvements were generally larger: 2.52, 2.17, 1.35 and 1.25, respectively. These improvements show that motion capture data is not necessary for synthesising images that can improve pose estimators’ performance on adult-child interaction data.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: The subject of my thesis is synthesising images of adult-child interaction that can be used to finetune existing human pose estimators. This finetuning improves the estimators’ performance on a test set consisting of frames from the Youth dataset (videos in which a parent and child interact).
dc.title: Synthesising 2D images of adult-child interaction for human pose estimation
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 9696

