        •   Utrecht University Student Theses Repository Home

        KINSHIP VERIFICATION USING VISION TRANSFORMERS

        KVViT 23-02-09.pdf (7.764Mb)
        Publication date
        2023
        Author
        Tang, Huaixi
        Summary
        Kinship verification is the task of verifying whether two given people have a kin relationship based on their facial images, videos, or other biological features. As a soft biometric modality, visual kinship verification has high availability and extremely low cost compared to DNA-based methods. Analyzing kinship from visual information is a major challenge, mainly because kin relationships exhibit large intra-class differences and small inter-class differences due to factors such as gender and age. This requires extracting more discriminative features. Video data adds a new dimension. Previous studies have shown that people with kinship not only have similar appearances but also similar expression patterns, which suggests that dynamic features can be extracted from facial videos for kinship verification. Traditional methods use handcrafted descriptors to capture dynamic features, and some recent research has begun to use neural networks. Our research focuses on smiling expressions and tries to extract spatio-temporal features from facial videos using state-of-the-art video vision transformers. We created a siamese network based on a video vision transformer and trained it on a face video dataset. We experimentally compare the impact of using dynamic features versus purely texture features on kinship verification. We then compare the capabilities of CNNs and ViTs in extracting facial dynamic features. We tested the model's performance under different initialization and training methods. Drawing on the latest research, we developed a pre-training method based on matched expression sequences to address the challenge posed by the small size of the dataset. Our model is trained on smiling videos provided by the UvA-NEMO dataset, and we present results and analysis.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/43547
        Collections
        • Theses