        An Empirical Evaluation of Convolutional and Recurrent Neural Networks for Lip Reading

        thesis.pdf (1.717Mb)
        Publication date
        2018
        Author
        Heimbach, K.B.
        Summary
        The 3DCNN and the LSTM are both suited to video classification because they can take temporal information into account, but they do so in very different ways. This work investigates which of the two models is better suited to automatic lip reading and which is better suited to transfer learning. We conducted two groups of experiments. In the first group, both models were trained from scratch and tested under several conditions. In the second group, we took a pretrained 3DCNN and a pretrained LSTM from the first group and checked whether pretraining improved accuracy on a different dataset, compared with training on that dataset from scratch. From the first group, we conclude that the 3DCNN is better suited to automatic lip reading because it achieves a higher test set accuracy than the LSTM, although it takes considerably longer to train. From the second group, we conclude that the 3DCNN is overall also better suited to transfer learning. Taken together, the experiments suggest that the 3DCNN is the better choice for automatic lip reading across many different conditions.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/30698
        Collections
        • Theses