        Implementing Wav2Vec 2.0 into an Automated Reading Tutor

        View/Open
        Thesis_Final_Version (1).pdf (2.125Mb)
        Publication date
        2024
        Author
        Mostert, Nick
        Summary
        The number of low-literate adults in the Netherlands has been steadily increasing over the past decades. Research shows that proper reading instruction requires repeated individualized feedback. However, teachers often do not have the time or resources to provide this. Computer-assisted reading tutors could provide a solution. Most current systems show good results at detecting word-level errors, but struggle to identify mispronunciations. Recent studies have shown that the use of large semi-supervised models like Wav2Vec 2.0 could improve the performance of mispronunciation detection models. The goal of this thesis is to investigate the effectiveness of Wav2Vec 2.0 for the task of mispronunciation detection in Dutch children, and to implement it into an automated reading tutor. First, two types of Wav2Vec 2.0 models were created for classification of mispronunciation data from the speech therapy domain. Specifically, the task was target phone detection (TPD), where the pronunciation of each phone in a word is assessed individually. The first model performs end-to-end phonetic transcription; the second model applies pooling over the time dimension to the Wav2Vec 2.0 embeddings and then attempts to classify mispronunciations directly. Both of these models were then implemented into a reading error detection (RED) model to see whether the mispronunciation detection aspect of the RED model could be improved. For TPD, the models significantly improved over a baseline goodness of pronunciation (GOP) model. For RED, the use of Wav2Vec 2.0 led to a small improvement for the classification of phone-level errors.
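        The two model types described in the summary can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes that the transcription model's output is compared to the target phones through a generic sequence alignment, that the pooling is a mean over frames, and that the classifier is a single logistic layer. All function names, phone symbols, and numbers are hypothetical.

```python
import math
from difflib import SequenceMatcher


def target_phone_detection(target_phones, predicted_phones):
    """Mark each target phone as correctly pronounced or not.

    Assumption: the transcription model returns a list of phone symbols,
    and a target phone counts as correct only if a generic sequence
    alignment matches it exactly to a predicted phone.
    """
    correct = [False] * len(target_phones)
    matcher = SequenceMatcher(a=target_phones, b=predicted_phones, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            for i in range(i1, i2):
                correct[i] = True
    return correct


def mean_pool(frames):
    """Average frame-level embeddings over the time dimension.

    Assumption: mean pooling; the summary only says "pooling".
    `frames` is a non-empty list of equal-length float vectors.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]


def mispronunciation_probability(frames, weights, bias):
    """Score pooled embeddings with a hypothetical logistic layer."""
    pooled = mean_pool(frames)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical example: the middle phone of a three-phone word is substituted.
flags = target_phone_detection(["k", "A", "t"], ["k", "O", "t"])

# Hypothetical two-frame, two-dimensional embeddings with untrained weights.
prob = mispronunciation_probability([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0], 0.0)
```

        In a real system the frame embeddings would come from a pretrained Wav2Vec 2.0 encoder and the classifier weights would be learned from labelled speech. Both are replaced by toy values here.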
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/46548
        Collections
        • Theses