dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Schraagen, Marijn
dc.contributor.author: Mostert, Nick
dc.date.accessioned: 2024-07-02T12:27:54Z
dc.date.available: 2024-07-02T12:27:54Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/46548
dc.description.abstract: The number of low-literate adults in the Netherlands has been steadily increasing over the past decades. Research shows that proper reading instruction requires repeated individualized feedback. However, teachers often do not have the time or resources to provide this. Computer-assisted reading tutors could provide a solution. Most current systems show good results at detecting word-level errors, but struggle to identify mispronunciations. Recent studies have shown that the use of large semi-supervised models like Wav2Vec 2.0 could improve the performance of mispronunciation detection models. The goal of this thesis is to research the effectiveness of Wav2Vec 2.0 for the task of mispronunciation detection in Dutch children, and to implement it into an automated reading tutor. First, two types of Wav2Vec 2.0 models were created for classification of mispronunciation data from the speech therapy domain. Specifically, the task was target phone detection (TPD), where the pronunciation of each phone in a word is assessed individually. The first model performs end-to-end phonetic transcription; the second model applies pooling over the time dimension to the Wav2Vec 2.0 embeddings and then attempts to classify mispronunciations directly. Both of these models were then implemented into a reading error detection (RED) model to see whether the mispronunciation detection aspect of the RED model could be improved. For TPD, the models significantly improved over a baseline goodness of pronunciation (GOP) model. For RED, the use of Wav2Vec 2.0 led to a small improvement in the classification of phone-level errors.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: I used Wav2Vec 2.0 to detect phonetic mispronunciations and implemented these models into a larger model that detects reading errors of young children reading texts out loud.
dc.title: Implementing Wav2Vec 2.0 into an Automated Reading Tutor
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 31982

