Implementing Wav2Vec 2.0 into an Automated Reading Tutor

Mostert, Nick

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Schraagen, Marijn
dc.contributor.author	Mostert, Nick
dc.date.accessioned	2024-07-02T12:27:54Z
dc.date.available	2024-07-02T12:27:54Z
dc.date.issued	2024
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/46548
dc.description.abstract	The number of low-literate adults in the Netherlands has been steadily increasing over the past decades. Research shows that proper reading instruction requires repeated individualized feedback. However, teachers often do not have the time or resources to provide this. Computer-assisted reading tutors could provide a solution. Most current systems show good results at detecting word-level errors, but struggle to identify mispronunciations. Recent studies have shown that the use of large semi-supervised models like Wav2Vec 2.0 could improve the performance of mispronunciation detection models. The goal of this thesis is to research the effectiveness of Wav2Vec 2.0 for the task of mispronunciation detection in Dutch children, and to implement it into an automated reading tutor. First, two types of Wav2Vec 2.0 models were created for classification of mispronunciation data from the speech therapy domain. Specifically, the task was target phone detection (TPD), where the pronunciation of each phone in a word is assessed individually. The first model performs end-to-end phonetic transcription; the second model pools the Wav2Vec 2.0 embeddings over the time dimension and then attempts to classify mispronunciations directly. Both of these models were then integrated into a reading error detection (RED) model to see whether the mispronunciation detection aspect of the RED model could be improved. For TPD, the models significantly improved over a baseline goodness of pronunciation (GOP) model. For RED, the use of Wav2Vec 2.0 led to a small improvement for the classification of phone-level errors.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	I used Wav2Vec 2.0 to detect phonetic mispronunciations and implemented these models into a larger model that detects reading errors of young children reading texts out loud.
dc.title	Implementing Wav2Vec 2.0 into an Automated Reading Tutor
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.courseuu	Artificial Intelligence
dc.thesis.id	31982
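
The second model described in the abstract pools frame-level embeddings over the time dimension before classifying. A minimal sketch of that idea follows, in plain Python. It is an illustration only, not the thesis implementation: mean pooling is an assumption (the abstract does not name the pooling operator), and the function names, toy embeddings, weights, and bias are all hypothetical.

```python
import math

def mean_pool(frames):
    """Average a (time x dim) list of frame embeddings over the time axis."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]

def mispronunciation_probability(frames, weights, bias):
    """Logistic score on the pooled embedding (hypothetical classifier head)."""
    pooled = mean_pool(frames)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Toy input: 3 frames of 2-dimensional embeddings.
# Real Wav2Vec 2.0 frame embeddings are far wider; these values are made up.
frames = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
pooled = mean_pool(frames)
prob = mispronunciation_probability(frames, weights=[0.5, -0.5], bias=0.0)
```

Pooling collapses a variable number of frames into one fixed-size vector, so a single classifier can score phone segments of any duration.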

Files in this item

Name: Thesis_Final_Version (1).pdf
Size: 2.125 MB
Format: PDF

This item appears in the following Collection(s)

Theses
