Implementing Wav2Vec 2.0 into an Automated Reading Tutor

Mostert, Nick

Publication date

2024

Summary

The number of low-literate adults in the Netherlands has been steadily increasing over the past decades. Research shows that proper reading instruction requires repeated individualized feedback. However, teachers often do not have the time or resources to provide this. Computer-assisted reading tutors could provide a solution. Most current systems show good results at detecting word-level errors, but struggle to identify mispronunciations. Recent studies have shown that the use of large semi-supervised models like Wav2Vec 2.0 could improve the performance of mispronunciation detection models. The goal of this thesis is to research the effectiveness of Wav2Vec 2.0 for the task of mispronunciation detection in Dutch children, and to implement it into an automated reading tutor. First, two types of Wav2Vec 2.0 models were created for classification of mispronunciation data from the speech therapy domain. Specifically, the task was target phone detection (TPD), where the pronunciation of each phone in a word is assessed individually. The first model performs end-to-end phonetic transcription; the second model pools the Wav2Vec 2.0 embeddings over the time dimension and then attempts to classify mispronunciations directly. Both of these models were then integrated into a reading error detection (RED) model to see whether the mispronunciation detection aspect of the RED model could be improved. For TPD, the models significantly improved over a baseline goodness of pronunciation (GOP) model. For RED, the use of Wav2Vec 2.0 led to a small improvement in the classification of phone-level errors.
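
The second model described in the summary pools frame-level Wav2Vec 2.0 embeddings over the time dimension before classifying. The summary does not say which pooling operation or classifier the thesis uses, so the following is only a minimal sketch under stated assumptions: mean pooling, a single logistic output, and toy two-dimensional frames standing in for real embeddings. The names `mean_pool` and `mispronunciation_probability`, as well as the weights, are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level embeddings over the time dimension.

    `frames` has shape (time, dim). Mean pooling is an assumption here;
    the summary only says that pooling over time is used.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def mispronunciation_probability(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Pool over time, then apply a logistic classifier (hypothetical head)."""
    pooled = mean_pool(frames)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-in for embeddings: 3 time frames, 2 dimensions each.
frames = [[0.0, 2.0], [2.0, 4.0], [4.0, 0.0]]
pooled = mean_pool(frames)
prob = mispronunciation_probability(frames, weights=[0.5, -0.5], bias=0.0)
print(pooled, prob)
```

Pooling collapses a variable-length sequence of frames into one fixed-size vector, which is what lets a simple classifier score segments of any duration.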

URI

https://studenttheses.uu.nl/handle/20.500.12932/46548

Collections

Theses