The "Sprekend Nederland" project applied to accent location
Summary
In this thesis, two versions of an accent location system are introduced, both built using data from the "Sprekend Nederland" project. These data include short recordings of participants' speech, judgements about other participants' speech, and metadata such as the participants' locations. The thesis explores how an accent location system can be implemented with these data and what performance it can achieve. Both versions were implemented as feed-forward neural networks whose inputs were i-vectors previously extracted from the recordings. The first version classified each recording as belonging to one of twelve accent regions, whereas the second predicted the coordinates of the speaker's location.
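The difference between the two versions lies mainly in the output layer: a softmax over twelve regions versus two unbounded coordinate outputs. The following NumPy forward pass is a minimal sketch of that contrast, not the thesis's implementation. The i-vector dimension (400), the single hidden layer of 64 ReLU units and the random weights are all assumptions chosen for illustration, and training is omitted.

```python
import numpy as np

IVECTOR_DIM = 400   # assumed i-vector dimensionality (not stated in the summary)
HIDDEN_DIM = 64     # assumed hidden-layer width
N_REGIONS = 12      # twelve accent regions, as in the summary

rng = np.random.default_rng(0)

def init_network(n_in, n_hidden, n_out):
    """Random weights for a one-hidden-layer feed-forward network (architecture assumed)."""
    return {
        "W1": rng.normal(0.0, 0.01, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.01, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(net, x):
    """Linear output of the network for a batch x of shape (n, n_in)."""
    h = np.maximum(0.0, x @ net["W1"] + net["b1"])   # ReLU hidden layer
    return h @ net["W2"] + net["b2"]

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Version 1: region classifier; version 2: coordinate regressor (separate networks).
classifier = init_network(IVECTOR_DIM, HIDDEN_DIM, N_REGIONS)
regressor = init_network(IVECTOR_DIM, HIDDEN_DIM, 2)

ivectors = rng.normal(size=(5, IVECTOR_DIM))    # random stand-ins for extracted i-vectors
region_probs = softmax(forward(classifier, ivectors))
region_pred = region_probs.argmax(axis=1)       # most probable region per recording
coords = forward(regressor, ivectors)           # predicted (x, y) per recording
print(region_probs.shape, coords.shape)         # (5, 12) (5, 2)
```

In practice the classifier would be trained with a cross-entropy loss and the regressor with a distance-based loss such as mean squared error; both choices are assumptions here.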
A major challenge in engineering the system was the unbalanced dataset, which was dominated by young participants living in larger cities such as Rotterdam, Amsterdam and Utrecht. Furthermore, participants' self-reports, the judgements of fellow participants and the results of a principal component analysis of the features all indicated that the recordings contained little locally accented speech. Building an accent location system on these data therefore required exploiting scarce cues to these accents.
Both versions of the system performed only slightly above chance level. However, human listeners who were asked to guess participants' locations from the same recordings did not, overall, outperform the system. Although further improvements are possible, this led to the conclusion that the system must have exploited most of the available cues to local accentedness during training.
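With unbalanced data, "chance level" depends on the baseline chosen. For the twelve-region classifier, uniform random guessing and always predicting the most frequent region give different accuracies; for coordinate prediction, a natural baseline places every speaker at the mean training location. The summary does not state which baseline the thesis used, so the sketch below only illustrates these candidates on toy inputs. It assumes plain Euclidean distance on the coordinates; for latitude/longitude a great-circle distance would be more accurate.

```python
import numpy as np

def classification_baselines(train_labels, test_labels, n_classes):
    """Accuracy of uniform random guessing vs. always predicting the training majority class."""
    uniform_acc = 1.0 / n_classes
    majority = np.bincount(train_labels, minlength=n_classes).argmax()
    majority_acc = float(np.mean(test_labels == majority))
    return uniform_acc, majority_acc

def mean_location_baseline(train_coords, test_coords):
    """Mean Euclidean error when every test speaker is placed at the mean training location."""
    centre = train_coords.mean(axis=0)
    return float(np.linalg.norm(test_coords - centre, axis=1).mean())

# Toy inputs for illustration only, not data from the thesis.
train_labels = np.array([3, 3, 3, 1, 7])
test_labels = np.array([3, 1, 3, 5])
print(classification_baselines(train_labels, test_labels, n_classes=12))

train_coords = np.array([[0.0, 0.0], [2.0, 0.0]])
test_coords = np.array([[1.0, 1.0], [1.0, -1.0]])
print(mean_location_baseline(train_coords, test_coords))   # 1.0
```

Here the majority-class baseline (0.5) is far above uniform guessing (about 0.083), which is why a skewed dataset makes the choice of chance level matter when interpreting a result that is only slightly above it.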