        •   Utrecht University Student Theses Repository Home


        Clustering in a Phonotactic Maximum Entropy Model

        Scriptie-met-bijlages.zip (171.2Kb)
        Publication date
        2021
        Author
        Kooij, G.
        Summary
        How language learners, such as children, acquire knowledge of a language is a long-standing question in linguistics. Maximum entropy models are often used to model this process. Nazarov (2016) and Mayer (2018) showed that phonotactic phenomena can be learned without a priori knowledge. These studies, however, did not consider the effect of clustering on accuracy. This research measures that effect: the main question is whether clustering improves the performance of a phonotactic maximum entropy model that learns a language, compared to a model without clustering. To answer this question, a model was built. It first creates constraints using clusters of classes of phones. Next, a maximum entropy model assigns weights to these constraints. Finally, using these weights, the model predicts the probability of words in the language. The predicted probabilities are compared to the actual probabilities and evaluated. Models with and without clustering were then compared. The models with clustering performed better than those without, which suggests that clustering improves the model's performance. The model was run on a made-up 'toy' language. Possible follow-up research could examine a real language with a larger dataset.
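        The weighting and prediction steps described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the phone inventory, the classes, the two constraints, the observed words, and the plain gradient-ascent fit with non-negative weights are all assumptions chosen for illustration, and the clustering step that produces the classes is left out entirely.

```python
import math
from itertools import product

# Hypothetical toy inventory and hand-written classes (not from the thesis;
# in the thesis the classes come from a clustering step not shown here).
PHONES = ["p", "t", "a", "i"]
CLASSES = {"C": {"p", "t"}, "V": {"a", "i"}}

# Hypothetical constraints: each is a sequence of class labels that is
# penalized wherever it occurs in a word.
CONSTRAINTS = [("C", "C"), ("V", "V")]


def violations(word, constraint):
    """Count the positions in `word` where the class sequence matches."""
    n = len(constraint)
    count = 0
    for start in range(len(word) - n + 1):
        window = word[start:start + n]
        if all(ph in CLASSES[label] for ph, label in zip(window, constraint)):
            count += 1
    return count


def harmony(word, weights):
    """Weighted sum of constraint violations (higher means worse)."""
    return sum(w * violations(word, c) for w, c in zip(weights, CONSTRAINTS))


def probabilities(candidates, weights):
    """Maximum entropy distribution: P(word) is proportional to exp(-harmony)."""
    scores = [math.exp(-harmony(word, weights)) for word in candidates]
    z = sum(scores)
    return {word: s / z for word, s in zip(candidates, scores)}


def fit(observed, candidates, steps=500, lr=0.1):
    """Gradient ascent on the log-likelihood of the observed words.

    The gradient for each weight is the expected violation count under the
    model minus the mean violation count in the data. Clamping weights at
    zero is an assumption of this sketch.
    """
    weights = [0.0] * len(CONSTRAINTS)
    observed_mean = [
        sum(violations(word, c) for word in observed) / len(observed)
        for c in CONSTRAINTS
    ]
    for _ in range(steps):
        probs = probabilities(candidates, weights)
        expected = [
            sum(p * violations(word, c) for word, p in probs.items())
            for c in CONSTRAINTS
        ]
        weights = [
            max(0.0, w + lr * (e - o))
            for w, e, o in zip(weights, expected, observed_mean)
        ]
    return weights


# All two-phone strings form the candidate space of this toy language.
CANDIDATES = ["".join(pair) for pair in product(PHONES, repeat=2)]
# Hypothetical observed words: only consonant-vowel or vowel-consonant shapes.
OBSERVED = ["pa", "ti", "ta", "pi", "ap", "it"]

WEIGHTS = fit(OBSERVED, CANDIDATES)
PROBS = probabilities(CANDIDATES, WEIGHTS)
```

        Because the observed words never contain two consonants or two vowels in a row, both constraints receive positive weights in this sketch, and a word such as "pa" ends up more probable than "pp". Comparing such predicted probabilities against the probabilities of the words in the data is the kind of evaluation the summary describes.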
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/40696
        Collections
        • Theses