Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Verstegen, Judith
dc.contributor.author: Keus, David
dc.date.accessioned: 2023-09-28T00:01:00Z
dc.date.available: 2023-09-28T00:01:00Z
dc.date.issued: 2023
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/45241
dc.description.abstract: Speaker communities typically have some level of interaction and are not completely isolated. When speakers of different languages come into contact, their languages are likely to converge. Ranacher et al. (2021) developed a method, sBayes, to estimate the relative role of language contact, as opposed to inheritance and universal preference, in creating similarities between languages. The model promises to identify contact areas from empirical data using Bayesian inference. However, validating the approach is difficult because it relies on empirical data from real-world languages, for which the actual contributions of language contact, inheritance and universal preference are, by definition, unknown. To further validate the sBayes model, a dataset is needed for which the expected contact, inheritance and universal preference values are known before the model is run. This dataset can then be compared with the output of sBayes. For this purpose, we created synthetic language datasets using an agent-based model to test the accuracy of sBayes. Using these datasets, we conducted two experiments: the first validates sBayes' ability to detect isolated causal explanations per language feature; the second tests how well sBayes fits an artificial language dataset and how well it determines language areas (clusters) and overall causality counts. Our results suggest that synthetic language data can successfully be used to validate the sBayes language model. The accuracy of sBayes in identifying clearly isolated causalities corresponds to a combined mean squared error of 0.05 in our simulations. In a simulated real-life situation, the model finds a similar number of contact areas. In addition, the overall distribution of feature state causality in our synthetic data is the same as in a benchmark experiment.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Validation of a Bayesian mixture model for language contact (sBayes) with the use of synthetic language data generated by an agent-based model.
dc.title: Validation of a Bayesian mixture model for language contact with the use of synthetic language data
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: bayesian;sBayes;artificial;language;data;contact
dc.subject.courseuu: Applied Data Science
dc.thesis.id: 24770

