        Utrecht University Student Theses Repository

        Dutch style model trained using authorship verification

        Dutch Style Representations.pdf (1.130Mb)
        Publication date
        2025
        Author
        Kole, Ruben
        Summary
        Currently, no models are trained specifically to create representations of Dutch linguistic style, nor has a task been developed to evaluate and verify such embeddings. In this thesis, I construct a model that creates a style representation for Dutch, and I create evaluation data to test whether that representation truly captures style. To create these embeddings, RobBERT-base is fine-tuned on the contrastive authorship verification task. To find the best-performing model, two datasets are constructed, and both the loss function and the margin value are varied. The performance of the fine-tuned models is in line with results reported in similar research on English style. For the evaluation, the STEL framework is adapted to a Dutch version. Some categories are copied from the English variant and translated to properly reflect Dutch style; other categories are novel in this version. There are two versions of the STEL task, one of which controls for content to ensure that the embedding makes its decision based on style. The performance of the embeddings on the STEL task resembles the results found in research on the English equivalent, and shows that for most tasks the fine-tuned model learns to perform better than the baseline model on the tasks that control for style. Therefore, this thesis concludes that it is possible to take methods devised for creating and evaluating English style representations and transform them into a Dutch version that shows similar results to the original.
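        To make the role of the margin concrete, the following is a minimal sketch of a pairwise contrastive loss as it is commonly used for authorship verification. The summary does not state which loss formulations or margin values the thesis compares, so the cosine distance, the 0.5 scaling factor, the default margin of 0.5, and the function names are all assumptions made for illustration only.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def contrastive_loss(u, v, same_author, margin=0.5):
    """Pairwise contrastive loss on two sentence embeddings (illustrative).

    Same-author pairs are penalised by their squared distance, so they are
    pulled together. Different-author pairs are penalised only while their
    distance is still smaller than the margin, so they are pushed apart
    until they are at least `margin` away from each other.
    """
    d = cosine_distance(u, v)
    if same_author:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

        Under these assumptions, a larger margin keeps penalising different-author pairs over a wider range of distances, which is one reason the margin is a natural hyperparameter to tune.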
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/48450
        Collections
        • Theses