Utrecht University Student Theses Repository
        Automatic Detection of Linguistic Errors in Dutch LLM-Generated Text

        ads_masters_thesis_kevin_heijboer_publication.pdf (521.3Kb)
        Publication date
        2025
        Author
        Heijboer, Kevin
        Summary
        As large language models (LLMs) are increasingly used to generate Dutch book descriptions, ensuring the linguistic quality of their output remains a challenge. This thesis explores whether real human edits can be used to train models that automatically detect linguistically unacceptable sentences. Using versioned summaries from Bookarang, a multi-step filtering pipeline was developed to extract only meaning-preserving linguistic edits; content and stylistic changes were filtered out using sentence alignment, NLI filtering, and GPT-based classification. The result was a dataset of 12,894 labeled sentences for training acceptability classifiers. Transformer models were fine-tuned on this data, with Multilingual BERT achieving 74.3% recall, greatly outperforming a CoLA-NL-trained RobBERT baseline. Tuning the classification threshold further allowed error detection to be balanced against editorial workload. These results show that edit data can be turned into useful training material through targeted filtering, offering a practical approach to quality control for LLM-generated content in real-world editorial settings.
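The trade-off between error detection and editorial workload mentioned in the summary can be illustrated with a small threshold sweep. This is a minimal sketch, not the thesis's implementation: the function names `sweep_thresholds` and `pick_threshold`, the toy scores and labels, and the definition of workload as the fraction of sentences flagged for review are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Row = Tuple[float, float, float]  # (threshold, recall, flagged_fraction)


def sweep_thresholds(
    scores: List[float], labels: List[int], thresholds: List[float]
) -> List[Row]:
    """Compute recall and flagged fraction for each candidate threshold.

    scores: hypothetical classifier probabilities that a sentence is unacceptable.
    labels: 1 if the sentence is truly unacceptable, 0 otherwise.
    """
    n_errors = sum(labels)
    rows: List[Row] = []
    for t in thresholds:
        # A sentence is sent to an editor when its score reaches the threshold.
        flagged = [s >= t for s in scores]
        true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
        recall = true_pos / n_errors if n_errors else 0.0
        workload = sum(flagged) / len(scores) if scores else 0.0
        rows.append((t, recall, workload))
    return rows


def pick_threshold(rows: List[Row], min_recall: float) -> Optional[Row]:
    """Return the row with the highest threshold whose recall meets min_recall.

    A higher threshold never flags more sentences, so this choice keeps
    editorial workload as low as the recall target allows.
    """
    candidates = [r for r in rows if r[1] >= min_recall]
    return max(candidates, key=lambda r: r[0]) if candidates else None


# Toy data (assumed for illustration, not taken from the thesis).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
rows = sweep_thresholds(scores, labels, [0.25, 0.5, 0.75])

for threshold, recall, workload in rows:
    print(f"threshold={threshold:.2f} recall={recall:.2f} flagged={workload:.2f}")
```

Lowering the threshold raises recall but also the share of sentences an editor must check; fixing a minimum recall and taking the highest threshold that satisfies it is one simple way to make that trade-off explicit.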
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/49832
        Collections
        • Theses