Automatic Annotation of Dutch Educational Assessment Questions using Large Language Models

Lopes Motoki, Isabela

Publication date

2025

Summary

This study addresses the automatic evaluation of curriculum alignment, that is, the extent to which learning objectives, instructional activities, and assessments are consistent with one another. Measuring this alignment is traditionally time-consuming and often subjective, since it typically involves evaluating all educational materials against the learning objectives of the curriculum. To address this, the research explores the use of large language models (LLMs) to automate the annotation of Dutch assessment questions with subject-specific concepts. Specifically, it investigates both a generative model (GPT-4.1 nano) and a non-generative model (mBERT) using a labeled dataset of Dutch statistics questions. Results indicate that LLMs show strong potential in this domain: GPT achieved up to 71.1% accuracy and a 62.2% macro F1 score, while mBERT reached 91.7% accuracy and an 83.7% macro F1 score. Prompt engineering substantially improved GPT’s performance. Performance varied with question category and subject matter, which highlights the importance of careful adaptation and evaluation across diverse educational contexts and task types. This research contributes to the integration of AI in education by providing an effective approach to question annotation and by offering insights into which approaches are better suited to different educational scenarios. As a result, educators can better align assessments with learning objectives and enhance the overall learning experience.
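The summary reports both accuracy and macro F1, and the two can diverge when some concepts are much rarer than others. The following is a minimal sketch of how the two metrics are commonly computed for single-label annotation. The concept labels and predictions are hypothetical toy values, not data or results from the thesis, and the function names are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of questions whose predicted concept matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare concepts weigh as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy annotations (assumption: one concept label per question).
gold = ["mean"] * 6 + ["variance"] * 3 + ["hypothesis_test"]
pred = ["mean"] * 6 + ["variance"] * 2 + ["mean", "mean"]

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro F1 = {macro_f1(gold, pred):.3f}")
```

In this toy case the rare concept is never predicted correctly, so its per-class F1 is zero and pulls macro F1 well below accuracy. A gap between the two metrics is therefore often a sign that performance is uneven across concepts.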

URI

https://studenttheses.uu.nl/handle/20.500.12932/49825

Collections

Theses