Translating Legal Texts to B1 Dutch Language Level
Summary
This research investigates whether large language models can make Dutch legal texts accessible to
B1-level readers without sacrificing legal accuracy. Five models (GPT-4o, Claude Sonnet 4,
Gemini 1.5 Pro, UL2-T5, and a fine-tuned Meta-LLaMA-3.1-8B-Instruct) are evaluated on a
dataset of legal summaries from voorRecht-rechtspraak. The evaluation pipeline integrates automatic
metrics (BERTScore, CEFR-based NT2Lex), an LLM-as-a-judge framework, and validation by both legal
and linguistic experts.
Results show that recent large language models, particularly Claude Sonnet 4 and GPT-4o, can
reliably produce simplified legal texts that are much more accessible to non-experts, while largely
maintaining the essential legal meaning and accuracy. The LLM-as-a-judge framework and expert
reviews both confirm strong performance across key criteria, highlighting significant progress in
automated legal simplification. Although occasional shortcomings persist, these findings
indicate that, with further refinement, large language models have the potential to bridge the
gap between complex legal language and public understanding.