Enhancing Readability of Governmental Letters Using Large Language Models
Summary
Text simplification aims to make text more readable by reducing its linguistic complexity. This study explores the use of sequence-to-sequence transformer models to simplify
Dutch governmental letters, enhancing their readability for individuals with low literacy
levels. Various models, including T5-Small, BART-Base, mT5-Small, mBART-Large-50,
T5-Base-Dutch, UL2-Small-Dutch and UL2-Small-Dutch-Simplification, were trained on
datasets comprising complex and simplified Dutch sentences. These models were evaluated
using quantitative metrics such as the Flesch-Kincaid Grade Level, BLEU score and SARI
score, complemented by a qualitative analysis. The best-performing model was applied to
a dataset of letters provided by the Rijksdienst voor Ondernemend Nederland to produce
simplified versions. The study shows that although the models slightly improve readability
as measured by Flesch-Kincaid scores, qualitative analysis reveals significant problems
with content preservation and coherence. These findings highlight the need for further
refinement to achieve the desired readability improvement while maintaining accuracy in
Dutch text simplification.
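
For readers unfamiliar with the readability metric named above, the following is a minimal sketch of the standard Flesch-Kincaid Grade Level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The function names and the vowel-group syllable heuristic are illustrative assumptions, not the implementation used in this study. The formula's constants were calibrated for English, so scores on Dutch text are best read as relative indicators.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels.

    Assumption: this crude heuristic stands in for a proper
    (language-specific) syllabifier; it is not exact for Dutch or English.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid Grade Level of `text` (lower = easier)."""
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters (digits and underscores excluded).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    print(round(flesch_kincaid_grade(simple), 2))
```

A simplified sentence should yield a lower grade level than its complex source. The abstract's own findings show that such a drop does not by itself indicate that meaning was preserved.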