Simplifying letters from RVO using transformers without changing the semantic meaning

Hensman, Johan

Publication date

2024

Summary

In the Netherlands, government letters are sometimes written in language so intricate that the public finds them difficult to understand, which creates a need to simplify them. Given the growing use of the transformer architecture for Natural Language Processing (NLP) tasks, this study explores whether transformers can lower the difficulty of letters from the Rijksdienst voor Ondernemend Nederland (RVO) without changing their semantic meaning. The research consisted of three steps: identifying the characteristics that contribute to the difficulty of the letters with the tool LiNT, conducting a small exploratory study to identify the best approach to text simplification, and analysing the original and simplified letters. The characteristic “unknown words” proved to be the main contributor to the difficulty of the letters. For the more challenging letters, the characteristics “length of clauses” and “grammatical dependencies” also affected the difficulty. In the exploratory study, Fietje-2 with the approach “few-shot prompting” outperformed the other models. After this approach was applied to all letters from RVO, the analysis of the original and simplified letters showed that the majority of the letters became slightly simpler. The model simplified the letters by lowering the scores for the characteristics “unknown words” and “length of clauses”. These adjustments were mostly attributable to the examples given in the few-shot prompt. However, the examples were also the weakness of the approach, as they were responsible for a minor increase in difficulty in a small portion of the letters. Future research can therefore focus on curating more appropriate examples for this approach.
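As an illustration of the few-shot prompting approach named in the summary, the sketch below shows one way such a prompt could be assembled in Python. It is a minimal sketch under stated assumptions, not the prompt used in the thesis: the function name `build_few_shot_prompt`, the instruction sentence, the "Origineel"/"Eenvoudig" labels and the example sentence pairs are all hypothetical and were invented for this illustration.

```python
# Hypothetical example pairs, invented for illustration (not taken from the thesis).
# Each pair is (original sentence, simplified sentence).
EXAMPLES = [
    ("Wij verzoeken u de benodigde documenten te overleggen.",
     "Stuur ons de documenten die nodig zijn."),
    ("Uw aanvraag is in behandeling genomen.",
     "Wij bekijken uw aanvraag."),
]


def build_few_shot_prompt(examples, text):
    """Return a prompt with worked examples followed by the text to simplify."""
    # Assumed instruction wording: "Simplify the text without changing the meaning."
    parts = ["Vereenvoudig de tekst zonder de betekenis te veranderen."]
    for original, simplified in examples:
        parts.append(f"Origineel: {original}\nEenvoudig: {simplified}")
    # The final block leaves the simplified side empty for the model to complete.
    parts.append(f"Origineel: {text}\nEenvoudig:")
    return "\n\n".join(parts)


prompt = build_few_shot_prompt(
    EXAMPLES, "U dient het formulier tijdig te retourneren."
)
print(prompt)
```

The resulting string would then be passed to a generative model such as Fietje-2. How the model is loaded and queried is not described in the summary and is left out here. Because the summary attributes both the improvements and the occasional increases in difficulty to the examples, the content of `EXAMPLES` is the part of such a setup that most deserves careful curation.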

URI

https://studenttheses.uu.nl/handle/20.500.12932/47681
