Evaluating Dutch Social Bias in Large Language Models
Summary
Large Language Models (LLMs) are increasingly used by people in their daily lives. However, LLMs can contain biases and express them in their responses. This thesis focuses on social bias, one form of which is the disparate treatment of individuals based on characteristics such as age, gender and race. It is therefore crucial to examine the possible biases within LLMs and to raise awareness of them. This thesis builds on previous studies that investigate social bias in LLMs in a hiring-decision setting: the LLM must decide whether a candidate is hired or not. The models are prompted with handwritten Dutch prompts that vary both gender and country of origin, and their responses are evaluated for Dutch social bias. This thesis finds that all tested models (gpt-4o-mini, claude-3.5-haiku, Geitje-7B-Ultra and EuroLLM-9B-Instruct) exhibit social bias in their outputs to some extent. Furthermore, all tested models are to some extent sensitive to the manner in which the prompts are written.