Multi-SimLex for Dutch: Comparing Embedding and Prompt-Based Model Performance on Semantic Similarity

Brans, Lizzy

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Bylinina, Lisa
dc.contributor.author	Brans, Lizzy
dc.date.accessioned	2025-10-16T00:01:55Z
dc.date.available	2025-10-16T00:01:55Z
dc.date.issued	2025
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/50561
dc.description.abstract	This study introduces a Dutch expansion of the Multi-SimLex dataset: 1,888 word pairs annotated for semantic similarity by native Dutch speakers. The research evaluates 18 models using both embedding-based and prompt-based methods. Prompt-based evaluation produced the highest correlation with human judgments; GPT-4 achieved a correlation of 0.761. This suggests that large generative models use dynamic reasoning. In contrast, embedding-based evaluation favored smaller, specialized models such as FastText and BERTje. The findings underscore the importance of aligning the evaluation strategy with the model's architecture. This study provides a foundational resource for Dutch semantics and suggests that large language models could serve as a proxy for human ratings in the future.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	We expanded the Multi-SimLex dataset with 1,888 Dutch word pairs, annotated by native speakers for semantic similarity. We used this new dataset to evaluate 18 models, employing both embedding-based and prompt-based methods. While smaller, language-specific models like FastText and BERTje performed best in embedding-based tests, large generative models like GPT-4 excelled with prompt-based methods. This highlights the importance of matching the evaluation strategy to the model, as large models
dc.title	Multi-SimLex for Dutch: Comparing Embedding and Prompt-Based Model Performance on Semantic Similarity
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	Lexical semantic similarity, Multi-SimLex dataset, computational models, embedding-based evaluation, prompt-based evaluation, dynamic reasoning
dc.subject.courseuu	Human-Computer Interaction
dc.thesis.id	54649

Files in this item

Name: Thesis_Lizzy_Brans_1110209.pdf
Size: 824.8Kb
Format: PDF

This item appears in the following Collection(s)

Theses

Related items

Modeling dual-task performance: do individualized models predict dual-task performance better than average models?

Modelling Wastewater Quantity and Quality in Mexico -- using an agent-based model

Modelling offshore wind in the IMAGE/TIMER model
