Multi-SimLex for Dutch: Comparing Embedding and Prompt-Based Model Performance on Semantic Similarity
Summary
This study introduces a Dutch extension of the Multi-SimLex dataset: 1,888 word pairs annotated for semantic similarity by native Dutch speakers. Using this resource, the research evaluates 18 models with both embedding-based and prompt-based methods. Prompt-based evaluation produced the highest correlation with human judgments, with GPT-4 reaching 0.761, which suggests that large generative models draw on dynamic reasoning when rating similarity. In contrast, embedding-based evaluation favored smaller, specialized models such as FastText and BERTje. The findings underscore the importance of matching the evaluation strategy to the model's architecture. The study provides a foundational resource for Dutch semantics and suggests that large language models could eventually serve as a proxy for human ratings.
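As an illustration of the embedding-based evaluation described above, the following minimal sketch scores word pairs by cosine similarity and compares the resulting ranking with human ratings. It rests on several assumptions not stated in the summary: the correlation is taken to be a Spearman rank correlation (the usual choice for SimLex-style benchmarks), and the vectors, word pairs, and ratings are invented toy values rather than data from the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy embeddings (NOT taken from FastText, BERTje, or the study).
toy_vectors = {
    "kat": [0.9, 0.1, 0.0],
    "poes": [0.8, 0.2, 0.1],
    "hond": [0.6, 0.5, 0.1],
    "auto": [0.0, 0.2, 0.9],
}

# Hypothetical (word1, word2, human rating) triples; the ratings are invented.
toy_pairs = [
    ("kat", "poes", 5.8),
    ("kat", "hond", 3.0),
    ("kat", "auto", 0.2),
    ("hond", "auto", 0.5),
]

model_scores = [cosine(toy_vectors[a], toy_vectors[b]) for a, b, _ in toy_pairs]
human_scores = [rating for _, _, rating in toy_pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation: {rho:.3f}")
```

Under the same assumptions, a prompt-based evaluation would differ only in where `model_scores` comes from: the numeric ratings returned by a generative model for each pair would replace the cosine similarities, and the correlation step would stay the same.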
Related items
Showing items related by title, author, creator and subject.
- Modeling dual-task performance: do individualized models predict dual-task performance better than average models?
  Cao, W. (2017) Understanding multitasking can be a complicated venture. The goal of this paper is to see whether using individual parameters for modeling dual-task will lead to better predictions of individual performance compared to ...
- Modelling Wastewater Quantity and Quality in Mexico -- using an agent-based model
  Chen, Y. (2021) Wastewater is a key element in regional and global water circles, and the discharge of a large quantity of untreated wastewater is posing serious threats to the environment and public health in Mexico. To have a thorough ...
- Modelling offshore wind in the IMAGE/TIMER model
  Gernaat, D.E.H.J. (2012) Current global energy consumption is expected to continue to grow as the global population is likely to increase towards 9 billion in 2050 while income levels per capita surge with 3-5% per year. Resource depletion, climate ...