What replication and localisation teach us: the case of semantic similarity measures

Postma, M.C.

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Odijk, J.
dc.contributor.advisor	Vossen, P.
dc.contributor.author	Postma, M.C.
dc.date.accessioned	2013-09-18T17:01:12Z
dc.date.available	2013-09-18
dc.date.available	2013-09-18T17:01:12Z
dc.date.issued	2013
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/14874
dc.description.abstract	Many tasks in the field of Natural Language Processing make use of so-called semantic similarity measures, which quantify the degree to which two concepts are semantically similar. In order to know which of the semantic similarity measures is to be used for Natural Language Processing tasks, they are generally evaluated against human judgement. However, because human judgement is subjective, gold standards are created by asking a group of people to indicate the similarity of meaning of a set of word pairs. The correlation between these gold standards and the output from the semantic similarity measures gives a good indication as to which measure correlates best with human judgement. Most research, for example Patwardhan and Pedersen (2006) and Pedersen (2010), has focused on English, using the English lexical semantic database WordNet (Miller, 1995) to compute the scores for the semantic similarity measures. The main focus of this thesis is upon getting a better understanding of the workings of semantic similarity measures by also using a different lexical semantic database in a different language, which is Cornetto (Vossen, 2006; Vossen et al., 2007, 2008) for Dutch. In order to get a better understanding of these measures, we first inspect the previous English experiments and try to replicate them to be sure that we fully understand the process. Furthermore, we will create a Dutch gold standard and inspect the correlations between the output from the semantic similarity measures using the Dutch lexical semantic database Cornetto and the newly created Dutch gold standard. For English, we will show that a group of semantic similarity measures approaches human judgement in a similar way. Moreover, we will stress the importance of addressing every detail of the process that leads to the results by showing that even if the main properties are kept stable, variations in minor properties can lead to completely different outcomes. Furthermore, we will present our gold standard for Dutch and how it was created. In addition, we will show that not only the properties of a semantic similarity measure determine its performance, but that the structure of the lexical semantic database also plays a crucial role.
dc.description.sponsorship	Utrecht University
dc.format.extent	1153349 bytes
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	What replication and localisation teach us: the case of semantic similarity measures
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	computational linguistics, semantic similarity measures, wordnet, cornetto, lexical semantic databases
dc.subject.courseuu	Linguistics: the Study of the Language Faculty

Files in this item

Name: Marten_Postma.pdf
Size: 1.099Mb
Format: PDF

This item appears in the following Collection(s)

Theses