dc.rights.licenseCC-BY-NC-ND
dc.contributor.advisorOdijk, J.
dc.contributor.advisorVossen, P.
dc.contributor.authorPostma, M.C.
dc.date.accessioned2013-09-18T17:01:12Z
dc.date.available2013-09-18
dc.date.available2013-09-18T17:01:12Z
dc.date.issued2013
dc.identifier.urihttps://studenttheses.uu.nl/handle/20.500.12932/14874
dc.description.abstractMany tasks in the field of Natural Language Processing make use of so-called semantic similarity measures, which quantify the degree to which two concepts are semantically similar. In order to know which of the semantic similarity measures is to be used for Natural Language Processing tasks, they are generally evaluated against human judgement. However, because human judgement is subjective, gold standards are created by asking a group of people to indicate the similarity of meaning of a set of word pairs. The correlation between these gold standards and the output from the semantic similarity measures gives a good indication as to which measure correlates best with human judgement. Most research, for example Patwardhan and Pedersen (2006) and Pedersen (2010), has focused on English, using the English lexical semantic database WordNet (Miller, 1995) to compute the scores for the semantic similarity measures. The main focus of this thesis is upon getting a better understanding of the workings of semantic similarity measures by also using a different lexical semantic database in a different language, which is Cornetto (Vossen, 2006; Vossen et al., 2007, 2008) for Dutch. In order to get a better understanding of these measures, we first inspect the previous English experiments and try to replicate them to be sure that we fully understand the process. Furthermore, we will create a Dutch gold standard and inspect the correlations between the output from the semantic similarity measures using the Dutch lexical semantic database Cornetto and the newly created Dutch gold standard. For English, we will show that a group of semantic similarity measures approaches human judgement in a similar way. Moreover, we will stress the importance of addressing every detail of the process that leads to the results by showing that even if the main properties are kept stable, variations in minor properties can lead to completely different outcomes.
Furthermore, we will present our gold standard for Dutch and how it was created. In addition, we will show that not only the properties of a semantic similarity measure determine its performance, but that the structure of the lexical semantic database also plays a crucial role.
dc.description.sponsorshipUtrecht University
dc.format.extent1153349 bytes
dc.format.mimetypeapplication/pdf
dc.language.isoen
dc.titleWhat replication and localisation teach us: the case of semantic similarity measures
dc.type.contentMaster Thesis
dc.rights.accessrightsOpen Access
dc.subject.keywordscomputational linguistics, semantic similarity measures, wordnet, cornetto, lexical semantic databases
dc.subject.courseuuLinguistics: the Study of the Language Faculty

