Identifying Historical Person Names using Weighted Edit Distance

Oosterlaken, R.A.J.

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Bloothooft, G.
dc.contributor.advisor	Feelders, A.J.
dc.contributor.author	Oosterlaken, R.A.J.
dc.date.accessioned	2018-08-03T17:01:32Z
dc.date.available	2018-08-03T17:01:32Z
dc.date.issued	2018
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/30117
dc.description.abstract	In automated record linkage, name variation is often handled by limited means, such as an edit distance combined with a threshold value. However, names vary in ways that default similarity measures cannot reliably cope with. To overcome this limitation, an alternative, 'weighted' edit distance is proposed. This weighted edit distance assigns costs to operations based on previously seen operations that transform names into their known variants. Names often vary in similar ways, for example by adding the same suffixes. Operations that transform names into their known variants are therefore likely to resemble the operations between names and their as yet unseen variants. In this paper, methods are defined that gather the data required to create a cost model that assigns costs to the operations of a weighted edit distance. Suggestions are then given on how to implement a cost model and a weighted edit distance based on this data, and on how to test these implementations.
dc.description.sponsorship	Utrecht University
dc.format.extent	321546
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Identifying Historical Person Names using Weighted Edit Distance
dc.type.content	Bachelor Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	onomastics; record linkage; weighted edit distance; dynamic costs
dc.subject.courseuu	Kunstmatige Intelligentie
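
The idea summarised in the abstract, learning operation costs from known name variants and using them in an edit distance, can be illustrated with a minimal sketch. This is not the thesis's implementation. The function names (`edit_operations`, `learn_costs`, `weighted_edit_distance`), the linear cost formula, the `floor` parameter, and the example name pairs are all assumptions made for illustration only.

```python
from collections import Counter


def edit_operations(source, target):
    """Return one minimal unit-cost edit script as (op, from_char, to_char) tuples."""
    n, m = len(source), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                       # deletion
                dist[i][j - 1] + 1,                       # insertion
                dist[i - 1][j - 1] + (0 if same else 1),  # match or substitution
            )
    # Backtrace from the bottom-right cell to recover the operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1                           # match, no operation
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("sub", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            ops.append(("ins", "", target[j - 1]))
            j -= 1
        else:
            ops.append(("del", source[i - 1], ""))
            i -= 1
    return ops[::-1]


def learn_costs(variant_pairs, floor=0.1):
    """Assumed cost model: the more often an operation was seen, the cheaper it is."""
    counts = Counter()
    for name, variant in variant_pairs:
        counts.update(edit_operations(name, variant))
    most = max(counts.values(), default=1)
    # The most frequent operation costs `floor`; unseen operations default to 1.0.
    return {op: 1.0 - (1.0 - floor) * c / most for op, c in counts.items()}


def weighted_edit_distance(source, target, costs):
    """Edit distance in which each operation is charged its learned cost."""
    def cost(op):
        return costs.get(op, 1.0)

    n, m = len(source), len(target)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + cost(("del", source[i - 1], ""))
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + cost(("ins", "", target[j - 1]))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s, t = source[i - 1], target[j - 1]
            sub = 0.0 if s == t else cost(("sub", s, t))
            dist[i][j] = min(
                dist[i - 1][j] + cost(("del", s, "")),
                dist[i][j - 1] + cost(("ins", "", t)),
                dist[i - 1][j - 1] + sub,
            )
    return dist[n][m]


# Hypothetical known variant pairs, invented for this example.
known_pairs = [("jan", "jans"), ("pieter", "pieters"),
               ("willem", "willems"), ("klaas", "claas")]
costs = learn_costs(known_pairs)
```

With these assumed pairs, appending the frequently seen suffix `s` becomes cheaper than appending an unseen letter, so `weighted_edit_distance("dirk", "dirks", costs)` is smaller than `weighted_edit_distance("dirk", "dirke", costs)`, whereas a plain edit distance would rate both at 1.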

Files in this item

Name:: [KI] Eindwerkstuk Richard ...
Size:: 314.0Kb
Format:: PDF

This item appears in the following Collection(s)

Theses
