Identifying Historical Person Names using Weighted Edit Distance
Summary
In automated record linkage, name variation is often handled by limited means, such as an edit distance combined with a threshold value.
However, names vary in ways that default similarity measures cannot reliably cope with.
To overcome this limitation, an alternative, 'weighted' edit distance is proposed. This weighted edit distance assigns costs to operations based on previously seen operations that transform names into their known variants.
Names often vary in similar ways, for example by adding the same suffixes.
Operations that transform names into their name variants are therefore likely to be similar to the operations that would be seen between names and their yet unseen name variants.
In this paper, methods are defined that gather the data required to create a cost model that assigns costs to the operations of a weighted edit distance.
Suggestions are then given on how to implement a cost model and a weighted edit distance based on this data, as well as how to test these implementations.
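
As an illustration of the idea summarised above, the following minimal Python sketch counts the character operations seen between names and their known variants, turns those counts into costs, and uses the costs in an edit distance. It is not the implementation proposed in this paper. The function names, the example name pairs, the use of Python's difflib to align names, and the reciprocal cost formula 1 / (1 + count) are all assumptions chosen for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_operations(name, variant):
    """List the character edits that turn `name` into `variant`.

    Each edit is a (source, target) pair; "" marks an insertion or deletion.
    Assumption: difflib's alignment stands in for a proper edit-distance
    backtrace, which may align some name pairs differently.
    """
    ops = []
    matcher = SequenceMatcher(None, name, variant, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        src, dst = name[i1:i2], variant[j1:j2]
        shared = min(len(src), len(dst))
        ops.extend(zip(src[:shared], dst[:shared]))   # substitutions
        ops.extend((ch, "") for ch in src[shared:])   # deletions
        ops.extend(("", ch) for ch in dst[shared:])   # insertions
    return ops


def build_costs(known_pairs):
    """Map each observed operation to a cost that shrinks as it is seen more.

    Assumption: cost = 1 / (1 + count) is an illustrative choice only.
    """
    counts = Counter()
    for name, variant in known_pairs:
        counts.update(char_operations(name, variant))
    return {op: 1.0 / (1 + n) for op, n in counts.items()}


def weighted_distance(a, b, costs, default=1.0):
    """Edit distance in which each operation is priced by `costs`.

    Operations that were never observed fall back to `default`.
    """
    def cost(src, dst):
        return costs.get((src, dst), default)

    # prev[j] holds the cheapest way to turn the processed prefix of a into b[:j].
    prev = [0.0]
    for cb in b:
        prev.append(prev[-1] + cost("", cb))
    for ca in a:
        cur = [prev[0] + cost(ca, "")]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if ca == cb else cost(ca, cb))
            delete = prev[j] + cost(ca, "")
            insert = cur[j - 1] + cost("", cb)
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


# Hypothetical known name/variant pairs that share the suffix "s".
known_pairs = [("Jan", "Jans"), ("Pieter", "Pieters"), ("Willem", "Willems")]
costs = build_costs(known_pairs)

print(weighted_distance("Hendrik", "Hendriks", costs))  # 0.25: suffix seen before
print(weighted_distance("Hendrik", "Hendrikx", costs))  # 1.0: unseen insertion
```

In this sketch the unseen pair "Hendrik" and "Hendriks" ends up closer than "Hendrik" and "Hendrikx", because adding "s" was observed three times in the known variants, whereas a standard edit distance would score both pairs the same.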