Reconstructing families
Summary
Edit distance is often used in record linkage for real persons to express the similarity of two names. In
historical data, names often show high spelling variance. This study investigates a method to deal with
high name spelling variance by using overlinking and filtering in order to generate matches on a dataset
of historical civil registrations. The method tries to build sets of registrations of persons that belong
to the same family by applying real-world knowledge to the generated matches. When the four names of
the parents mentioned on the registrations are used with edit distances of 4 and 5, 80% to 85% of the
generated matches are consistent with real-world knowledge.
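
As an illustration of the matching step summarised above, the following minimal Python sketch compares two registrations on the four parent names using Levenshtein edit distance. It is not the study's implementation. The field names, the sample records, the lower-casing of names, and the choice to sum the distances over the four names before comparing against the threshold are all assumptions made for this example; the summary does not state how the distance of 4 or 5 is applied.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution
            ))
        previous = current
    return previous[-1]


# Hypothetical field names for the four parent names on a registration.
PARENT_FIELDS = ("father_first", "father_last", "mother_first", "mother_last")


def parents_distance(reg_a: dict, reg_b: dict) -> int:
    """Sum of edit distances over the four parent names (an assumption)."""
    return sum(
        edit_distance(reg_a[field].lower(), reg_b[field].lower())
        for field in PARENT_FIELDS
    )


def is_candidate_match(reg_a: dict, reg_b: dict, max_distance: int = 4) -> bool:
    """Overlinking step: accept every pair within the distance threshold.
    Filtering with real-world knowledge would follow and is not shown."""
    return parents_distance(reg_a, reg_b) <= max_distance


# Invented sample registrations with spelling variation in three names.
registration_1 = {
    "father_first": "Jan", "father_last": "Janssen",
    "mother_first": "Maria", "mother_last": "Visser",
}
registration_2 = {
    "father_first": "Jan", "father_last": "Jansen",
    "mother_first": "Marija", "mother_last": "Visscher",
}

print(parents_distance(registration_1, registration_2))
print(is_candidate_match(registration_1, registration_2, max_distance=4))
```

A generous threshold of this kind deliberately accepts more pairs than are correct (overlinking); the filtering step described above is then responsible for discarding matches that conflict with real-world knowledge.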