Utilizing linked open place data for entity resolution: a case study on Dutch genealogy at CBG
Summary
Entity resolution on genealogical documents is challenging due to spelling
errors, alternative name variants, and historic entity changes. Traditional
methods attempt to tackle these problems with string similarity methods,
which this research proposes to extend by enriching the recorded features
with additional place information such as place URIs, coordinates, and country indicators. Based on a case study at the Dutch Centre for Genealogy,
this research contributes to extending entity resolution research, optimizing and enriching family history (meta) studies, and investigating which
privacy-sensitive passport request documents can be disclosed.
First, linked open data sources are shown to supply unique place entities for recorded place names. Second, place, province, and country
name similarities are calculated, as well as coordinate distances within a coordinate reference system that limits distance distortion for the respective
countries. Third, the proposed enrichment is shown to result in a significant
change in similarity values when a uniform weighting of feature similarities
is applied. However, contrary to the hypothesis, the similarity distributions
of compared documents that do and do not refer to the same person
entity could not be distinguished more accurately. Hence, future
studies are proposed that expand on this research by supervised learning of
weights and thresholds using validated candidate links from this research.
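
As an illustration of the second and third steps, the following minimal Python sketch combines name similarities and a coordinate distance into one uniformly weighted score. It is not the implementation used in this research. The string measure (difflib's SequenceMatcher), the haversine great-circle distance (swapped in here for a distance in a projected coordinate reference system), the 300 km cut-off, and the record field names are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_DISTANCE_KM = 300.0  # hypothetical cut-off: beyond this, similarity is 0


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km; stands in for a projected-CRS distance."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def distance_similarity(km: float) -> float:
    """Map a distance to [0, 1], decreasing linearly up to the cut-off."""
    return max(0.0, 1.0 - km / MAX_DISTANCE_KM)


def record_similarity(rec_a: dict, rec_b: dict) -> float:
    """Uniformly weighted mean of all feature similarities."""
    sims = [
        name_similarity(rec_a["person"], rec_b["person"]),
        name_similarity(rec_a["place"], rec_b["place"]),
        name_similarity(rec_a["province"], rec_b["province"]),
        name_similarity(rec_a["country"], rec_b["country"]),
        distance_similarity(haversine_km(*rec_a["coords"], *rec_b["coords"])),
    ]
    return sum(sims) / len(sims)


# Hypothetical records: a spelling variant of a name, with the same place entity.
doc_a = {"person": "Jan de Vries", "place": "Amsterdam",
         "province": "Noord-Holland", "country": "Nederland",
         "coords": (52.3676, 4.9041)}
doc_b = {"person": "Jan de Vries", "place": "Amsteldam",
         "province": "Noord-Holland", "country": "Nederland",
         "coords": (52.3676, 4.9041)}
score = record_similarity(doc_a, doc_b)
```

With uniform weights, every feature contributes equally to the score, so the added place features shift the similarity values without necessarily separating matches from non-matches any better. Learned weights and thresholds, as proposed above, would replace the plain mean in record_similarity.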