Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Pieters, A.H.L.M.
dc.contributor.author: Engelberts, Sander
dc.date.accessioned: 2022-09-09T00:01:30Z
dc.date.available: 2022-09-09T00:01:30Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/42370
dc.description.abstract: Entity resolution on genealogical documents is challenging due to spelling errors, alternative name variants, and historic entity changes. Traditional methods attempt to tackle these problems with string similarity methods, which this research proposes to extend by enriching the recorded features with additional place information such as place URIs, coordinates, and country indicators. Based on a case study at the Dutch Centre for Genealogy, this research contributes to extending entity resolution research, optimizing and enriching family history (meta) studies, and investigating which privacy-sensitive passport request documents can be disclosed. First, linked open data sources are shown to retrieve unique place entities belonging to recorded place names. Second, place, province and country name similarities are calculated as well as coordinate distances within a coordinate reference system that limits the distance distortions for the respective countries. Third, the researched adaptation is shown to result in a significant change in similarity values when a uniform weighting of feature similarities is applied. However, contrary to the hypothesis, the similarity distributions of compared documents that do and do not refer to an equivalent person entity could not be distinguished in a more accurate way. Hence, future studies are proposed that expand on this research by supervised learning of weights and thresholds using validated candidate links from this research.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Entity resolution on genealogical documents is challenging due to spelling errors, alternative name variants, and historic entity changes. Traditional methods attempt to tackle these problems with string similarity methods, which this research proposes to extend by enriching the recorded features with additional place information such as place URIs, coordinates, and country indicators. This research is based on a case study at the Dutch Centre for Genealogy with their real-world data.
dc.title: Utilizing linked open place data for entity resolution: a case study on Dutch genealogy at CBG
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Entity resolution; Linked open data; Genealogy; Geospatial data; Content similarity
dc.subject.courseuu: Applied Data Science
dc.thesis.id: 7703

