        Utrecht University Student Theses Repository

        Utilizing linked open place data for entity resolution: a case study on Dutch genealogy at CBG

        MSc_Thesis_S-Engelberts_final.pdf (537.0Kb)
        Publication date
        2022
        Author
        Engelberts, Sander
        Summary
        Entity resolution on genealogical documents is challenging because of spelling errors, alternative name variants, and historical changes to entities. Traditional methods tackle these problems with string similarity measures. This research proposes to extend them by enriching the recorded features with additional place information, such as place URIs, coordinates, and country indicators. Based on a case study at the Dutch Centre for Genealogy (CBG), this research contributes to entity resolution research, to optimizing and enriching family history (meta) studies, and to investigating which privacy-sensitive passport request documents can be disclosed. First, linked open data sources are shown to be able to retrieve the unique place entities that correspond to recorded place names. Second, place, province, and country name similarities are calculated, together with coordinate distances in a coordinate reference system that limits distance distortion for the respective countries. Third, the proposed adaptation is shown to change similarity values significantly when feature similarities are weighted uniformly. However, contrary to the hypothesis, it did not separate the similarity distributions of document pairs that do and do not refer to the same person any more accurately. Future studies are therefore proposed that build on this research by learning feature weights and thresholds in a supervised manner from the validated candidate links it produced.
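        The summary describes enriching each recorded place with name, province, country, URI, and coordinate features and then combining their similarities with uniform weights. The following is a minimal sketch of that idea under stated assumptions, not the thesis's implementation: `difflib.SequenceMatcher` stands in for whichever string similarity measure the thesis uses, great-circle (haversine) distance replaces the projected-CRS distance it describes, and `PlaceRecord`, `scale_km`, and the exponential distance-to-similarity mapping are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class PlaceRecord:
    """Hypothetical container for the enriched features of one recorded place."""
    place: str
    province: Optional[str] = None
    country: Optional[str] = None
    uri: Optional[str] = None      # identifier of a linked open data place entity
    lat: Optional[float] = None
    lon: Optional[float] = None


def name_similarity(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """Case-insensitive string similarity in [0, 1]; None if either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km (assumed stand-in for a projected-CRS distance)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def distance_similarity(a: PlaceRecord, b: PlaceRecord,
                        scale_km: float = 50.0) -> Optional[float]:
    """Map distance to (0, 1] by exponential decay; scale_km is a hypothetical parameter."""
    coords = (a.lat, a.lon, b.lat, b.lon)
    if any(v is None for v in coords):
        return None
    return math.exp(-haversine_km(a.lat, a.lon, b.lat, b.lon) / scale_km)


def uri_similarity(a: PlaceRecord, b: PlaceRecord) -> Optional[float]:
    """1.0 if both records resolved to the same place URI, 0.0 if different, None if unknown."""
    if not a.uri or not b.uri:
        return None
    return 1.0 if a.uri == b.uri else 0.0


def place_similarity(a: PlaceRecord, b: PlaceRecord) -> float:
    """Uniformly weighted mean over the feature similarities available for both records."""
    scores = [
        name_similarity(a.place, b.place),
        name_similarity(a.province, b.province),
        name_similarity(a.country, b.country),
        uri_similarity(a, b),
        distance_similarity(a, b),
    ]
    present = [s for s in scores if s is not None]
    return sum(present) / len(present) if present else 0.0
```

        Averaging only over the features present for both records keeps missing enrichment (for example, a place name that could not be linked to a URI) from lowering the score by itself. A supervised model, as the summary proposes for future work, would instead learn a separate weight per feature and a decision threshold from validated links.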
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/42370
        Collections
        • Theses