View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item

        Identifying Historical Person Names using Weighted Edit Distance

        View/Open
        [KI] Eindwerkstuk Richard Oosterlaken.pdf (314.0Kb)
        Publication date
        2018
        Author
        Oosterlaken, R.A.J.
        Summary
        In automated record linkage, name variation is often handled by limited means, such as an edit distance combined with a threshold value. However, names vary in ways that default similarity measures cannot reliably cope with. To overcome this limitation, an alternative, 'weighted' edit distance is proposed. This weighted edit distance assigns costs to operations based on previously seen operations that transform names into their known variants. Names often vary in similar ways, for example by adding the same suffixes. Operations that transform names into their known variants are therefore likely to resemble the operations between names and their as yet unseen variants. In this paper, methods are defined for gathering the data required to create a cost model that assigns costs to the operations of a weighted edit distance. Suggestions are then given on how to implement a cost model and a weighted edit distance based on this data, as well as how to test these implementations.
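
The idea in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation. The function names, the toy name pairs, and the cost formula 1 / (1 + count) are all assumptions chosen for illustration. The sketch collects the edit operations seen between names and their known variants, makes frequently seen operations cheaper, and uses those costs in an otherwise standard edit distance.

```python
from collections import Counter


def align_ops(a, b):
    """Return the non-match operations of one minimal unit-cost alignment a -> b."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,                      # delete a[i-1]
                          d[i][j - 1] + 1,                      # insert b[j-1]
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    ops, i, j = [], n, m
    while i > 0 or j > 0:  # backtrace from the bottom-right cell
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(("sub", a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", a[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", b[j - 1]))
            j -= 1
    return ops


def build_costs(variant_pairs):
    """Assumed cost model: operations seen more often get a lower cost."""
    counts = Counter(op for a, b in variant_pairs for op in align_ops(a, b))
    return {op: 1.0 / (1 + c) for op, c in counts.items()}


def weighted_distance(a, b, costs, default=1.0):
    """Edit distance where each operation's cost comes from the cost model."""
    def cost(op):
        return costs.get(op, default)  # unseen operations keep the default cost

    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + cost(("del", a[i - 1], ""))
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + cost(("ins", "", b[j - 1]))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else cost(("sub", a[i - 1], b[j - 1]))
            d[i][j] = min(d[i - 1][j] + cost(("del", a[i - 1], "")),
                          d[i][j - 1] + cost(("ins", "", b[j - 1])),
                          d[i - 1][j - 1] + sub)
    return d[n][m]


if __name__ == "__main__":
    # Made-up toy pairs (not data from the thesis): names and a known variant.
    pairs = [("jan", "jans"), ("pieter", "pieters"), ("willem", "willems")]
    costs = build_costs(pairs)
    print(weighted_distance("hendrik", "hendriks", costs))  # 0.25
    print(weighted_distance("hendrik", "hendrika", costs))  # 1.0
```

In this toy setting, appending the frequently seen suffix "s" costs less than appending an unseen letter, so a fixed threshold would accept "hendriks" as a variant of "hendrik" more readily than "hendrika". A real cost model would need far more training pairs and a more considered weighting scheme.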
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/30117
        Collections
        • Theses