        Utrecht University Student Theses Repository

        Training a name-variant model using historical data

        View/Open
        thesis.pdf (780.9Kb)
        Publication date
        2016
        Author
        Kemenade, J. van
        Summary
        One of the main problems in record linkage is variation in names. One way to deal with this variation is to remove it by converting each name in the historical records to a base form. This study presents a model that converts Dutch first names to their base form. To build the model, a subset of a dataset containing 132,140 first names and their base forms is used to train three multiclass classifiers: k-Nearest Neighbours, Boosted Decision Trees and Support Vector Machines. The classifiers are compared on accuracy, training time and classification speed. The best-performing classifier, a boosted decision tree, is then trained and tested on the entire dataset. The final model is a boosted decision tree ensemble with a learning rate of 1.0 and 200 decision trees, each with a maximum depth of 17 levels. The validation accuracy of the model, using 10-fold cross-validation, is 84.56%. The accuracy of the final model on the test set, containing 24,576 names and 447 base forms, is 85.04%, with a classification speed of more than 300 samples per second.
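
To make the task concrete, here is a minimal sketch of the simplest of the three classifier families named in the summary, a k-Nearest Neighbours classifier that maps a name variant to a base form. It is not the thesis implementation: the summary does not say which features or distance measure were used, so the Levenshtein edit distance, the function names `edit_distance` and `predict_base_form`, and the toy name pairs are all assumptions made for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def predict_base_form(name, training_pairs, k=3):
    """Majority vote over the k training variants closest to `name`."""
    ranked = sorted(
        training_pairs,
        key=lambda pair: edit_distance(name.lower(), pair[0].lower()),
    )
    votes = Counter(base for _, base in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data (variant, base form); NOT taken from the thesis dataset.
TRAINING_PAIRS = [
    ("Johannes", "Johannes"),
    ("Joannes", "Johannes"),
    ("Johannis", "Johannes"),
    ("Cornelis", "Cornelis"),
    ("Kornelis", "Cornelis"),
    ("Cornelus", "Cornelis"),
    ("Wilhelmina", "Wilhelmina"),
    ("Willemina", "Wilhelmina"),
    ("Wilhelmijna", "Wilhelmina"),
]

if __name__ == "__main__":
    for variant in ("Johanes", "Kornelus"):
        print(variant, "->", predict_base_form(variant, TRAINING_PAIRS))
```

Each base form acts as one class, so the test set described above corresponds to a 447-class problem. A brute-force neighbour search like this one compares every query against every training name, which is one reason classification speed is a sensible criterion when comparing classifiers on a dataset of this size.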
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/23952
        Collections
        • Theses