dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Bloothooft, G.
dc.contributor.advisor: Schraagen, M.
dc.contributor.author: Kemenade, J. van
dc.date.accessioned: 2016-08-31T17:01:09Z
dc.date.available: 2016-08-31T17:01:09Z
dc.date.issued: 2016
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/23952
dc.description.abstract: One of the main problems in record linkage is variation in names. One approach to this problem is to normalize the names: each name in the historical records is converted to a base form. This study presents a model that converts Dutch first names to their base forms. To build the model, a subset of a dataset containing 132,140 first names and their base forms is used to train three multiclass classifiers: k-nearest neighbours, boosted decision trees, and support vector machines. The classifiers are compared on accuracy, training time, and classification speed. The best-performing classifier, a boosted decision tree, is then trained and tested on the entire dataset. The final model is a boosted decision tree with a learning rate of 1.0 and 200 decision trees, each with a maximum depth of 17 levels. The validation accuracy of the model under 10-fold cross-validation is 84.56%. The accuracy of the final model on the test set, which contains 24,576 names and 447 base forms, is 85.04%, with a classification speed of more than 300 samples per second.
dc.description.sponsorship: Utrecht University
dc.format.extent: 799724
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.title: Training a name-variant model using historical data
dc.type.content: Bachelor Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: name variants, record linkage, multiclass classification
dc.subject.courseuu: Kunstmatige Intelligentie (Artificial Intelligence)