
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Siebes, A.P.J.M.
dc.contributor.advisor: Feelders, A.J.
dc.contributor.author: Brandenburg, M.
dc.date.accessioned: 2017-08-23T17:01:51Z
dc.date.available: 2017-08-23T17:01:51Z
dc.date.issued: 2017
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/26997
dc.description.abstract: The large databases of government agencies are an interesting source for analysis. Using government data, tax evaders might be found, and people could be called in for medical testing more or less often based on their data. This thesis project is an effort to use text mining for the classification of police reports. The goal was to train a text mining model that correctly classifies three classes of online crime: online threat, online distribution of sexually obscene imagery, and computer trespass. To this end, different approaches to building a model were compared. Preprocessing steps and model options included linguistic preprocessing, multiple methods of feature construction and selection, boosting, and resampling. Four algorithms were compared: Naive Bayes, Random Forest, SVM, and XGBoost. The resulting models are promising, with F-scores that improve on random classification by a factor of 4 to 11.
dc.description.sponsorship: Utrecht University
dc.format.extent: 1044262
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.title: Text Classification of Dutch police records
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: machine learning, data mining, text mining, police
dc.subject.courseuu: Artificial Intelligence

