        •   Utrecht University Student Theses Repository Home

        Text Classification of Dutch police records

        thesis_brandenburg.pdf (1019 KB)
        Publication date
        2017
        Author
        Brandenburg, M.
        Summary
        The large databases of government agencies are an interesting source for analysis. Using government data, tax evaders might be found, and people could be called in for medical testing more or less often depending on their data. This thesis project applies text mining to the classification of police reports. The goal was to train a text-mining model that correctly classifies three classes of online crime: online threat, online distribution of sexually obscene imagery, and computer trespass. To this end, different approaches to building a model were compared. Preprocessing steps and model options included linguistic preprocessing, multiple methods of feature construction and selection, boosting, and resampling. Four algorithms were compared: Naive Bayes, Random Forest, SVM, and XGBoost. The resulting models are promising, with F-scores 4 to 11 times higher than those of random classification.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/26997
        Collections
        • Theses