        Complaint Handling at the Dutch National Police

        Thesis.pdf (2.039Mb)
        Publication date
        2017
        Author
        Kos, W.H.
        Summary
        The Dutch National Police maintains an online interface that allows civilians to file complaints about trade fraud committed over an online medium (e.g. eBay). Because an increasing number of complaints is being filed, it is desirable to distinguish automatically between complaints worth investigating and those not worth investigating. One valuable distinction that can be made early in the process is between a complaint that will be withdrawn, by either the complainant or the police, and a complaint that will not be withdrawn. This thesis examines whether any of nine machine learning classifiers trained on free-text complaint data can be used for this purpose. The task is complicated by the class distribution in the data, in which a majority of 86.7% is labelled as "not withdrawn". To prevent this skew from affecting classifier performance, resampling, word weighting, and word normalization are applied, and their influence on classification performance is assessed. This research shows that machine learning can make this distinction by classifying complaints according to whether they will be withdrawn. Overall, it is found that probabilistic classifiers (i.e. naive Bayes) have the highest unimproved performance and that, through data alterations, the performance of an optimized machine learning technique (i.e. SVM) can be improved by up to 13.5 percentage points. Furthermore, by optimizing the classifiers, the difference in performance between the best classifier (i.e. logistic regression) and the worst classifier (i.e. k-nearest-neighbor) can be reduced from 11.8 to only 4.2 percentage points.
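
The class skew mentioned in the summary can be illustrated with a minimal sketch. The summary does not say which resampling scheme the thesis uses, so the random oversampling below is an assumption chosen for illustration. The function names and the toy data are hypothetical; only the 86.7% majority share comes from the summary.

```python
import random
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes until
    every class is as frequent as the largest one.

    Assumption: plain random oversampling; the thesis may use another scheme.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy data mirroring the reported skew: 867 of 1000 complaints "not withdrawn".
labels = ["not withdrawn"] * 867 + ["withdrawn"] * 133
texts = [f"complaint {i}" for i in range(len(labels))]

print(majority_baseline_accuracy(labels))
balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

With this skew, a classifier that always answers "not withdrawn" already reaches 86.7% accuracy without detecting a single withdrawn complaint, which is why the skew has to be handled before classifiers are compared. In practice, resampling of this kind is applied to the training split only, so that the evaluation data keeps the original class distribution.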
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/26230
        Collections
        • Theses