Complaint Handling at the Dutch National Police
The Dutch National Police maintains an online interface that allows civilians to file complaints about trade fraud committed through online platforms (e.g. eBay). Because an increasing number of complaints is being filed, it is desirable to distinguish automatically between complaints worth investigating and those not worth investigating. One valuable distinction that can be made early in the process is between a complaint that will be withdrawn, by either the complainant or the police, and a complaint that will not be withdrawn. This thesis examines whether any of nine machine learning classifiers trained on free-text complaint data can be used for this purpose. The task is complicated by the class distribution in the data, where a majority of 86.7% is labelled as "not withdrawn". To limit the effect of this skew on classifier performance, resampling, word weighting, and word normalization are applied, and their influence on classification performance is assessed. This research shows that machine learning can be used to make this distinction by classifying complaints according to whether or not they will be withdrawn. Overall, it is found that probabilistic classifiers (i.e. naive Bayes) have the highest unimproved performance and that, through data alterations, the performance of an optimized machine learning technique (i.e. SVM) can be improved by up to 13.5 percentage points. Furthermore, by optimizing the classifiers, the difference in performance between the best classifier (i.e. logistic regression) and the worst classifier (i.e. k-nearest-neighbor) can be reduced from 11.8 to only 4.2 percentage points.
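To illustrate what resampling can mean for a class distribution like the one described above, the following is a minimal Python sketch of random oversampling of the minority class. It is not taken from the thesis: the function name `oversample_minority`, the toy complaint texts, and the choice of oversampling rather than undersampling are all assumptions made for illustration. The toy data uses 13 "not withdrawn" and 2 "withdrawn" examples, which gives roughly the same 86.7% majority share.

```python
import random
from collections import Counter


def oversample_minority(texts, labels, seed=0):
    """Hypothetical helper: randomly duplicate examples of the smaller
    classes until every class is as large as the largest one."""
    rng = random.Random(seed)

    # Group the texts by their class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw duplicates with replacement; the majority class gets none.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy data (invented): 13 of 15 complaints are "not withdrawn".
texts = [f"complaint {i}" for i in range(15)]
labels = ["not withdrawn"] * 13 + ["withdrawn"] * 2

bal_texts, bal_labels = oversample_minority(texts, labels)
print(Counter(bal_labels))
```

In a setup like this, resampling would normally be applied to the training split only, so that evaluation still reflects the original skewed distribution.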