        •   Utrecht University Student Theses Repository Home
        To Catch a Thief: fraud detection with reliable machine learning

        Master_s_thesis__to_catch_a_thief.pdf (1.503Mb)
        Publication date
        2019
        Author
        Wit, R.C. de
        Summary
        The opaqueness of machine learning (ML) models is a major hurdle for adoption throughout the industry. Actors do not outright trust predictions generated by models. Instead of ML models, they opt for business-rule-based systems combined with manual classification. Reliable machine learning (RML) seeks to mitigate this issue by providing a metric that establishes the extent to which the model is certain about its predictions. We investigate the applicability of RML techniques for high-stakes contexts. Specifically, we conduct a case study in fraud detection at bol.com, the Netherlands' largest online retail platform. Using hand-labelled data from 199,939 orders, we train models that classify those orders as either fraudulent or legitimate. These models are a Naive Bayes classifier (NB), a Credal Sum-Product Network (CSPN), and an XGBoost gradient booster (XGB). All three of these models provide probability as a reliability metric for their predictions, and the NB and CSPN models provide robustness as a metric as well. The analysis of robustness in a CSPN with a continuous feature is a novelty and extends its application to many new domains. While the overall accuracy of the models does not exceed that of manual classification, we demonstrate how RML can improve upon existing business processes in four ways: 1) providing accurate predictions over a subset of the observations; 2) allowing for a flexible accuracy/cover trade-off; 3) inferring the latent difficulty variable for classifying individual observations; and 4) eliciting features and feature combinations that allow for increased business knowledge. This shows how RML can yield improvements upon existing business processes even when the overall predictive accuracy of models is low, validating and building upon existing research in this domain.
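
        The accuracy/cover trade-off mentioned in points 1) and 2) can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `accuracy_cover`, the 0.5 decision boundary, the use of max(p, 1 - p) as the reliability score, and the toy data are all assumptions made for illustration. The idea is that a model only answers when its predicted probability is confident enough, and the remaining orders are deferred to manual classification.

```python
import math
from typing import List, Tuple


def accuracy_cover(
    probs: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Return (cover, accuracy) for a given confidence threshold.

    probs     -- hypothetical predicted probabilities of the "fraud" class
    labels    -- true labels (1 = fraudulent, 0 = legitimate)
    threshold -- minimum confidence required for the model to answer

    Cover is the fraction of observations the model answers; accuracy is
    computed over that answered subset only.
    """
    kept = 0
    correct = 0
    for p, y in zip(probs, labels):
        prediction = 1 if p >= 0.5 else 0   # assumed decision boundary
        confidence = max(p, 1.0 - p)        # assumed reliability score
        if confidence >= threshold:
            kept += 1
            correct += int(prediction == y)
    cover = kept / len(probs) if probs else 0.0
    accuracy = correct / kept if kept else math.nan
    return cover, accuracy


# Toy data, invented for this example (not from the bol.com data set).
toy_probs = [0.97, 0.85, 0.60, 0.05, 0.40, 0.02]
toy_labels = [1, 0, 1, 0, 1, 0]

# A strict threshold answers fewer orders but gets more of them right;
# a lenient threshold answers every order at lower accuracy.
strict = accuracy_cover(toy_probs, toy_labels, threshold=0.9)
lenient = accuracy_cover(toy_probs, toy_labels, threshold=0.5)
```

        Sweeping `threshold` between 0.5 and 1.0 traces out the trade-off curve: each value gives one (cover, accuracy) pair, and a business can pick the point that matches how much manual review it can afford.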
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/32878
        Collections
        • Theses