dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Cruyff, Maarten
dc.contributor.author: Nikolas Anova, Nikolas
dc.date.accessioned: 2023-07-25T00:02:16Z
dc.date.available: 2023-07-25T00:02:16Z
dc.date.issued: 2023
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/44310
dc.description.abstract: Human trafficking remains a problem in the modern world, and the number of its victims needs to be monitored. Because human trafficking is a hidden crime, statistics on identified victims reveal only a small part of the problem, and the actual number of victims can only be estimated. UNODC recommends Multiple Systems Estimation (MSE), in which the size of a hidden population of trafficking victims is estimated by analyzing the overlap between three or more administrative lists on which members of that population appear. A major problem in applying MSE is missing data, which is likely to arise because MSE combines registration data from several different external sources. Imputation methods can be used to address this problem. However, although missing data occur frequently in MSE applications, a review of the literature found no comparative study of how imputation methods perform in terms of MSE output. In the Netherlands, the human trafficking records for 2016–2019 also contain missing data. Previous studies of the same data used multiple imputation only with the default method for binary (two-level categorical) data, namely logistic regression. Missing data reduce the quality of population estimates, so a suitable imputation method must be chosen beforehand to obtain the best MSE output. This study therefore compared the performance of imputation methods based on the MSE results for estimating the human trafficking population in the Netherlands from 2016 to 2019. The imputation methods were first compared on the AIC and BIC values of the resulting models; the AIC and BIC versions of the models were then compared on model complexity, standard error, and the reasonableness of the estimates.
This study applied multiple imputation with seven methods: predictive mean matching (PMM), classification and regression trees (CART), random forest, logistic regression, logistic regression with bootstrap, lasso logistic regression, and linear discriminant analysis (LDA). The imputation methods produced considerably different MSE model scores and population estimates. CART produced the best MSE model: the dataset imputed by CART has the best AIC and BIC scores of all imputation methods. Logistic regression, the method used in previous research, produced the sixth-ranked MSE model in both the AIC and BIC versions. Random forest produced the worst MSE model. These results show that, when MSE is applied to data with missing values, the choice of imputation method affects the quality of the MSE output.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: COMPARISON OF DATA IMPUTATION METHODS PERFORMANCE FOR MULTIPLE SYSTEM ESTIMATION (CASE STUDY: HUMAN TRAFFICKING DATA IN THE NETHERLANDS 2016 - 2019)
dc.title: COMPARISON OF DATA IMPUTATION METHODS PERFORMANCE FOR MULTIPLE SYSTEM ESTIMATION (CASE STUDY: HUMAN TRAFFICKING DATA IN THE NETHERLANDS 2016 - 2019)
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Multiple System Estimation; Human Trafficking; Missing Data; Multiple Imputation; Performance Comparison
dc.subject.courseuu: Applied Data Science
dc.thesis.id: 20040
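The abstract describes MSE and the AIC/BIC comparison only in words. As a minimal sketch, assuming the common Poisson log-linear formulation of three-list MSE (this record does not state which model family or software the thesis used), the Python below fits a few candidate log-linear models to a hypothetical three-list table, compares them by AIC and BIC, and reports the estimated total population. The counts, the candidate model set, the BIC sample-size convention, and every function name (`design_row`, `solve`, `fit_poisson`, `mse_model`) are illustrative assumptions, not the thesis data or code.

```python
import math

# Hypothetical toy counts (NOT the thesis data): capture history (a, b, c) -> number of
# victims recorded on list A (a=1), list B (b=1) and/or list C (c=1).
# The cell (0, 0, 0), victims on no list, is unobserved and is what MSE estimates.
COUNTS = {
    (1, 0, 0): 300, (0, 1, 0): 200, (0, 0, 1): 150,
    (1, 1, 0): 40, (1, 0, 1): 30, (0, 1, 1): 20, (1, 1, 1): 5,
}
PAIRS = {"AB": (0, 1), "AC": (0, 2), "BC": (1, 2)}


def design_row(history, interactions):
    """Intercept, three list main effects, then one column per chosen two-way interaction."""
    row = [1.0] + [float(x) for x in history]
    row += [float(history[i] * history[j]) for i, j in (PAIRS[n] for n in interactions)]
    return row


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson(X, y, iters=100, tol=1e-10):
    """Poisson log-linear regression fitted by iteratively reweighted least squares."""
    p = len(X[0])
    eta = [math.log(v + 0.5) for v in y]  # starting linear predictor
    beta = [0.0] * p
    for _ in range(iters):
        mu = [math.exp(e) for e in eta]
        z = [e + (v - m) / m for e, v, m in zip(eta, y, mu)]  # working response
        # Weighted normal equations X^T W X beta = X^T W z, with W = diag(mu) for the log link.
        xtwx = [[sum(m * row[i] * row[j] for row, m in zip(X, mu)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(m * row[i] * zi for row, m, zi in zip(X, mu, z)) for i in range(p)]
        new_beta = solve(xtwx, xtwz)
        eta = [sum(b * x for b, x in zip(new_beta, row)) for row in X]
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, [math.exp(e) for e in eta]


def mse_model(interactions, counts=COUNTS):
    histories = sorted(counts)
    X = [design_row(h, interactions) for h in histories]
    y = [float(counts[h]) for h in histories]
    beta, mu = fit_poisson(X, y)
    loglik = sum(v * math.log(m) - m - math.lgamma(v + 1) for v, m in zip(y, mu))
    k = len(beta)
    # Assumption: BIC uses the number of observed cells as the sample size; conventions differ.
    aic = 2 * k - 2 * loglik
    bic = k * math.log(len(y)) - 2 * loglik
    unseen = math.exp(beta[0])  # fitted count for (0, 0, 0): every indicator is 0, so log mu = beta[0]
    return {"aic": aic, "bic": bic, "n_hat": sum(y) + unseen, "mu": dict(zip(histories, mu))}


# Hypothetical candidate set: independence, one two-way interaction, all two-way interactions.
candidates = [(), ("AB",), ("AC",), ("BC",), ("AB", "AC", "BC")]
results = {c: mse_model(c) for c in candidates}
best_aic = min(results, key=lambda c: results[c]["aic"])
best_bic = min(results, key=lambda c: results[c]["bic"])
for c, r in results.items():
    print(c or ("independence",), round(r["aic"], 1), round(r["bic"], 1), round(r["n_hat"]))
print("best by AIC:", best_aic, "best by BIC:", best_bic)
```

Because BIC penalizes each extra parameter by the log of the sample size rather than by 2, the AIC and BIC versions can select different models, and hence different population estimates. In a multiple-imputation workflow, a fit like this would be repeated on each imputed dataset and the results pooled; that step is omitted here.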