
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Krempl, G.M.
dc.contributor.author: Gathu, Hannah
dc.date.accessioned: 2022-06-14T00:00:40Z
dc.date.available: 2022-06-14T00:00:40Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/41632
dc.description.abstract: In machine learning we often have to work with imperfect data. One way in which data can be imperfect is through sampling bias. When a data sample is biased, it does not accurately represent the population. This makes it challenging to train a well-generalizing and unbiased classifier, which can cause a machine learning algorithm to be unfair. Many techniques have been developed to deal with this challenge. These range from pre-processing techniques, which focus on mitigating bias in the data before training begins, to in-process techniques, which alter the training process to reduce bias, to post-processing techniques. In this thesis we compare two active learning algorithms. One of them is an in-process technique that is specifically designed to target sampling bias. We compare this technique to a simpler, more conventional active learning algorithm. Based on experiments on benchmark data, we compare both algorithms in terms of how much they improve the performance of the classifier, how closely they make the sample resemble the “population”, and other factors. We find that the conventional active learning algorithm actually seems to outperform the sampling-bias-targeted algorithm on most judging criteria and for most settings. In addition, we run an experiment on biased toxicology data from the Rijksinstituut voor Volksgezondheid en Milieu (RIVM).
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: A lot of data used for machine learning has sampling bias. This makes it challenging to train a well-generalizing and unbiased classifier, which can cause a machine learning algorithm to be unfair. In this thesis we compare two techniques to see how well they perform in terms of mitigating sampling bias.
dc.title: Using Active Learning to Mitigate Sampling Bias: a Comparison of Two Algorithms
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: machine learning; sampling bias; artificial intelligence; fairness; active learning; supervised learning; data; toxicology; RIVM
dc.subject.courseuu: Computing Science
dc.thesis.id: 4417

