Using Active Learning to Mitigate Sampling Bias: a Comparison of Two Algorithms

Gathu, Hannah

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Krempl, G.M.
dc.contributor.author	Gathu, Hannah
dc.date.accessioned	2022-06-14T00:00:40Z
dc.date.available	2022-06-14T00:00:40Z
dc.date.issued	2022
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/41632
dc.description.abstract	In machine learning we often have to work with imperfect data. One way in which data can be imperfect is through sampling bias: when a data sample is biased, it does not accurately represent the population. This makes it challenging to train a well-generalizing and unbiased classifier, which can cause a machine learning algorithm to be unfair. Many techniques have been developed to deal with this challenge. They range from pre-processing techniques, which focus on mitigating bias in the data before training begins, to in-process techniques, which alter the training process to reduce bias, to post-processing techniques. In this thesis we compare two active learning algorithms. One of them is an in-process technique specifically designed to target sampling bias; we compare it to a simpler, more conventional active learning algorithm. Based on experiments on benchmark data, we compare how both algorithms perform in terms of improving the performance of the classifier, making the sample more closely resemble the “population”, and other factors. We find that the conventional active learning algorithm actually seems to outperform the sampling-bias-targeted algorithm on most judging criteria and for most settings. In addition, we conduct an experiment on biased toxicology data from the Rijksinstituut voor Volksgezondheid en Milieu (RIVM).
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Much of the data used for machine learning has sampling bias. This makes it challenging to train a well-generalizing and unbiased classifier, which can cause a machine learning algorithm to be unfair. In this thesis we compare two techniques to see how well they perform in terms of mitigating sampling bias.
dc.title	Using Active Learning to Mitigate Sampling Bias: a Comparison of Two Algorithms
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	machine learning; sampling bias; artificial intelligence; fairness; active learning; supervised learning; data; toxicology; RIVM
dc.subject.courseuu	Computing Science
dc.thesis.id	4417

Files in this item

Name: Thesis__complete___fixed_.pdf
Size: 2.388Mb
Format: PDF


This item appears in the following Collection(s)

Theses
