Efficiently and reliably evaluating text classification in data sampled via active learning

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Leeuwenberg, A.M.
dc.contributor.author	Damme, Luuk van
dc.date.accessioned	2025-06-06T23:01:21Z
dc.date.available	2025-06-06T23:01:21Z
dc.date.issued	2025
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/49017
dc.description.abstract	Text classification helps structure data such as medical documents, but it requires many labeled examples, which are costly to obtain. Active learning reduces this cost by selecting only the most informative examples for labeling. Because the labeled data are then selected rather than randomly sampled, evaluating a model on them can give a biased assessment of the model. This study explored to what extent active learning causes such bias and whether the bias can be reduced with importance sampling. The findings show that importance sampling reduced the bias but did not eliminate it. More research is required before this method can be used in practice.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	This study explored 1) the degree to which internal evaluation via cross-validation on data sampled via active learning may be biased, and 2) whether importance sampling is effective in adjusting for this selection bias during evaluation.
dc.title	Efficiently and reliably evaluating text classification in data sampled via active learning
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	Active learning, Importance sampling, Bias
dc.subject.courseuu	Bioinformatics and Biocomplexity
dc.thesis.id	46199