
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Leeuwenberg, A.M.
dc.contributor.author: Damme, Luuk van
dc.date.accessioned: 2025-06-06T23:01:21Z
dc.date.available: 2025-06-06T23:01:21Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/49017
dc.description.abstract: Text classification helps structure data such as medical documents but requires many labeled examples, which are costly to obtain. Active learning reduces this cost by selecting only the most informative data points for labeling. Because the labeled data are selected rather than sampled at random, evaluating a model on them can give a biased assessment. This study explored to what extent active learning causes such bias and whether it can be reduced by a technique called importance sampling. The findings show that importance sampling reduced the bias but did not eliminate it. More research is required before this method can be used in practice.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This study explored 1) the degree to which internal evaluation via cross-validation in active learning data may be biased, and 2) whether importance sampling is effective in adjusting for the selection bias in the evaluation process of active learning.
dc.title: Efficiently and reliably evaluating text classification in data sampled via active learning
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Active learning, Importance sampling, Bias
dc.subject.courseuu: Bioinformatics and Biocomplexity
dc.thesis.id: 46199

