        Utrecht University Student Theses Repository
        Efficiently and reliably evaluating text classification in data sampled via active learning

        View/Open
        Report.pdf (613.0Kb)
        Publication date
        2025
        Author
        Damme, Luuk van
        Summary
        Text classification helps structure data such as medical documents, but it requires many labeled examples, which are costly to obtain. Active learning reduces this cost by selecting only the most informative examples for labeling. Because the labeled examples are selected rather than drawn at random, evaluating a model on them can give a biased assessment. This study explored to what extent active learning causes such bias and whether a technique called importance sampling can reduce it. The findings show that importance sampling reduced the bias but did not eliminate it. More research is required before this method can be used in practice.
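The general idea of correcting a selection-biased evaluation with importance weights can be illustrated with a minimal sketch. This is not the thesis's own method or code. It assumes that the probability with which each labeled example was selected is known, and it uses a self-normalized estimator. The function names, the toy labels, and the selection probabilities are all hypothetical.

```python
from typing import Sequence


def naive_accuracy(correct: Sequence[int]) -> float:
    """Unweighted accuracy over the labeled (actively sampled) examples."""
    return sum(correct) / len(correct)


def importance_weighted_accuracy(
    correct: Sequence[int], sampling_probs: Sequence[float]
) -> float:
    """Self-normalized importance-weighted accuracy.

    Each labeled example is weighted by 1 / q_i, where q_i is the
    (assumed known) probability that it was selected for labeling.
    Examples that were unlikely to be selected count for more.
    """
    if len(correct) != len(sampling_probs):
        raise ValueError("correct and sampling_probs must have equal length")
    weights = [1.0 / q for q in sampling_probs]
    weighted_hits = sum(w * c for w, c in zip(weights, correct))
    return weighted_hits / sum(weights)


# Hypothetical toy data: 1 = classified correctly, 0 = misclassified.
# The misclassified examples were selected with high probability (0.5),
# the correctly classified ones with low probability (0.1), mimicking
# an active learner that prefers difficult examples.
correct = [1, 0, 0, 1]
sampling_probs = [0.1, 0.5, 0.5, 0.1]

print("naive accuracy:   ", naive_accuracy(correct))
print("weighted accuracy:", importance_weighted_accuracy(correct, sampling_probs))
```

In this toy case the unweighted estimate is pulled down by the over-represented difficult examples, and the weights compensate for that. In practice the selection probabilities of an active learner are often unknown or only approximate, which is one reason such a correction can remain incomplete.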
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/49017
        Collections
        • Theses