dc.rights.license | CC-BY-NC-ND | |
dc.contributor.advisor | Telea, Alex | |
dc.contributor.author | Grosu, Cristian | |
dc.date.accessioned | 2024-01-09T00:01:18Z | |
dc.date.available | 2024-01-09T00:01:18Z | |
dc.date.issued | 2024 | |
dc.identifier.uri | https://studenttheses.uu.nl/handle/20.500.12932/45791 | |
dc.description.abstract | Training classifier models with semi-labeled datasets, which often have only a limited number of labeled samples, is challenging. This thesis proposes a user-centric methodology for pseudo-labeling semi-labeled data, fusing automatic pseudo-labeling algorithms with user-driven correction of mislabeled data points.
The methodology is supported by several visual analytics approaches: sample visualization via dimensionality reduction techniques and visualization of classifier decision boundaries using Decision Boundary Maps (DBMs). These visuals allow users to find regions of uncertainty where automatic pseudo-labeling may have made errors and to correct those errors. To speed up the visual analytics loop, we propose various heuristics for efficient and accurate DBM computation. User experiments show that both domain-expert and non-expert users were able to consistently correct wrong labels and improve classifier performance across different datasets and classifier models, with limited effort and in limited time.
The study underscores the importance and potential of visualization tools for semi-labeled datasets and semi-supervised learning, and provides a foundation for future research in this area. | |
dc.description.sponsorship | Utrecht University | |
dc.language.iso | EN | |
dc.subject | Decision Boundary Mapping techniques for improving semi-supervised machine learning models.
The Ethics and Privacy Quick Scan of the Utrecht University Research Institute of Information and Computing Sciences classified this research as low-risk with no fuller ethics review or privacy assessment required. | |
dc.title | Decision Boundary Maps for Supporting User-Driven Pseudo-labeling | |
dc.type.content | Master Thesis | |
dc.rights.accessrights | Open Access | |
dc.subject.keywords | Semi-supervised learning; Pseudo-labeling; Dimensionality Reduction; Decision Boundary Maps; Data visualization | |
dc.subject.courseuu | Computing Science | |
dc.thesis.id | 26897 | |