Decision Boundary Maps for Supporting User-Driven Pseudo-labeling

Grosu, Cristian

Publication date

2024

Summary

Training classifiers on semi-labeled datasets, which typically contain only a limited number of labeled samples, is challenging. This thesis proposes a user-centric methodology for pseudo-labeling such data that combines automatic pseudo-labeling algorithms with user-driven correction of mislabeled data points. The methodology is supported by several visual analytics techniques: visualization of samples via dimensionality reduction, and visualization of classifier decision boundaries using so-called Decision Boundary Maps (DBMs). These visualizations let users find regions of uncertainty where automatic pseudo-labeling may have made errors and correct the affected labels. To speed up the visual analytics loop, we propose several heuristics for efficient and accurate DBM computation. User experiments show that both domain experts and non-expert users were able to consistently correct wrong labels and improve classifier performance across different datasets and classifier models, with limited effort and in limited time. The study underscores the potential of visualization tools for semi-labeled datasets and semi-supervised learning and provides a foundation for future research in this area.
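
The workflow described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the nearest-centroid classifier, the PCA-based projection and inverse projection, the synthetic data, the 0.8 confidence threshold, and all function names are assumptions chosen to keep the example short. It pseudo-labels unlabeled samples, flags low-confidence ones for user review, and computes a DBM by back-projecting every pixel of a 2D grid into data space and classifying it.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid classifier (stand-in model): one mean vector per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_with_confidence(X, classes, centroids):
    """Return predicted labels and a softmax-style confidence per sample."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    scores = np.exp(-dist)
    probs = scores / scores.sum(axis=1, keepdims=True)
    return classes[probs.argmax(axis=1)], probs.max(axis=1)

def fit_pca_2d(X):
    """PCA via SVD; returns the data mean and the top two principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:2]

def decision_boundary_map(X, classes, centroids, resolution=50):
    """Classify the back-projection of each pixel of a 2D grid.

    PCA serves as both projection and inverse projection here (an assumption);
    any projection with a usable inverse could take its place.
    """
    mean, axes = fit_pca_2d(X)
    projected = (X - mean) @ axes.T                 # data space -> 2D
    lo, hi = projected.min(axis=0), projected.max(axis=0)
    gx = np.linspace(lo[0], hi[0], resolution)
    gy = np.linspace(lo[1], hi[1], resolution)
    grid = np.stack(np.meshgrid(gx, gy), axis=-1).reshape(-1, 2)
    backprojected = grid @ axes + mean              # 2D -> data space
    labels, conf = predict_with_confidence(backprojected, classes, centroids)
    shape = (resolution, resolution)
    return labels.reshape(shape), conf.reshape(shape)

# Synthetic semi-labeled data: two 5-D Gaussian blobs, 5 labeled samples per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(3.0, 1.0, (100, 5))])
y_true = np.repeat([0, 1], 100)
labeled = np.r_[0:5, 100:105]
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

# Automatic pseudo-labeling, then flag uncertain samples for user correction.
classes, centroids = fit_centroids(X[labeled], y_true[labeled])
pseudo, conf = predict_with_confidence(X[unlabeled], classes, centroids)
needs_review = unlabeled[conf < 0.8]                # arbitrary threshold

# DBM: per-pixel label and confidence; low-confidence pixels mark uncertain regions.
label_map, conf_map = decision_boundary_map(X, classes, centroids)
```

In an interactive tool, `label_map` would typically be rendered as a colored image with `conf_map` controlling brightness or saturation, and the projected samples drawn on top so that a user can inspect and relabel the points listed in `needs_review`.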

URI

https://studenttheses.uu.nl/handle/20.500.12932/45791

Collections

Theses