Decision Boundary Maps for Supporting User-Driven Pseudo-labeling
Summary
Training classifiers on semi-labeled datasets, which often contain only a limited number of labeled samples, is challenging. This thesis proposes a user-centric methodology for pseudo-labeling semi-labeled data that combines automatic pseudo-labeling algorithms with user-driven correction of mislabeled data points.
The methodology is supported by several visual analytics approaches that involve visualizing samples via dimensionality reduction techniques and visualizing classifier decision boundaries using so-called Decision Boundary Maps (DBMs). These visualizations allow users to find regions of uncertainty where automatic pseudo-labeling may have made errors and to correct those errors. To speed up the visual analytics loop, we propose various heuristics for efficient and accurate DBM computation. User experiments show that both domain experts and non-expert users were able to consistently correct wrong labels and improve classifier performance across different datasets and classifier models, with limited effort and in a limited amount of time.
The study underscores the importance and potential of visualization tools in the context of semi-labeled datasets and semi-supervised learning and provides a foundation for future research in this area.