Self-Supervised Partial Cycle-Consistency for Multi-View Matching
Summary
Matching objects across partially overlapping camera views is crucial in multi-camera systems and requires a view-invariant feature extraction network. Training such a network with cycle-consistency circumvents the need for labor-intensive labeling. In this paper, we extend the mathematical formulation of cycle-consistency to handle partial overlap. We then derive several cycle variants and introduce a pseudo-mask that directs the training loss to take partial overlap into account, thereby improving the self-supervised learning signal from cycle-consistency. We additionally present a time-divergent scene sampling scheme that improves the data input for self-supervised settings. Cross-camera matching experiments on the challenging DIVOTrack dataset show the merits of our approach. Compared to the self-supervised state-of-the-art, we achieve a 4.3 percentage point higher F1 score with our combined contributions. Our improvements are robust to reduced overlap in the training data, with substantial gains in challenging scenes characterized by few matches between many people. Self-supervised feature networks trained with our method are effective at matching objects in a range of multi-camera settings, providing opportunities for complex tasks like large-scale multi-camera scene understanding.
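To make the core idea concrete, below is a minimal sketch of cycle-consistency training for cross-view matching with a pseudo-mask for partial overlap. This is an illustrative reconstruction, not the paper's implementation: the soft-assignment formulation (temperature-scaled softmax over cosine similarities), the `tau` and `mask_thresh` parameters, and the thresholding rule used to build the pseudo-mask are all assumptions chosen for clarity. The intuition it demonstrates is the one the summary describes: a feature matched from view A to view B and back should return to itself, and objects visible in only one view should be excluded from that loss rather than forced into spurious cycles.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def partial_cycle_loss(feat_a, feat_b, tau=0.1, mask_thresh=0.5):
    """Cycle-consistency loss A -> B -> A with a pseudo-mask for partial overlap.

    feat_a: (N, D) features for objects in view A
    feat_b: (M, D) features for objects in view B (M may differ from N)
    tau, mask_thresh: illustrative hyperparameters, not from the paper.
    """
    # Cosine similarity between L2-normalized features.
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = a @ b.T

    # Soft assignments in both directions, then compose them into a cycle.
    p_ab = softmax(sim / tau, axis=1)    # A -> B
    p_ba = softmax(sim.T / tau, axis=1)  # B -> A
    cycle = p_ab @ p_ba                  # A -> B -> A, shape (N, N)

    # Pseudo-mask: only keep objects whose cycle confidently returns home,
    # so objects outside the overlap region do not pollute the loss.
    return_prob = np.diag(cycle)
    mask = return_prob > mask_thresh
    if not mask.any():
        return 0.0

    # Cross-entropy against the identity target on the masked rows.
    return float(-np.log(return_prob[mask] + 1e-9).mean())
```

With fully overlapping views (identical object sets), nearly every cycle returns to its start and the loss is near zero; when view B contains only a subset of view A's objects, the pseudo-mask drops the unmatched rows instead of penalizing them.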