Exploiting Spatial Relations for Visual Cognitive Tasks
Summary
During visual search, humans rely on a combination of bottom-up and top-down cues to select the most promising locations to which attention and gaze should be directed. Bottom-up cues are often postulated to be saliency- or conspicuity-based, in the sense that visual locations that "pop out" through some rare local feature or contrast configuration attract increased interest for visual inspection. Top-down cues originate from prior knowledge about scene and object properties, for example when searching for a particular object. In this thesis, we want to investigate top-down cues for visual search that originate from relational object knowledge and from knowing the identity of an object. For instance, the system may have previously acquired conditional probabilities for the chance that a certain object (e.g. a car) appears in conjunction with another object (e.g. a parking meter), together with information about where the objects are located relative to each other. The goal is then to build a system that uses an appearance-based object classifier to identify single objects in a scene and that proposes new locations and candidate objects to be searched next by drawing on this relational knowledge about typical scenes. This should lead to a sensible visual search path that rapidly concentrates on the most frequently appearing objects in selected typical scenes (e.g. street scenes).
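The idea of proposing the next search targets from relational knowledge can be illustrated with a minimal sketch. It is not the system envisaged by the thesis: the table `RELATIONS`, the `Relation` record, the function `propose_next`, and all probabilities and pixel offsets are hypothetical and chosen only for illustration. The sketch assumes that, for each already identified object, a conditional co-occurrence probability and a mean relative offset to other objects are available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    prob: float    # assumed P(other object present | anchor object present)
    offset: tuple  # assumed mean (dx, dy) of the other object relative to the anchor, in pixels


# Hypothetical relational knowledge; the values are illustrative, not measured.
RELATIONS = {
    "car": {
        "parking_meter": Relation(0.6, (80, -10)),
        "traffic_sign": Relation(0.3, (150, -120)),
    },
}


def propose_next(label, position, relations=RELATIONS):
    """Return (candidate_label, predicted_location, probability) tuples,
    ordered from most to least probable, for an identified object."""
    x, y = position
    candidates = [
        (other, (x + rel.offset[0], y + rel.offset[1]), rel.prob)
        for other, rel in relations.get(label, {}).items()
    ]
    # Inspect the most probable co-occurring object first.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


if __name__ == "__main__":
    # A car identified at image position (100, 200) suggests where to look next.
    for candidate in propose_next("car", (100, 200)):
        print(candidate)
```

In this sketch an object without stored relations simply yields no proposals; a fuller system could then fall back on bottom-up saliency to choose the next location.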
The work will involve contributions from a general-purpose vision system, including methods for conspicuity-based extraction of salient locations as well as visual object classification.