Semantically driven part-based object detection
Summary
Up to now, deformable part-based object models have initialized their parts by dividing the shape of their components into equally sized regions. These regions then become the parts, each anchored to a position on top of its component from which it can move slightly. Consequently, what a part represents depends entirely on what happens to be depicted in the region of the component it was created from, and the size and location of each part have always been determined without any regard for the conceptual structure of the component.
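The uniform initialization described above can be illustrated with a minimal sketch, assuming a root window measured in feature-grid cells and a regular grid of parts. The `Box` type, the `uniform_parts` function and the grid layout are hypothetical simplifications for illustration, not the reference implementation of any DPM release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int  # left edge, in feature-grid cells (hypothetical unit)
    y: int  # top edge
    w: int  # width
    h: int  # height


def uniform_parts(root_w: int, root_h: int, cols: int, rows: int) -> list[Box]:
    """Tile a root window of root_w x root_h cells with cols x rows equal regions.

    Each region becomes one part whose top-left corner is its anchor. The split
    ignores what the regions depict, which is the limitation discussed above.
    """
    if root_w % cols or root_h % rows:
        raise ValueError("root size must be divisible by the grid size")
    pw, ph = root_w // cols, root_h // rows
    return [Box(c * pw, r * ph, pw, ph) for r in range(rows) for c in range(cols)]


parts = uniform_parts(8, 6, cols=2, rows=3)
```

Every part here has the same size regardless of whether its region covers a wheel, a handlebar or background, which is exactly what a semantically driven initialization tries to avoid.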
We use the framework proposed by Felzenszwalb et al. (2010) to learn the shape of objects in ImageNet that are meronyms, i.e. parts, of a more complex object belonging to a category of interest from the Pascal VOC data set (2010). The meronyms are obtained by analyzing the semantic structure of the given category in WordNet.
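One way to collect candidate meronyms is a breadth-first walk over part-of links, sketched below. In practice the relations could come from WordNet itself, for example through NLTK's `wordnet` corpus reader, whose synsets offer `part_meronyms()`; to keep the sketch self-contained it uses a small hand-written graph instead. The `PART_MERONYMS` entries and the `collect_meronyms` helper are hypothetical and are not copied from WordNet.

```python
from collections import deque

# Hypothetical part-of relations for illustration only; NOT actual WordNet data.
PART_MERONYMS: dict[str, list[str]] = {
    "bicycle": ["wheel", "saddle", "handlebar", "pedal"],
    "wheel": ["spoke", "rim", "tire"],
    "pedal": [],
}


def collect_meronyms(category: str, max_depth: int = 1) -> list[str]:
    """Return the meronyms of `category` up to `max_depth` part-of levels below it."""
    found: list[str] = []
    seen = {category}
    queue = deque([(category, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend further than requested
        for part in PART_MERONYMS.get(node, []):
            if part not in seen:
                seen.add(part)
                found.append(part)
                queue.append((part, depth + 1))
    return found


direct_parts = collect_meronyms("bicycle")
all_parts = collect_meronyms("bicycle", max_depth=2)
```

Restricting the walk to depth 1 keeps only the direct parts, which are presumably the most useful candidates for training separate part detectors; deeper meronyms tend to be small and hard to localize.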
We then use the learned shapes of the meronyms to estimate their size and location in the training data of the complex object. After computing the principal components of the resulting part configurations, we use a Fisher linear classifier to cluster the training data.
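The pipeline in this step can be sketched as follows, under several assumptions that the summary does not specify. Each estimated part box is expressed relative to its object box as a normalized (cx, cy, w, h) vector; the configurations are projected onto their principal components; and, because a Fisher classifier is a supervised method, the sketch assumes an iterative scheme that starts from the sign of the first principal component and repeatedly refits a Fisher discriminant to the current two-way split. The names `normalize_parts`, `pca_scores` and `fisher_split` are hypothetical, and this is not presented as the authors' exact procedure.

```python
import numpy as np


def normalize_parts(obj_box, part_boxes):
    """Map part boxes (x0, y0, x1, y1) into the unit frame of the object box
    and concatenate one (cx, cy, w, h) tuple per part into a flat vector."""
    ox0, oy0, ox1, oy1 = obj_box
    ow, oh = ox1 - ox0, oy1 - oy0
    feats = []
    for x0, y0, x1, y1 in part_boxes:
        feats += [((x0 + x1) / 2 - ox0) / ow, ((y0 + y1) / 2 - oy0) / oh,
                  (x1 - x0) / ow, (y1 - y0) / oh]
    return np.array(feats)


def pca_scores(X, k):
    """Project the rows of X onto their first k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def fisher_split(Z, iters=20, reg=1e-6):
    """Assumed two-way clustering: initialize by the sign of the first score,
    then alternate between fitting a Fisher discriminant and relabeling."""
    labels = (Z[:, 0] > 0).astype(int)
    for _ in range(iters):
        if labels.min() == labels.max():
            break  # one group is empty, nothing left to discriminate
        m0 = Z[labels == 0].mean(axis=0)
        m1 = Z[labels == 1].mean(axis=0)
        d0, d1 = Z[labels == 0] - m0, Z[labels == 1] - m1
        # Within-class scatter, slightly regularized so it is always invertible.
        sw = d0.T @ d0 + d1.T @ d1 + reg * np.eye(Z.shape[1])
        w = np.linalg.solve(sw, m1 - m0)  # Fisher direction S_w^{-1}(m1 - m0)
        thresh = w @ (m0 + m1) / 2        # midpoint of the projected means
        new = (Z @ w > thresh).astype(int)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels


# Synthetic configurations of a single part in two distinct layouts.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.2, 0.8, 0.3, 0.3], 0.02, size=(20, 4)),
               rng.normal([0.8, 0.2, 0.3, 0.3], 0.02, size=(20, 4))])
labels = fisher_split(pca_scores(X, k=2))
```

Each resulting cluster could then seed one model component, so that instances with a similar semantic layout share a component rather than merely a similar bounding-box shape.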
Instead of separating the data only by the shape (aspect ratio) of its bounding boxes, we create additional model components based on semantic structure. After learning the shape of each model component, we place the parts learned from ImageNet at their estimated positions on top of these semantically enhanced components.
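Placing a part at its estimated position amounts to converting a normalized part box into an anchor and a filter size on the component's root filter. The sketch below assumes the part box is given as (cx, cy, w, h) relative to the object box and that parts live at twice the spatial resolution of the root filter, as in the Felzenszwalb et al. framework; `PartPlacement`, `place_part` and the clamping rule are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartPlacement:
    anchor_x: int  # top-left corner of the part filter, in part-level cells
    anchor_y: int
    width: int     # part filter size, in part-level cells
    height: int


def place_part(cx, cy, w, h, root_w, root_h, scale=2):
    """Turn a part box relative to the object box (values in [0, 1]) into an
    anchor and filter size on a root filter of root_w x root_h cells.

    scale=2 reflects parts being modeled at twice the root resolution.
    """
    grid_w, grid_h = root_w * scale, root_h * scale
    width = max(1, round(w * grid_w))
    height = max(1, round(h * grid_h))
    # Center the part on its estimated position, clamped to stay inside the root window.
    ax = min(max(0, round(cx * grid_w - width / 2)), grid_w - width)
    ay = min(max(0, round(cy * grid_h - height / 2)), grid_h - height)
    return PartPlacement(ax, ay, width, height)


wheel = place_part(0.25, 0.75, 0.5, 0.5, root_w=8, root_h=6)
```

Unlike a uniform grid, the part's size and anchor here follow from where the meronym was actually estimated to be, while the deformation model can still let it move slightly around that anchor.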