|Abstract or Summary
- This dissertation addresses a number of inter-related and fundamental problems in computer vision. Specifically, we address object discovery, recognition, segmentation, and 3D pose estimation in images, as well as 3D scene reconstruction and scene interpretation. The key ideas behind our approaches include using shape as a basic object feature, and using structured prediction modeling paradigms for representing objects and scenes.
In this work, we make a number of new contributions both in computer vision and machine learning. We address the vision problems of shape matching, shape-based mining of objects in arbitrary image collections, context-aware object recognition, monocular estimation of 3D object poses, and monocular 3D scene reconstruction using shape from texture. Our work on shape-based object discovery is the first to show that meaningful objects can be extracted from a collection of arbitrary images, without any human supervision, by shape matching. We also show that a spatial repetition of objects in images (e.g., windows on a building facade, or cars lined up along a street) can be used for 3D scene reconstruction from a single image. The aforementioned topics have never been addressed in the literature.
The dissertation also presents new algorithms and object representations for the aforementioned vision problems. We fuse two traditionally different modeling paradigms Conditional Random Fields (CRF) and Random Forests (RF) into a unified framework, referred to as (RF)^2. We also derive theoretical error bounds of estimating distribution ratios by a two-class RF, which is then used to derive the theoretical performance bounds of a two-class
Thorough experimental evaluation of individual aspects of all our approaches is presented. In general, the experiments demonstrate that we outperform the state of the art on the benchmark datasets, without increasing complexity and supervision in training.