3D object recognition is a very difficult and important problem in computer vision, arising in a wide range of applications. Typically in 3D object recognition, interest points are extracted from images and then matched. A shortcoming of this approach is that points only carry local visual information. Therefore, there could...
This thesis addresses a fundamental computer vision problem, that of action recognition. The goal of action recognition is to recognize a class of human actions in a given video. Action recognition has a wide range of applications, including automated surveillance, sports video analysis, and internet-based search. The main challenge is...
This thesis presents an interactive software tool for tracking a moving object in a video. In particular, we focus on the problem of tracking a player in American football videos. Object tracking is one of the fundamental problems in computer vision. It is one of the most important components in...
Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications, including video surveillance, content retrieval, and sports analysis. This thesis focuses on the efficiency and robustness of video classification in unconstrained real-world settings. The thesis work can be broadly divided into four...
This dissertation addresses the problem of semantic labeling of image pixels. In the course of our work, we considered different types of semantic labels, including object classes (e.g., car, person), 3D depth values (in the range 0 to 80 meters), and affordance classes (e.g., walkable, sittable). Semantic pixel labeling is...
This dissertation addresses the problem of recognizing human activities in videos. Our focus is on activities with stochastic structure, where the activities are characterized by variable space-time arrangements of actions, and conducted by a variable number of actors. These activities occur frequently in sports and surveillance videos. They may appear...
In this dissertation, we address action segmentation in videos under limited supervision. The goal of action segmentation is to predict an action class for each frame of a video. Limited supervision means that ground-truth labels of video frames are not available in training. We focus on three types of...
This dissertation addresses object recognition in challenging settings, where distinct object classes are visually very similar (e.g., species of birds and insects) and/or access to training examples of object classes is limited (e.g., due to the associated high costs of data annotation). In this dissertation, we present a variety of...