Abstract
This thesis addresses a fundamental computer vision problem: action recognition. The goal of action recognition is to recognize the class of human action performed in a given video. Action recognition has a wide range of applications, including automated surveillance, sports video analysis, and internet-based search. The main challenge is that actors move and change their pose while performing an action, and that the actors may be captured from different viewing angles. For example, certain viewpoints of two distinct action classes may produce very similar appearance and motion features in the videos, causing confusion between the two classes. Therefore, action recognition should be invariant to changes in camera viewpoint. While action recognition in videos captured from a fixed view has received significant interest in the past, multi-view action recognition is still an under-explored field.

We propose an exemplar-based approach to multi-view action recognition, built on a novel method for training the K-Nearest Neighbor (K-NN) classifier. Specifically, the K-NN classifier is learned within a large-margin framework with a set of suitable constraints specified over camera viewpoints and action classes. Our constraints enforce that the affinity between videos of the same class should be larger than that between videos of distinct classes, and moreover, that within the same action class, the affinity between videos captured from the same camera viewpoint should be larger than that between videos from distinct viewpoints. We efficiently compute the affinity between any two videos based on many-to-many matching of their supervoxels, treating the supervoxel correspondences between two videos as latent random variables. Thus, we formulate a novel latent large-margin learning of the K-NN classifier, subject to a set of viewpoint and class constraints.
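The two ordering constraints above can be written as hinge losses over triplets of videos. The sketch below is a simplified illustration, not the thesis formulation: it assumes a precomputed symmetric affinity matrix (standing in for the latent supervoxel-matching affinity) and ignores the latent-variable optimization.

```python
import numpy as np

def large_margin_loss(affinity, labels, views, margin=1.0):
    """Sum of hinge losses enforcing, for each anchor video i:
    (1) class constraint: affinity to a same-class video j exceeds
        affinity to a different-class video k by at least `margin`;
    (2) viewpoint constraint: within one class, affinity to a same-view
        video j exceeds affinity to a different-view video k by `margin`.
    `affinity` is a hypothetical precomputed pairwise affinity matrix."""
    n = len(labels)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i in (j, k) or j == k:
                    continue
                # (1) j shares i's class, k does not
                if labels[j] == labels[i] and labels[k] != labels[i]:
                    loss += max(0.0, margin - (affinity[i, j] - affinity[i, k]))
                # (2) all three share a class; j shares i's view, k does not
                if (labels[j] == labels[i] == labels[k]
                        and views[j] == views[i] and views[k] != views[i]):
                    loss += max(0.0, margin - (affinity[i, j] - affinity[i, k]))
    return loss
```

In the thesis the affinity itself depends on latent supervoxel correspondences, so minimizing such a loss alternates between solving for correspondences and updating the affinity parameters; the sketch only shows the constraint structure.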
Given a new video, we use the learned K-NN classifier to identify the K closest training exemplars and transfer their majority action class label to the new video. Our approach outperforms the state of the art on benchmark datasets: INRIA IXMAS, a newer version of IXMAS (NIXMAS), and i3DPost.
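The label-transfer step above amounts to a majority vote over the K exemplars with the highest learned affinity to the query video. A minimal sketch, assuming affinities to the exemplars have already been computed (function and variable names are illustrative, not from the thesis):

```python
from collections import Counter

def knn_label(affinities_to_exemplars, exemplar_labels, k=3):
    """Return the majority class label among the K training exemplars
    with the highest affinity to the new video (larger = closer)."""
    # indices of the K exemplars with the highest affinity
    top_k = sorted(range(len(affinities_to_exemplars)),
                   key=lambda i: affinities_to_exemplars[i],
                   reverse=True)[:k]
    votes = Counter(exemplar_labels[i] for i in top_k)
    return votes.most_common(1)[0][0]

# usage: exemplars labeled 'wave'/'kick', query closest to two 'wave' videos
label = knn_label([0.9, 0.8, 0.1, 0.7], ['wave', 'wave', 'kick', 'kick'], k=3)
```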