Graduate Thesis Or Dissertation


Recognizing human group activities in video through mining optimal features



  • Given a video, we would like to recognize group activities, localize the video parts where these activities occur, and detect the actors involved in them. To this end, we propose a novel mid-level feature, called a control point, for representing group activities. Control points are aimed at summarizing visual cues, lifting from the noisy low-level features, and jointly providing visual evidence of actors and their group activity to higher-level inference algorithms. We formulate a generative model, called the chains model, to organize a huge number of video features into an ensemble of chains of control points representing a group activity. The chains may have arbitrary length, ideally starting and ending at the beginning and end of the time interval occupied by the activity. We derive an efficient MAP inference procedure, a new EM-like algorithm that iterates two steps: it warps the chains of control points to their expected locations so that they better summarize visual cues, and then maximizes their posterior probability. Our evaluation on the benchmark UT-Human Interaction and Collective Activities datasets demonstrates that we outperform the state of the art with reasonable running times.


