Graduate Thesis Or Dissertation
 

Robust and Efficient Classification of Videos in the Wild

Public Deposited

https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/vm40xw219

Descriptions

Abstract
  • Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications, including video surveillance, content retrieval, and sports analysis. This thesis addresses the efficiency and robustness of video classification in unconstrained real-world settings. The work can be broadly divided into four major parts. First, we address view-invariant action recognition. This problem is formulated within the multi-task learning framework, where the action model of each viewpoint is specified as a separate task and all tasks are trained jointly. Second, we address large-scale action recognition in uncontrolled settings. For robustness, we augment the standard training video dataset with additional data from another modality, namely 3D skeleton sequences of human body motion. A recurrent neural network called long short-term memory (LSTM) is used to encode sequences from the 3D skeleton data. To train a second LSTM for video classification, we use a modified hybrid backpropagation-through-time algorithm. Third, we address unsupervised video summarization. We formulate the problem as frame subset selection and specify a novel deep generative network that computes a video summary with the smallest representation error. Fourth, we introduce the new problem of budget-aware semantic segmentation of videos. In this line of work, we consider two models. The first uses a conditional random field (CRF) and replaces the standard inference steps for feature computation with a sequential policy that intelligently selects a subset of regions and their corresponding features. The second is a deep recurrent policy that learns to select a subset of frames and uses a shallow convolutional neural network (CNN) to propagate the available segmentation to unlabeled frames.
This research advances the state of the art in computer vision: the approaches developed meet the stringent runtime requirements that arise in many applications and work in less sanitized settings.
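
To make the summarization objective in the abstract concrete, the following is a minimal sketch of frame subset selection under a representation-error criterion. It is not the thesis's deep generative network: it swaps in a simple greedy search, and it assumes that each frame is a small feature vector and that the error of a summary is the total squared distance from every frame to its nearest selected frame. All names (`sq_dist`, `representation_error`, `greedy_summary`, `toy_frames`) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Frame = Sequence[float]  # assumed: one feature vector per video frame


def sq_dist(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two frame feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def representation_error(frames: List[Frame], selected: List[int]) -> float:
    """Total error when each frame is represented by its nearest selected frame."""
    return sum(min(sq_dist(f, frames[s]) for s in selected) for f in frames)


def greedy_summary(frames: List[Frame], budget: int) -> List[int]:
    """Greedily pick up to `budget` frame indices, each time adding the
    frame that yields the lowest representation error (illustrative only)."""
    selected: List[int] = []
    for _ in range(min(budget, len(frames))):
        candidates = [i for i in range(len(frames)) if i not in selected]
        best = min(
            candidates,
            key=lambda i: representation_error(frames, selected + [i]),
        )
        selected.append(best)
    return sorted(selected)


# Toy "video": six frames with 2-D features forming two visual clusters.
toy_frames = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
summary = greedy_summary(toy_frames, budget=2)
print(summary)  # -> [0, 3]
```

With a budget of two frames, the sketch keeps one frame from each cluster, which is the behavior a representation-error objective is meant to encourage. A learned model such as the one described in the abstract would replace both the hand-set distance and the greedy search.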
Embargo date range
  • 2017-08-22 to 2018-02-27
