Robust and Efficient Classification of Videos in the Wild

Mahasseni, Behrooz

Graduate Thesis Or Dissertation

Public Deposited

Citeable URL: https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/vm40xw219

Descriptions

Attribute Name	Values
Creator	Mahasseni, Behrooz
Abstract	Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications, including video surveillance, content retrieval, and sports analysis. This thesis addresses the efficiency and robustness of video classification in unconstrained real-world settings. The work can be broadly divided into four major parts. First, we address view-invariant action recognition. This problem is formulated within the multi-task learning framework, where the action model of each viewpoint is specified as a separate task and all tasks are trained jointly. Second, we address large-scale action recognition in uncontrolled settings. For robustness, we augment the standard training video dataset with additional data from another modality, namely 3D skeleton sequences of human body motion. A recurrent neural network called long short-term memory (LSTM) is used to encode sequences from the 3D skeleton data. To learn another LSTM for video classification, we use a modified hybrid backpropagation through time algorithm. Third, we address unsupervised video summarization. We formulate the problem as frame subset selection and specify a novel deep generative network to compute a video summary with the smallest representation error. Fourth, we introduce the new problem of budget-aware semantic segmentation of videos. In this line of work, we consider two models. The first model uses a conditional random field (CRF) and replaces the standard inference steps for feature computation with a sequential policy that intelligently selects a subset of regions and their corresponding features. The second model is a deep recurrent policy that is trained to select a subset of frames and uses a shallow convolutional neural network (CNN) to propagate the available segmentation to unlabeled frames.
This research advances the state of the art in computer vision: the approaches developed make it possible to meet the stringent runtime requirements that arise in many applications and to work in less sanitized settings.
License	All rights reserved
Resource Type	Dissertation
Date Issued	2016-11-30
Degree Level	Doctoral
Degree Name	Doctor of Philosophy (Ph.D.)
Degree Field	Computer Science
Degree Grantor	Oregon State University
Commencement Year	2017
Advisor	Todorovic, Sinisa
Committee Member	Zhang, Eugene; Fern, Alan; Li, Fuxin; Tyler, Brett
Academic Affiliation	Electrical Engineering and Computer Science
Non-Academic Affiliation	Oregon State University. Graduate School
Subject	Video recording; Human locomotion; Human body in motion pictures; Computer multitasking; Video surveillance; Machine learning; Computer vision
Rights Statement	In Copyright
Publisher	Oregon State University
Peer Reviewed	No
Language	English [eng]
Replaces	http://hdl.handle.net/1957/60021
Embargo date range	2017-08-22 to 2018-02-27

Relationships

Parents:

This work has no parents.

In Collection:

Graduate Theses and Dissertations (GTD)

Items

Thumbnail	Title	Date Uploaded	Visibility	Actions
	MahasseniBehrooz2016.pdf	2017-08-22	Public	Download
