Currently, a popular approach to image classification uses the deep Transformer architecture. In a Transformer, the attention mechanism enables the model to learn efficiently with fewer computational resources than the convolutional neural networks (CNNs). In this thesis, we study the sparse attention mechanism widely used in the Transformers developed specifically...
This dissertation addresses object recognition in challenging settings, where distinct object classes are visually very similar (e.g., species of birds and insects) and/or access to training examples of object classes is limited (e.g., due to the associated high costs of data annotation). In this dissertation, we present a variety of...
This paper addresses the high model complexity and overconfident frame labeling of state-of-the-art (SOTA) action segmenters. Their complexity is typically justified by the need to sequentially refine action segmentation through multiple stages of a deep architecture. However, this multistage refinement does not take into account uncertainty of frame labeling predicted...
Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications including video surveillance, content retrieval, and sports analysis. This thesis focuses on addressing efficiency and robustness of video classification in unconstrained real-world settings. The thesis work can be broadly divided into four...
This thesis considers the problem of training convolutional neural networks for online visual tracking. A major challenge for single object visual tracking is that most training sets with frame-level track annotations are quite small, due to the prohibitive cost of manual annotation. Current training approaches either supplement the annotations with...
Constructing a panorama from a set of videos is a long-standing problem in computer vision. A panorama represents an enhanced still-image representation of an entire scene captured in a set of videos, where each video shows only a part of the scene. Importantly, a panorama shows only the scene background,...