Intuitively, it seems as though natural language processing tasks might benefit from explicit representations of the syntactic and semantic properties of text. Ontonotes is a dataset which attempts to annotate texts, to represent as much as possible of the meaning of the text explicitly within the annotation. Many tools exist...
Commercial and public safety usage of Unmanned Aerial Vehicles in the National Airspace is currently restricted by federal regulation. The Federal Aeronautics Association is interested in modifying the restrictions; however, research is needed to study the human factor and required aptitude for a single human operating multiple UAVs. This Master’s...
In-hand manipulations consist of dexterous motions that come easy to humans but still pose a challenge to robotic systems. It is difficult to control finger motions in long complicated sequences due to high DOFs and intricate contact interactions. For such complex motions, in-hand manipulations have generally been broken into a...
We take for granted how quickly we, as humans, form mental models of the world around us. By the time we are toddlers, we have an observable intuition around the physical rules of the world. Stacking blocks such that they don’t fall over becomes such a trivial task, that it...
Severe weather in the United States causes huge insured losses to crop and property frequently.It creates major impact and elicit diverse response in the weather insurance industry. Events like hail, storm, hurricane etc. are more likely to cause catastrophe losses. So it becomes crucial to collect and analyze these extreme...
We consider the problem of finding unknown patterns that are recurring across multiple sets. For example, finding multiple objects that are present in multiple images or a short DNA code that is repeated across multiple DNA sequences. We first consider a simple problem of finding a single unknown pattern in...
Novelty detection plays an important role in machine learning and signal processing. This
project studies novelty detection in a new setting where the data object is represented as
a bag of instances and associated with multiple class labels, referred to as multi-instance
multi-label (MIML) learning. Contrary to the common assumption...
Object recognition is a fundamental problem in computer vision. Recognition is
required by many applications. This thesis presents a distance based approach to
recognize objects. We are interested in objects that belong to very similar classes,
where each class has large variations. This problem is called fine-grained object
recognition. Given...
Monte-Carlo planning algorithms such as UCT make decisions at each step by
intelligently expanding a single search tree given the available time and then
selecting the best root action. Recent work has provided evidence that it can be
advantageous to instead construct an ensemble of search trees and make a...
Gusset plates are an important component of bridges. They are thick sheets of steel that join steel members together using fasteners and also strengthen their joint. Transportation agencies regularly evaluate and rate their inventories of gusset plate connections using visual inspection, which is very costly. To address this issue, we...
Many large-scale data analysis applications involve data that can vary over both time and space. Often the primary goal of analyzing spatiotemporal data is identifying trends, movements, and sudden changes with respect to time, location, or both. This can include a variety of applications in economics (housing prices, unemployment, job...
Machine learning models for natural language processing have traditionally relied on large numbers of discrete features, built up from atomic categories such as word forms and part-of-speech labels, which are considered completely distinct from each other. Recently however, the advent of dense feature representations coupled with deep learning techniques has...
Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning...
Simultaneous speech translation (SimulST) is widely useful in many cross-lingual communication scenarios, including multinational conferences and international traveling. Since text-based simultaneous machine translation (SimulMT) has achieved great success in recent years. The conventional cascaded approach for SimulST uses a pipeline of streaming ASR followed by simultaneous MT but suffers from...
Machine Translation, the task of automatically translating between human languages has been studied for decades. This task is used to be solved by count-based statistical models, e.g. Phrase-based Statistical Machine Translation (PBSMT), which solves the translation problem by separately training a statistical language model and a translation model. Recently, Neural...
There has been tremendous growth in using data analytic and machine learning algorithms to make critical decisions, such as in the national power grid, healthcare operations, and autonomous vehicles. Employing data analytic for decision-making allows cyber attackers to manipulate the decisions of these algorithms through data falsification. Hence, the trustworthiness...
Compositional data is a type of data where the features are non-negative and always sum to a constant. This type of data is frequently encountered in many fields such as microbiology, geology, economics and natural language processing. Compositional data has unique statistical properties that makes it difficult to apply standard...
We describe a series of novel computational models, CERENKOV (Computational Elucidation of the REgulatory NonKOding Variome) and its successors CERENKOV2, CERENKOV3, and Convolutional CERENKOV3, for discriminating regulatory single nucleotide polymorphisms (rSNPs) from non-regulatory SNPs within non-coding genetic loci. The CERENKOV models are designed for recognizing rSNPs in the context of...
As robots are becoming more relevant to our lives, they are still having hard time accomplishing simple tasks such as picking and lifting. Problems that include environmental constraints, pose uncertainties and hardware noises restrain robots for grasping an object successfully from a perceivable environment. Many have looked into finding best...
RNA structure prediction is a challenging problem, especially with pseudoknots. Recently, there has been a shift from the classical minimum free energy-based methods (MFE) to partition function-based ones that assemble structures based on base-pairing probabilities. Two typical examples of the latter group are the popular maximum expected accuracy (MEA) method...