Many large-scale data analysis applications involve data that can vary over both time and space. Often the primary goal of analyzing spatiotemporal data is identifying trends, movements, and sudden changes with respect to time, location, or both. This can include a variety of applications in economics (housing prices, unemployment, job...
Machine learning models for natural language processing have traditionally relied on large numbers of discrete features, built up from atomic categories such as word forms and part-of-speech labels, which are considered completely distinct from each other. Recently however, the advent of dense feature representations coupled with deep learning techniques has...
Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning...
Simultaneous speech translation (SimulST) is widely useful in many cross-lingual communication scenarios, including multinational conferences and international traveling. Since text-based simultaneous machine translation (SimulMT) has achieved great success in recent years. The conventional cascaded approach for SimulST uses a pipeline of streaming ASR followed by simultaneous MT but suffers from...
Machine Translation, the task of automatically translating between human languages has been studied for decades. This task is used to be solved by count-based statistical models, e.g. Phrase-based Statistical Machine Translation (PBSMT), which solves the translation problem by separately training a statistical language model and a translation model. Recently, Neural...
There has been tremendous growth in using data analytic and machine learning algorithms to make critical decisions, such as in the national power grid, healthcare operations, and autonomous vehicles. Employing data analytic for decision-making allows cyber attackers to manipulate the decisions of these algorithms through data falsification. Hence, the trustworthiness...
Compositional data is a type of data where the features are non-negative and always sum to a constant. This type of data is frequently encountered in many fields such as microbiology, geology, economics and natural language processing. Compositional data has unique statistical properties that makes it difficult to apply standard...
We describe a series of novel computational models, CERENKOV (Computational Elucidation of the REgulatory NonKOding Variome) and its successors CERENKOV2, CERENKOV3, and Convolutional CERENKOV3, for discriminating regulatory single nucleotide polymorphisms (rSNPs) from non-regulatory SNPs within non-coding genetic loci. The CERENKOV models are designed for recognizing rSNPs in the context of...
As robots are becoming more relevant to our lives, they are still having hard time accomplishing simple tasks such as picking and lifting. Problems that include environmental constraints, pose uncertainties and hardware noises restrain robots for grasping an object successfully from a perceivable environment. Many have looked into finding best...
RNA structure prediction is a challenging problem, especially with pseudoknots. Recently, there has been a shift from the classical minimum free energy-based methods (MFE) to partition function-based ones that assemble structures based on base-pairing probabilities. Two typical examples of the latter group are the popular maximum expected accuracy (MEA) method...