Deep learning is increasingly used in sensitive applications such as healthcare, criminal justice, and finance. As these new applications emerge, adversaries are finding ways to circumvent them.
Further, there have been concerns about the possibility of bias and discrimination in predictive applications.
In order to address these issues, we propose an...
Simultaneous speech translation (SimulST) is widely useful in many cross-lingual communication scenarios, including multinational conferences and international travel. Text-based simultaneous machine translation (SimulMT) has achieved great success in recent years. The conventional cascaded approach for SimulST uses a pipeline of streaming ASR followed by simultaneous MT but suffers from...
We consider the problem of finding unknown patterns that recur across multiple sets. Examples include finding multiple objects that are present in multiple images, or a short DNA code that is repeated across multiple DNA sequences. We first consider a simple problem of finding a single unknown pattern in...
Data can be represented in multiple views. Traditional multi-view learning methods (e.g., co-training, multi-task learning) focus on improving learning performance using information from the auxiliary view, although information from the target view is sufficient for the learning task. In contrast, this work addresses a semi-supervised case of multi-view learning, the surrogate supervision...
Citizen Science is a paradigm in which volunteers from the general public participate in scientific studies, often by performing data collection. This paradigm is especially useful if the scope of the study is too broad to be performed by a limited number of trained scientists. Although citizen scientists can contribute...
5.4 The average log-likelihood on the holdout data for different values of K in four states. The...
Many problems in ecology and conservation biology can be formulated and solved using machine learning algorithms for multi-label classification. This dissertation addresses three topics related to predicting the distributions of multiple species. It improves existing methods and proposes a new modeling paradigm to address the multi-species, multi-label problem. The first...
Probabilistic models have been successfully applied to a wide variety of problems, including information retrieval, computer vision, bioinformatics, and speech processing. Probabilistic models allow us to encode our assumptions about the data in an elegant fashion and enable us to perform machine learning tasks such...
There has been tremendous growth in the use of data analytics and machine learning algorithms to make critical decisions, such as in the national power grid, healthcare operations, and autonomous vehicles. Employing data analytics for decision-making allows cyber attackers to manipulate the decisions of these algorithms through data falsification. Hence, the trustworthiness...
Sequential supervised learning problems arise in many real applications. This dissertation focuses on two important research directions in sequential supervised learning: efficient training and feature induction.
In the direction of efficient training, we study the training of conditional random fields (CRFs), which provide a flexible and powerful model for sequential...
In supervised learning, label information can be provided at different levels of granularity. For small datasets, it is possible to acquire a label for each data instance. However, in the big-data regime, this fine-grained approach is prohibitively costly. For example, in semi-supervised learning, only a limited number of samples...