How can end users efficiently influence the predictions that machine learning systems make on their behalf? Traditional systems rely on users to provide examples of how they want the learning system to behave, but this is not always practical for the user, nor efficient for the learning system. This dissertation...
The results of a machine learning from user behavior can be thought of as a program, and like all programs, it may need to be debugged. Providing ways for the user to debug it matters because without the ability to fix errors, users may find that the learned program’s errors...
This thesis considers the problem in which a teacher is interested in teaching action policies to computer agents for sequential decision making. The vast majority of policy
learning algorithms o er teachers little flexibility in how policies are taught. In particular,
one of two learning modes is typically considered: 1)...
This thesis addresses a basic problem in computer vision, that of semantic labeling of images. Our work is aimed at object detection in biological images for evolutionary biology research. In particular, our goal is to detect nematocysts in Scanning Electron Microscope (SEM) images. This biological domain presents challenges for existing...
Learning easily understandable decision rules from examples is one of the classic problems in machine learning. Most learning algorithms for this problem employ some variation of a greedy separate-and-conquer algorithm. In this paper, we describe a system called LERILS that learns highly accurate and comprehensible rules from examples using a...
Specialized or secondary metabolism is a collection of pathways and small molecules that, while beneficial to an organism, are not strictly necessary for survival. Plants use secondary metabolites to, among other things, attract pollinators, defend against biotic and abiotic stressors, and form symbioses. Natural products from plants have seen an...
Many important application problems in engineering can be formalized as nonlinear
optimization tasks. However, numerical methods for solving such problems
are brittle and do not scale well. For example, these methods depend critically
on choosing a good starting point from which to perform the optimization search.
In high-dimensional spaces, numerical...
This study investigates the use of predictive mapping techniques as well as geotechnical criteria in developing a multiregional soil liquefaction model and subsequent maps. The maps were produced using National Cooperative Soil Survey data, in the gSSURGO format, combined with soil liquefaction data gathered from studies, articles, and traditional seismic...
Developing accurate predictive distribution models requires adequately representing relevant spatial and temporal scales, as these scales are ultimately reflective of the relationships between distributions and influential environmental conditions. In this research, we considered both spatial and temporal scale and the influence each has on predicting broad-scale distributions of two disparate...
Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications including video surveillance, content retrieval, and sports analysis. This thesis focuses on addressing efficiency and robustness of video classification in unconstrained real-world settings. The thesis work can be broadly divided into four...
We are witnessing the rise of the data-driven science paradigm, in which massive amounts of data - much of it collected as a side-effect of ordinary human activity - can be analyzed to make sense of the data and to make useful predictions. To fully realize the promise of this...
Maintaining the sustainability of the earth’s ecosystems has attracted much attention as these ecosystems are facing more and more pressure from human activities. Machine learning can play an important role in promoting sustainability as a large amount of data is being collected from ecosystems. There are at least three important...
We consider the problem of supervised classification of bird species from audio recordings in a real-world acoustic monitoring scenario (i.e. audio data is collected in the field with an omnidirectional microphone, without human supervision). Obtaining better data about bird activity can assist conservation efforts, and improve our understanding of their...
Machine learning systems are generally trained offline using ground truth data that has been labeled by experts. However, these batch training methods are not a good fit for many applications, especially in the cases where complete ground truth data is not available for offline training. In addition, batch methods do...
Object categorization is one of the fundamental topics in computer vision research. Most current work in object categorization aims to discriminate among generic object classes with gross differences. However, many applications require much finer distinctions. This thesis focuses on the design, evaluation and analysis of learning algorithms for fine- grained...
Easy-first, a search-based structured prediction approach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action scoring function) to make easy decisions first, which constrains the remaining decisions and makes them easier. This thesis studies the problem of learning...
This dissertation addresses the problem of video labeling at both the frame and pixel levels using deep learning. For pixel-level video labeling, we have studied two problems: i) Spatiotemporal video segmentation and ii) Boundary detection and boundary flow estimation. For the problem of spatiotemporal video segmentation, we have developed recurrent...
We developed and investigated machine learning methods that require
minimal preprocessing of the input data, use few training examples, run fast, and
still obtain high levels of accuracy.
Most approaches to designing machine learning programs are based on the
supervised learning paradigm – training examples are chosen randomly and given...
An important challenge in machine learning is to find ways of learning quickly from very small amounts of training data. The only way to learn from small data samples is to constrain the learning process by exploiting background knowledge. In this report, we present a theoretical analysis on the use...
Structural health monitoring (SHM) systems perform automated non-destructive damage detection and characterization for a variety of large structures including civil structures such as bridges and aerospace structures such as aircrafts and space vehicles. The goals of SHM include preventing catastrophic structural failures, increasing reliability, reducing maintenance costs, and increasing the...
Machine learning models for natural language processing have traditionally relied on large numbers of discrete features, built up from atomic categories such as word forms and part-of-speech labels, which are considered completely distinct from each other. Recently however, the advent of dense feature representations coupled with deep learning techniques has...
The goal of many machine learning problems can be formalized as the creation of a function that can properly classify an input vector, given a set of examples of that function. While this formalism has produced a number of success stories, there are notable situations in which it fails. One...
Knowledge workers are struggling in the information flood. There is a growing interest in intelligent desktop environments that help knowledge workers organize their daily life. Intelligent desktop environments allow the desktop user to define a set of “activities” that characterize the user’s desktop work. These environments then attempt to identify...
This study explores the use of predictive mapping techniques in developing Landtype Association (LTA) maps for use in natural resource management. These maps are produced for the USDA Forest Service on a regional basis at a 1:100,000 scale. The goal of this study is to develop and test a method...
Citizen Science is a paradigm in which volunteers from the general public participate in scientific studies, often by performing data collection. This paradigm is especially useful if the scope of the study is too broad to be performed by a limited number of trained scientists. Although citizen scientists can contribute...
Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning...
The ability to create reproducible cryptographically secure keys from temporal environments (e.g., images) has the potential to be a contributor to effective cryptographic mechanisms. Due to the noisy nature of these environments, achieving this goal in a user friendly fashion is a very challenging task, especially since there exists a...
The goal of Inductive Learning is to produce general rules from a set of seen examples, which can then be applied to other unseen examples. ID3 is an inductive learning algorithm that can be used for the classification task. The input to the algorithm is a set of tuples of...
Coordination is essential to achieving good performance in cooperative multiagent systems. To date, most work has focused on either implicit or explicit coordination mechanisms, while relatively little work has focused on the benefits of combining these two approaches. In this work we demonstrate that combining explicit and implicit mechanisms can...
Many approaches for achieving intelligent behavior of automated (computer) systems involve components that learn from past experience. This dissertation studies computational methods for learning from examples, for classification and for decision
making, when the decisions have different non-zero costs associated with them. Many practical applications of learning algorithms, including transaction...
This dissertation explores the idea of applying machine learning technologies to help computer users find information and better organize electronic resources, by presenting the research work conducted in the following three applications: FolderPredictor, Stacking Recommendation Engines, and Integrating Learning and Reasoning.
FolderPredictor is an intelligent desktop software tool that helps...
Anomaly detection has been used in variety of applications in practice, including cyber-security, fraud detection and detecting faults in safety critical systems, etc. Anomaly detectors produce a ranked list of statistical anomalies, which are typically examined by human analysts in order to extract the actual anomalies of interest. Unfortunately, most...
In real-time control systems, the value of a control decision depends not
only on the correctness of the decision but also on the time when that decision
is available. Recent work in real-time decision making has used machine learning
techniques to automatically construct reactive controllers, that is, controllers
with little...
In its simplest form, the process of diagnosis is a decision-making process in which the diagnostician performs a sequence of tests culminating in a diagnostic decision. For example, a physician might perform a series of simple measurements (body tem- perature, weight, etc.) and laboratory measurements (white blood count, CT scan,...
Learning novel concepts from relational databases is an important problem with applications in several disciplines, such as data management, natural language processing, and bioinformatics. For a learning algorithm to be effective, the input data should be clean and in some desired representation. However, real-world data is usually heterogeneous – the...
The Pacific Coast Groundfish Fishery harvests a diverse and large grouping of fishes, but it did not become heavily fished until around WWII. This makes the groundfish fishery a comparatively young fishery. Despite its youth, it is one of the largest and most lucrative fisheries in Oregon—with a current harvest...
This thesis presents a progression of novel planning algorithms that culminates in a new family of diverse Monte-Carlo methods for probabilistic planning domains. We provide a proof for performance guarantees and analyze how these algorithms can resolve some of the shortcomings of traditional probabilistic planning methods. The direct policy search...
This dissertation explores algorithms for learning ranking functions to efficiently solve search problems, with application to automated planning. Specifically, we consider the frameworks of beam search, greedy search, and randomized search, which all aim to maintain tractability at the cost of not guaranteeing completeness nor optimality. Our learning objective for...