Anomaly detection has been used in a variety of applications in practice, including cyber-security, fraud detection, and fault detection in safety-critical systems. Anomaly detectors produce a ranked list of statistical anomalies, which are typically examined by human analysts in order to extract the actual anomalies of interest. Unfortunately, most...
Structural health monitoring (SHM) systems perform automated non-destructive damage detection and characterization for a variety of large structures, including civil structures such as bridges and aerospace structures such as aircraft and space vehicles. The goals of SHM include preventing catastrophic structural failures, increasing reliability, reducing maintenance costs, and increasing the...
Learning novel concepts from relational databases is an important problem with applications in several disciplines, such as data management, natural language processing, and bioinformatics. For a learning algorithm to be effective, the input data should be clean and in some desired representation. However, real-world data is usually heterogeneous – the...
Due to recent advances in computer technology, the cost of collecting and storing data has dropped drastically. This makes it feasible to collect large amounts of information for each data point. This increasing trend in feature dimensionality justifies the need for research on variable selection. Random forest (RF) has demonstrated...
This dissertation addresses the problem of video labeling at both the frame and pixel levels using deep learning. For pixel-level video labeling, we have studied two problems: i) Spatiotemporal video segmentation and ii) Boundary detection and boundary flow estimation. For the problem of spatiotemporal video segmentation, we have developed recurrent...
Base-pairing interactions are the fundamental blocks of an RNA structure. With increased ability to determine the structure of RNA molecules, an accurate and informative way of classifying these interactions would help to study RNA structure at many levels. Apart from the well-known complementary base pairs and wobble pairs, other interactions...
In supervised learning, label information can be provided at different levels of granularity. For small datasets, it is possible to acquire a label for each data instance. However, in the big-data regime, this fine granularity approach is prohibitively costly. For example, in semi-supervised learning, only a limited number of samples...
The Pacific Coast Groundfish Fishery harvests a diverse and large grouping of fishes, but it did not become heavily fished until around WWII. This makes the groundfish fishery a comparatively young fishery. Despite its youth, it is one of the largest and most lucrative fisheries in Oregon—with a current harvest...
Assembly planning is a crucial task for every manufacturing product. In general, assembly operations consume more than 30% of the total manufacturing time and cost. Therefore, any effort in optimizing assembly will have a significant impact on the economic success of manufacturing. Finding an optimal assembly plan by hand is...
Developing accurate predictive distribution models requires adequately representing relevant spatial and temporal scales, as these scales are ultimately reflective of the relationships between distributions and influential environmental conditions. In this research, we considered both spatial and temporal scale and the influence each has on predicting broad-scale distributions of two disparate...
Machine learning models for natural language processing have traditionally relied on large numbers of discrete features, built up from atomic categories such as word forms and part-of-speech labels, which are considered completely distinct from each other. Recently, however, the advent of dense feature representations coupled with deep learning techniques has...
The ability to create reproducible, cryptographically secure keys from temporal environments (e.g., images) has the potential to be a contributor to effective cryptographic mechanisms. Due to the noisy nature of these environments, achieving this goal in a user-friendly fashion is a very challenging task, especially since there exists a...
Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications including video surveillance, content retrieval, and sports analysis. This thesis focuses on addressing efficiency and robustness of video classification in unconstrained real-world settings. The thesis work can be broadly divided into four...
In bioacoustics, automatic animal voice detection and recognition from audio recordings is an emerging topic for animal preservation. Our research focuses on bird bioacoustics, where the goal is to segment bird syllables from the recording and predict the bird species for the syllables. Traditional methods for this task address the...
Maintaining the sustainability of the earth’s ecosystems has attracted much attention as these ecosystems are facing more and more pressure from human activities. Machine learning can play an important role in promoting sustainability as a large amount of data is being collected from ecosystems. There are at least three important...
Hop (Humulus lupulus L. var. lupulus) is a plant of great cultural significance, used as a medicinal herb for thousands of years, and for flavor and as a preservative in brewing beer. Studies of the medicinal effects of the unique compounds produced by hop have led to interest from the...
Following thermal neutron induced fission, supervised machine learning and temporal gamma-ray spectroscopy methods were used to identify differences in the delayed gamma-ray spectra of Pu-239 and U-235. The temporal gamma-ray spectroscopy method takes advantage of the time-dependent decay of fission products. Employing Spearman's rank-order correlation coefficient and without prior knowledge...
Machine learning systems are generally trained offline using ground truth data that has been labeled by experts. However, these batch training methods are not a good fit for many applications, especially in the cases where complete ground truth data is not available for offline training. In addition, batch methods do...
Many problems in ecology and conservation biology can be formulated and solved using machine learning algorithms for multi-label classification. This dissertation addresses three topics related to predicting the distributions of multiple species. It improves existing methods and proposes a new modeling paradigm to address the multi-species, multi-label problem. The first...
How can end users efficiently influence the predictions that machine learning systems make on their behalf? Traditional systems rely on users to provide examples of how they want the learning system to behave, but this is not always practical for the user, nor efficient for the learning system. This dissertation...
This study investigates the use of predictive mapping techniques as well as geotechnical criteria in developing a multiregional soil liquefaction model and subsequent maps. The maps were produced using National Cooperative Soil Survey data, in the gSSURGO format, combined with soil liquefaction data gathered from studies, articles, and traditional seismic...
We are witnessing the rise of the data-driven science paradigm, in which massive amounts of data - much of it collected as a side-effect of ordinary human activity - can be analyzed to make sense of the data and to make useful predictions. To fully realize the promise of this...
Robots are being utilized in ever more complex tasks and environments to help humans with difficult or dangerous tasks. However, robotic grasping is still in its infancy and is one of the limiting factors which prevent the deployment of robots in the home and other assisted living scenarios. Traditional methods...
This thesis addresses a basic problem in computer vision, that of semantic labeling of images. Our work is aimed at object detection in biological images for evolutionary biology research. In particular, our goal is to detect nematocysts in Scanning Electron Microscope (SEM) images. This biological domain presents challenges for existing...
Multiagent learning with cooperative coevolutionary algorithms is a critical area of research, and is relevant to many real-world applications including air traffic control, distributed sensor network control, and game-theoretic applications such as border patrol. A key difficulty in multiagent learning is the credit assignment problem, where the impact of each...
This thesis considers the problem in which a teacher is interested in teaching action policies to computer agents for sequential decision making. The vast majority of policy learning algorithms offer teachers little flexibility in how policies are taught. In particular, one of two learning modes is typically considered: 1)...
Easy-first, a search-based structured prediction approach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action scoring function) to make easy decisions first, which constrains the remaining decisions and makes them easier. This thesis studies the problem of learning...
Citizen Science is a paradigm in which volunteers from the general public participate in scientific studies, often by performing data collection. This paradigm is especially useful if the scope of the study is too broad to be performed by a limited number of trained scientists. Although citizen scientists can contribute...
We consider the problem of supervised classification of bird species from audio recordings in a real-world acoustic monitoring scenario (i.e. audio data is collected in the field with an omnidirectional microphone, without human supervision). Obtaining better data about bird activity can assist conservation efforts, and improve our understanding of their...
Semi-supervised clustering aims to improve clustering performance by considering user supervision in the form of pairwise constraints. In this paper, we study the active learning problem of selecting pairwise must-link and cannot-link constraints for semi-supervised clustering. We consider active learning in an iterative manner where in each iteration queries are...
Object categorization is one of the fundamental topics in computer vision research. Most current work in object categorization aims to discriminate among generic object classes with gross differences. However, many applications require much finer distinctions. This thesis focuses on the design, evaluation and analysis of learning algorithms for fine-grained...
The study of physical activity is important in improving people’s health as it can help people understand the relationship between physical activity and health. Accelerometers, due to their small size, low cost, convenience, and ability to provide objective information about the frequency, intensity, and duration of physical activity, have...
Coordination is essential to achieving good performance in cooperative multiagent systems. To date, most work has focused on either implicit or explicit coordination mechanisms, while relatively little work has focused on the benefits of combining these two approaches. In this work we demonstrate that combining explicit and implicit mechanisms can...
This dissertation explores algorithms for learning ranking functions to efficiently solve search problems, with application to automated planning. Specifically, we consider the frameworks of beam search, greedy search, and randomized search, which all aim to maintain tractability at the cost of guaranteeing neither completeness nor optimality. Our learning objective for...
This study explores the use of predictive mapping techniques in developing Landtype Association (LTA) maps for use in natural resource management. These maps are produced for the USDA Forest Service on a regional basis at a 1:100,000 scale. The goal of this study is to develop and test a method...