Many large-scale data analysis applications involve data that can vary over both time and space. Often the primary goal of analyzing spatiotemporal data is identifying trends, movements, and sudden changes with respect to time, location, or both. This can include a variety of applications in economics (housing prices, unemployment, job...
Image classification is a difficult problem, often requiring large training sets to get satisfactory results. However this is a task that humans perform very well, and incorporating user feedback into these learning algorithms could help reduce the dependency on large amounts of labeled training data. This process has already been...
Advances in sensor technology are greatly expanding the range of quantities that can be measured while simultaneously reducing the cost. However, deployed sensors drift out of calibration and fail, so every sensor network requires quality control procedures to promptly detect these failures. To address these problems, we propose a two-level...
Scientists and engineers have to analyze and query multiple large databases. Analysis over databases created by phasor measurement units can provide insight into the health of the grid, thereby improving control over operations. Realizing this data-driven control, however, requires validating, processing and storing massive amounts of PMU data efficiently, which...
The study of physical activity is important in improving people’s health as it can help people understand the relationship between physical activity and health. Accelerometers, due to its small size, low cost, convenience and its ability to provide objective information about the frequency, intensity, and duration of physical activity, has...
This thesis addresses a basic problem in computer vision, that of semantic labeling of images. Our work is aimed at object detection in biological images for evolutionary biology research. In particular, our goal is to detect nematocysts in Scanning Electron Microscope (SEM) images. This biological domain presents challenges for existing...
Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high
stochasticity or "outcome space" explosion. Multiagent domains are particularly susceptible to these problems. This thesis describes ways to mitigate these curses in several different multiagent domains, including real-time delivery of products...
Protein secondary structure prediction plays a pivotal role in predicting protein folding in three-dimensions. Its task is to assign each residue one of the three secondary structure classes helix, strand, or random coil. This is an instance of the problem of sequential supervised learning in machine learning. This thesis describes...
Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning...
Bayesian Optimization (BO) methods are often used to optimize an unknown function f(•) that is costly to evaluate. They typically work in an iterative manner. In each iteration, given a set of observation points, BO algorithms select k ≥ 1 points to be evaluated. The results of those points are...