This dissertation delves into understanding, characterizing, and addressing dataset shift in deep learning, a pervasive issue for deployed machine learning systems. Integral aspects of the problem are examined: We start with the use of counterfactual explanations in order to characterize the behavior of deep reinforcement learning agents in visual input...
Many large-scale data analysis applications involve data that can vary over both time and space. Often the primary goal of analyzing spatiotemporal data is identifying trends, movements, and sudden changes with respect to time, location, or both. This can include a variety of applications in economics (housing prices, unemployment, job...
Compositional data is a type of data where the features are non-negative and always sum to a constant. This type of data is frequently encountered in many fields such as microbiology, geology, economics and natural language processing. Compositional data has unique statistical properties that make it difficult to apply standard...
Although deep reinforcement learning agents have produced impressive results in many domains, their decision making is difficult to explain to humans. To address this problem, past work has mainly focused on explaining why an action was chosen in a given state. A different type of explanation that is useful is...
Although machine learning systems are often effective in real-world applications, there are situations in which they can be even better when provided with some degree of end user feedback. This is especially true when the machine learning system needs to customize itself to the end user's preferences, such as in...
Cold air pools are spatiotemporal phenomena that occur when cold air from higher elevations rolls down the slope to accumulate in lower elevations. Behaviors like this lead to microclimate anomalies such as the city of Corvallis (Oregon) experiencing persistent cold weather even on a sunny day. We analyze multivariate temperature...
One of the tasks that continues to prove difficult in robotics is the ability to grasp objects of varying shapes. It is time-consuming to acquire large amounts of real-world data in order to train accurate classifiers that can predict the success or failure of a grasp. To solve this issue,...
Image classification is a difficult problem, often requiring large training sets to get satisfactory results. However, this is a task that humans perform very well, and incorporating user feedback into these learning algorithms could help reduce the dependency on large amounts of labeled training data. This process has already been...
Citizen Science is a paradigm in which volunteers from the general public participate in scientific studies, often by performing data collection. This paradigm is especially useful if the scope of the study is too broad to be performed by a limited number of trained scientists. Although citizen scientists can contribute...
Physical activity recognition using accelerometer data is a rapidly emerging field with many real-world applications. Much of the previous work in this area has assumed that the accelerometer data has already been segmented into pure activities, and the activity recognition task has been to classify these segments. In reality, activity...
The study of physical activity is important in improving people’s health as it can help people understand the relationship between physical activity and health. Accelerometers, due to their small size, low cost, convenience, and ability to provide objective information about the frequency, intensity, and duration of physical activity, have...
In text classification, labeling features is often less time consuming than labeling entire documents. In situations where very little labeled training data is available, feature relevance feedback has the potential to dramatically increase classification performance. We review previous work on incorporating feature relevance feedback in the form of labeled features...
Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning...
Motion capture data is a digital representation of the complex temporal structure of human motion. Motion capture is widely used for data-driven animation in sports, medicine and entertainment, because of its ability to capture complex and realistic motions. Due to its efficiency and cost, methods for reusing collections of motion capture...
In standard training regimes, one assumes that the classes presented to a model constitute all of the classes that the model will encounter when it is deployed. In real deployment scenarios, however, a model can sometimes encounter situations or objects that it has never seen. When these scenarios are safety-critical,...
The thesis focuses on activity recognition from sensor data, which has spurred a great deal of interest due to its impact on health care and security. Previous work on activity recognition from multivariate time series data has mainly applied supervised learning techniques which require a high degree of annotation effort...
The computational identification of gene Transcription Start Sites (TSSs) can provide insights into the regulation and function of genes without performing expensive experiments, particularly in organisms with incomplete annotations. High-resolution general-purpose TSS prediction remains a challenging problem, with little recent progress on the identification and differentiation of TSSs which are...
Within the past several years, the technology of high-throughput sequencing has transformed the study of biology by offering unprecedented access to life's fundamental building block, DNA. With this transformation's potential, a host of brand-new challenges have emerged, many of which lend themselves to being solved through computational methods. From de...