Distance-based algorithms are machine learning algorithms that classify queries
by computing distances between these queries and a number of internally stored
exemplars. Exemplars that are closest to the query have the largest in
uence on
the classi cation assigned to the query. Two speci c distance-based algorithms, the
nearest neighbor...
We developed and investigated machine learning methods that require
minimal preprocessing of the input data, use few training examples, run fast, and
still obtain high levels of accuracy.
Most approaches to designing machine learning programs are based on the
supervised learning paradigm – training examples are chosen randomly and given...
Knowledge workers are struggling in the information flood. There is a growing interest in intelligent desktop environments that help knowledge workers organize their daily life. Intelligent desktop environments allow the desktop user to define a set of “activities” that characterize the user’s desktop work. These environments then attempt to identify...
Anomaly detection has been used in variety of applications in practice, including cyber-security, fraud detection and detecting faults in safety critical systems, etc. Anomaly detectors produce a ranked list of statistical anomalies, which are typically examined by human analysts in order to extract the actual anomalies of interest. Unfortunately, most...
This dissertation explores the idea of applying machine learning technologies to help computer users find information and better organize electronic resources, by presenting the research work conducted in the following three applications: FolderPredictor, Stacking Recommendation Engines, and Integrating Learning and Reasoning.
FolderPredictor is an intelligent desktop software tool that helps...
Assembly planning is a crucial task for every manufacturing product. In general, assembly operations consume more than 30% of the total manufacturing time and cost. Therefore, any effort in optimizing assembly will have a significant impact on the economic success of manufacturing. Finding an optimal assembly plan by hand is...
The task of mapping spelled English words into strings of phonemes and stresses ("reading aloud") has many practical applications. Several commercial systems perform this task by applying a knowledge base of expert-supplied letter-to-sound
rules. This dissertation presents a set of machine learning methods for automatically constructing letter-to-sound rules by analyzing...
Many problems in ecology and conservation biology can be formulated and solved using machine learning algorithms for multi-label classification. This dissertation addresses three topics related to predicting the distributions of multiple species. It improves existing methods and proposes a new modeling paradigm to address the multi-species, multi-label problem. The first...
Tree patterns are natural candidates for representing rules and hypotheses in many tasks such as information extraction and symbolic mathematics. A tree pattern is a tree with labeled nodes where some of the leaves may be labeled with variables, whereas a tree instance has no variables. A tree pattern matches...
Machine learning systems are generally trained offline using ground truth data that has been labeled by experts. However, these batch training methods are not a good fit for many applications, especially in the cases where complete ground truth data is not available for offline training. In addition, batch methods do...
We are witnessing the rise of the data-driven science paradigm, in which massive amounts of data - much of it collected as a side-effect of ordinary human activity - can be analyzed to make sense of the data and to make useful predictions. To fully realize the promise of this...
Specialized or secondary metabolism is a collection of pathways and small molecules that, while beneficial to an organism, are not strictly necessary for survival. Plants use secondary metabolites to, among other things, attract pollinators, defend against biotic and abiotic stressors, and form symbioses. Natural products from plants have seen an...
Structural health monitoring (SHM) systems perform automated non-destructive damage detection and characterization for a variety of large structures including civil structures such as bridges and aerospace structures such as aircrafts and space vehicles. The goals of SHM include preventing catastrophic structural failures, increasing reliability, reducing maintenance costs, and increasing the...
In its simplest form, the process of diagnosis is a decision-making process in which the diagnostician performs a sequence of tests culminating in a diagnostic decision. For example, a physician might perform a series of simple measurements (body tem- perature, weight, etc.) and laboratory measurements (white blood count, CT scan,...
This dissertation explores algorithms for learning ranking functions to efficiently solve search problems, with application to automated planning. Specifically, we consider the frameworks of beam search, greedy search, and randomized search, which all aim to maintain tractability at the cost of not guaranteeing completeness nor optimality. Our learning objective for...
Due to recent advances in computer technology, the cost of collecting and storing data has dropped drastically. This makes it feasible to collect large amounts of information for each data point. This increasing trend in feature dimensionality justifies the need for research on variable selection. Random forest (RF) has demonstrated...
Machine learning applied to computer architecture has rapidly transitioned from a theoretical novelty to being a driving force behind design, control, and simulation in practically all components. These machine-learning-based methodologies are further notable for their scalability to increasingly complex design challenges, which has allowed these methodologies to surpass the prior...
Citizen Science is a paradigm in which volunteers from the general public participate in scientific studies, often by performing data collection. This paradigm is especially useful if the scope of the study is too broad to be performed by a limited number of trained scientists. Although citizen scientists can contribute...
Many important application problems in engineering can be formalized as nonlinear
optimization tasks. However, numerical methods for solving such problems
are brittle and do not scale well. For example, these methods depend critically
on choosing a good starting point from which to perform the optimization search.
In high-dimensional spaces, numerical...
Maintaining the sustainability of the earth’s ecosystems has attracted much attention as these ecosystems are facing more and more pressure from human activities. Machine learning can play an important role in promoting sustainability as a large amount of data is being collected from ecosystems. There are at least three important...
Many approaches for achieving intelligent behavior of automated (computer) systems involve components that learn from past experience. This dissertation studies computational methods for learning from examples, for classification and for decision
making, when the decisions have different non-zero costs associated with them. Many practical applications of learning algorithms, including transaction...
This thesis presents a progression of novel planning algorithms that culminates in a new family of diverse Monte-Carlo methods for probabilistic planning domains. We provide a proof for performance guarantees and analyze how these algorithms can resolve some of the shortcomings of traditional probabilistic planning methods. The direct policy search...
We consider the problem of supervised classification of bird species from audio recordings in a real-world acoustic monitoring scenario (i.e. audio data is collected in the field with an omnidirectional microphone, without human supervision). Obtaining better data about bird activity can assist conservation efforts, and improve our understanding of their...
This thesis considers the problem in which a teacher is interested in teaching action policies to computer agents for sequential decision making. The vast majority of policy
learning algorithms o er teachers little flexibility in how policies are taught. In particular,
one of two learning modes is typically considered: 1)...
Machine learning models for natural language processing have traditionally relied on large numbers of discrete features, built up from atomic categories such as word forms and part-of-speech labels, which are considered completely distinct from each other. Recently however, the advent of dense feature representations coupled with deep learning techniques has...
How can end users efficiently influence the predictions that machine learning systems make on their behalf? Traditional systems rely on users to provide examples of how they want the learning system to behave, but this is not always practical for the user, nor efficient for the learning system. This dissertation...
This dissertation addresses the problem of video labeling at both the frame and pixel levels using deep learning. For pixel-level video labeling, we have studied two problems: i) Spatiotemporal video segmentation and ii) Boundary detection and boundary flow estimation. For the problem of spatiotemporal video segmentation, we have developed recurrent...
Learning novel concepts from relational databases is an important problem with applications in several disciplines, such as data management, natural language processing, and bioinformatics. For a learning algorithm to be effective, the input data should be clean and in some desired representation. However, real-world data is usually heterogeneous – the...
Bayesian networks are used for building intelligent agents that act under uncertainty. They are a compact representation of agents' probabilistic knowledge. A Bayesian network can be viewed as representing a factorization of a full joint probability distribution into the multiplication of a set of conditional probability distributions. Independence of causal...
Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications including video surveillance, content retrieval, and sports analysis. This thesis focuses on addressing efficiency and robustness of video classification in unconstrained real-world settings. The thesis work can be broadly divided into four...
The goal of many machine learning problems can be formalized as the creation of a function that can properly classify an input vector, given a set of examples of that function. While this formalism has produced a number of success stories, there are notable situations in which it fails. One...
Multiagent learning with cooperative coevolutionary algorithms is a critical area of research, and is relevant to many real-world applications including air traffic control, distributed sensor network control, and game-theoretic applications such as border patrol. A key difficulty in multiagent learning is the credit assignment problem, where the impact of each...
In supervised learning, label information can be provided at different levels of granularity. For small datasets, it is possible to acquire a label for each data instance. However, in the big-data regime, this fine granularity approach is prohibitively costly. For example, in semi-supervised learning, only a limited number of samples...