A relatively new model of error control is the limited magnitude error over high radix channels. In this error model, the error magnitude does not exceed a certain limit known beforehand. In this dissertation, we study systematic error control codes for common channels under the assumption that the maximum error...
This dissertation explores algorithms for learning ranking functions to efficiently solve search problems, with application to automated planning. Specifically, we consider the frameworks of beam search, greedy search, and randomized search, which all aim to maintain tractability at the cost of guaranteeing neither completeness nor optimality. Our learning objective for...
Parallel processors are classified into two classes: shared-memory multiprocessors and distributed-memory multiprocessors. In the shared-memory system, processors communicate through a common memory unit. However, in the distributed multiprocessor system, each processor has its own memory unit and the communications among the processors are performed through an interconnection network. Thus,...
This dissertation addresses two fundamental problems in computer vision, namely multitarget tracking and event recognition in videos. These problems are challenging because uncertainty may arise from a host of sources, including motion blur, occlusions, and dynamic cluttered backgrounds. We show that these challenges can be successfully addressed by using a multiscale,...
This dissertation addresses a number of inter-related and fundamental problems in computer vision. Specifically, we address object discovery, recognition, segmentation, and 3D pose estimation in images, as well as 3D scene reconstruction and scene interpretation. The key ideas behind our approaches include using shape as a basic object feature, and...
A distributed system is a network of multiple autonomous computational nodes designed primarily for performance scalability and robustness. The performance of a distributed system depends critically on how tasks and resources are distributed among the nodes. Thus, a main thrust in distributed system research is to design schemes for distributing...
Acting intelligently to efficiently solve sequential decision problems requires the ability to extract hierarchical structure from the underlying domain dynamics, exploit it for optimal or near-optimal decision-making, and transfer it to related problems instead of solving every problem in isolation. This dissertation makes three contributions toward this goal.
The first...
Within the past several years, the technology of high-throughput sequencing has transformed the study of biology by offering unprecedented access to life's fundamental building block, DNA. With this transformation's potential, a host of brand-new challenges have emerged, many of which lend themselves to being solved through computational methods. From de...
The meteoric rise and prevalent usage of wireless networking technologies for mobile communication applications have captured the attention of the media and the imagination of the public over the past decade. One such proliferation is experienced in Wireless Sensor Networks (WSNs), where multimedia-enabled elements are fused with integrated sensors to empower tightly...
It is possible to purchase, for as little as $10,000, a cluster of computers with the capability to rival the supercomputers of only a few years ago. Now, users that have little to no experience developing distributed applications or managing a cluster are in a position to do so. To...
In this work, I examine the problem of understanding American football in video. In particular, I present several mid-level computer vision algorithms that each accomplish a different sub-task within a larger system for annotating, interpreting, and analyzing collections of American football video. The analysis of football video is useful in...
Networks of distributed, remote sensors are providing ecological scientists with a view of our environment that is unprecedented in detail. However, these networks are subject to harsh conditions, which lead to malfunctions in individual sensors and failures in network communications. This behavior manifests as corrupt or missing measurements in the...
Bayesian Optimization (BO) methods are often used to optimize an unknown function f(•) that is costly to evaluate. They typically work in an iterative manner. In each iteration, given a set of observation points, BO algorithms select k ≥ 1 points to be evaluated. The results of those points are...
How can an agent generalize its knowledge to new circumstances? To learn effectively, an agent acting in a sequential decision problem must make intelligent action selection choices based on its available knowledge. This dissertation focuses on Bayesian methods of representing learned knowledge and develops novel algorithms that exploit the represented...
Partial programming is a field of study where users specify an outline or skeleton of a program, but leave various parts undefined. The undefined parts are then completed by an external mechanism to form a complete program. Adaptation-Based Programming (ABP) is a method of partial programming that utilizes techniques from...
End-user programmers face many barriers in programming. Many programming environments have attempted to lower or remove these barriers, but despite these efforts, empirical studies continue to report barriers that users face. To investigate this issue, we took a theory-informed approach. Using theories from design, creativity, and problem solving...
Professional software engineers have an arsenal of techniques such as unit testing and assertions to check their specifications, but these techniques require tools, motivation, experience and training that programmers without professional software engineering training may not have. As a result, professionals in other fields, such as scientific modelers, face greater...
In this thesis I present the choice calculus, a formal language for representing variation in software and other structured artifacts. The choice calculus is intended to support variation research in a way similar to the lambda calculus in programming language research. Specifically, it provides a simple formal basis for presenting,...
The study of the diversity of multivariate objects shares common characteristics across disciplines, including ecology and organizational management. Nevertheless, experts in these two disciplines have adopted somewhat separate diversity concepts and analysis techniques, limiting the potential for sharing and cross-comparing these concerns. Moreover, while complex diversity data may...
Protein-protein interactions underlie all biological processes, and their study has wide implications for many other fields, including medicine, genetics, biology, and ecology. Proteins are the building blocks and primary actors of life. They work together to accomplish virtually every task within a cell, including metabolism, signal...
Realistic (ideally photorealistic) real-time rendering has remained an elusive goal in computer graphics. While photorealistic rendering has certainly been achieved at the expense of tremendous computational resources and corresponding rendering times, real-time rendering typically must accept a great number of compromises to achieve adequate performance, such as aliasing artifacts, the...
Citizen Science is a paradigm in which volunteers from the general public participate in scientific studies, often by performing data collection. This paradigm is especially useful if the scope of the study is too broad to be performed by a limited number of trained scientists. Although citizen scientists can contribute...
We consider the problem of supervised classification of bird species from audio recordings in a real-world acoustic monitoring scenario (i.e. audio data is collected in the field with an omnidirectional microphone, without human supervision). Obtaining better data about bird activity can assist conservation efforts, and improve our understanding of their...
Routing from a single source node to multiple destination nodes using node disjoint paths (NDP) has many important applications in parallel systems. For example, if a source node wants to send distinct messages to distinct destination nodes, then the one-to-many NDP routing is useful.
Unlike parallel systems with shared-memory, each...
This thesis considers the problem in which a teacher is interested in teaching action policies to computer agents for sequential decision making. The vast majority of policy learning algorithms offer teachers little flexibility in how policies are taught. In particular, one of two learning modes is typically considered: 1)...
Tensegrity structures are composed of pure compressional elements that are connected via a network of pure tensional elements. The concept of tensegrity promises numerous advantages to the field of robotics. Tensegrity robots are, however, notoriously difficult to control due to their oscillatory nature and nonlinear interaction between the components. Multiagent...
In real networks, identifying dense regions is of great importance. For example, in a network that represents academic collaboration, authors within the densest component of the graph tend to be the most prolific. Dense subgraphs often identify communities in social networks. And dense subgraphs can be used to discover regulatory...
An age-wave is upon us where many older adults are reaching retirement. Technically experienced older adults have skills that could be directly applied to free/open source software (FOSS) communities, such as project management, programming, and/or knowledge of a rapidly growing end-user population. FOSS is a widely popular, low-cost way to...
End-user programmers often struggle to create programs that run quickly and effectively, which can be a major deterrent in completing their tasks as desired. Current research has primarily focused on catching user mistakes, such as errors or misused formulas. However, end users deal with issues other than just correctness. In...
We are witnessing the rise of the data-driven science paradigm, in which massive amounts of data - much of it collected as a side-effect of ordinary human activity - can be analyzed to make sense of the data and to make useful predictions. To fully realize the promise of this...
How can end users efficiently influence the predictions that machine learning systems make on their behalf? Traditional systems rely on users to provide examples of how they want the learning system to behave, but this is not always practical for the user, nor efficient for the learning system. This dissertation...
Many problems in ecology and conservation biology can be formulated and solved using machine learning algorithms for multi-label classification. This dissertation addresses three topics related to predicting the distributions of multiple species. It improves existing methods and proposes a new modeling paradigm to address the multi-species, multi-label problem. The first...
Many applications in computer graphics and geometry processing rely on the ability to locally orient 2D and 3D entities on a surface, or inside a volume, such as the sinusoidal kernels in Gabor noise, the color and geometry textures in pattern synthesis, and the finite elements in remeshing. In these...
Writing a program that performs well in a complex environment is a challenging task. In such problems, a method of deterministic programming combined with reinforcement learning (RL) can be helpful. However, current systems either force developers to encode knowledge in very specific forms (e.g., state-action features), or assume advanced RL...
The study of variational typing originated from the problem of type inference for variational programs, which encode numerous different but related plain programs. In this dissertation, I present a sound and complete type inference algorithm for inferring types of all plain programs encoded in variational programs. The proposed algorithm runs...
Software testing is of critical importance for the success of software projects. Current inefficient testing methods often still take up half or more of a software project's budget. Automatic test data generation is the most promising way to lower the software testing cost. Manually creating testing data is expensive and...
The widespread use of wireless devices that we have recently been witnessing, such as smartphones, tablets, laptops, and wirelessly accessible devices in general, is causing an unprecedented growth in the required amount of the wireless radio spectrum. On the other hand, the spectrum resource has, for the last several decades,...
Quotient rings of Gaussian and Eisenstein-Jacobi (EJ) integers can be deployed to construct interconnection networks with good topological properties. In this thesis, we propose deadlock-free deterministic and partially adaptive routing algorithms for hexagonal networks, one special class of EJ networks. Then we discuss higher dimensional Gaussian networks as an alternative to...
Machine learning systems are generally trained offline using ground truth data that has been labeled by experts. However, these batch training methods are not a good fit for many applications, especially in the cases where complete ground truth data is not available for offline training. In addition, batch methods do...
Maintaining the sustainability of the earth’s ecosystems has attracted much attention as these ecosystems are facing more and more pressure from human activities. Machine learning can play an important role in promoting sustainability as a large amount of data is being collected from ecosystems. There are at least three important...
Empirical studies have shown that programmers spend up to one-third of their time navigating through code during debugging. Although researchers have conducted empirical studies to understand programmers’ navigation difficulties and developed tools to address those difficulties, the resulting findings tend to be loosely connected to each other. To address this...
Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications including video surveillance, content retrieval, and sports analysis. This thesis focuses on addressing efficiency and robustness of video classification in unconstrained real-world settings. The thesis work can be broadly divided into four...
Machine learning models for natural language processing have traditionally relied on large numbers of discrete features, built up from atomic categories such as word forms and part-of-speech labels, which are considered completely distinct from each other. Recently however, the advent of dense feature representations coupled with deep learning techniques has...
Software testing is a very important task during software development and it can be used to improve the quality and reliability of the software system. One potential way to reduce the cost and increase the efficiency of software testing is to generate test data automatically. Search-based approaches successfully generate unit...
This thesis focuses on the problem of object tracking. Given a video, the general objective of tracking is to track the location over time of one or more targets in the image sequence. This is a very challenging task as algorithms need to deal with problems such as appearance variations,...
Markov Decision Processes (MDPs) are the de facto formalism for studying sequential decision-making problems with uncertainty, ranging from classical problems such as inventory control and path planning, to more complex problems such as reservoir control under rainfall uncertainty and emergency response optimization for fire and medical emergencies. Most prior research...
Monte Carlo tree search (MCTS) is a class of online planning algorithms for Markov decision processes (MDPs) and related models that has found success in challenging applications. In the online planning approach, the agent makes a decision in the current state by performing a limited forward search over possible futures...
A bad software development process leads to wasted effort and inferior products. To improve a software process, we must first understand it. In this work I focus on understanding software processes.
The first process we seek to understand is Continuous Integration (CI). CI systems automate the compilation, building,...
The proliferation of mobile users and internet content has advanced a plethora of research areas. These areas include mobile networks, transport layer protocols, and smart cities. Research shows that global mobile data traffic will increase sevenfold, reaching 49 exabytes per month by 2021, most of which will be mobile...
The main goal of automated test generation is to improve the reliability of a program by exposing faults to developers. To this end, testing should cover the largest possible portion of the program given a test budget (i.e., time and resources) as frequently as possible. Coverage of a program entity...