Protein secondary structure prediction plays a pivotal role in predicting protein folding in three dimensions. Its task is to assign each residue one of three secondary structure classes: helix, strand, or random coil. This is an instance of the problem of sequential supervised learning in machine learning. This thesis describes...
The thesis focuses on model-based approximation methods for reinforcement
learning with large-scale applications such as combinatorial optimization problems.
First, the thesis proposes two new model-based methods to stabilize the
value–function approximation for reinforcement learning. The first one is the
BFBP algorithm, a batch-like reinforcement learning process which iterates between...
Software maintenance accounts for a large portion of the software development cost, particularly the process of updating programs either to adapt to requirement changes or to enhance design or efficiency. Currently, program updates are generally performed manually by programmers using text editors. This is an unreliable method because syntax and...
Accessing information on the Web has become ingrained into our daily lives, and we seek information from many different sources, including conference and journal publications, personal web pages, and others. Increasingly, web-based information retrieval systems such as web-based search engines, library on-line catalog systems, and subscription-based federated search systems are...
Domain-independent automated planning is concerned with computing a sequence of actions that can transform an initial state into a desired goal state. Resource production domains form an interesting class of such problems, in that they typically require reasoning about concurrent durative actions with continuous effects while minimizing some cost function. Although...
Image feature detection and matching are two critical processes for many computer vision tasks. Currently, intensity-based local interest region detectors and local feature-based matching methods are used widely in computer vision applications. But in some applications, such as biological object recognition tasks, within-class changes in pose, lighting, color, and texture...
Current ACI design provisions for shear in reinforced concrete (RC) beams strengthened
with externally bonded carbon fiber-reinforced polymer (CFRP) U-wraps may not adequately account for the effect of scale. This paper describes tests of 6 geometrically scaled beams to identify possible scale effects of such beams. Results indicated that effective...
This thesis addresses the problem of learning dynamic Bayesian network (DBN) models to support reinforcement learning. It focuses on learning regression tree models of the conditional probability distributions of the DBNs. Existing algorithms presume that the stochasticity in the domain can be modeled as a deterministic function with additive noise....
The purpose of the present investigation was to develop an adaptive teaching model for an interactive computer-assisted instruction (CAI) program and to evaluate the effectiveness of the implementation of individualized examples or individualized
examples in personalized contexts. Translating word problems into equations provided the context for the CAI investigation. Four...
Probabilistic inference using Bayesian networks is now a well-established approach for reasoning under uncertainty. Among many efficiency-driven techniques which have been developed, the Optimal Factoring Problem (OFP) is distinguished for presenting a combinatorial optimization point of view on the problem. The contribution of this thesis is to extend...
End users develop more software than any other group of programmers, using software authoring devices such as e-mail filtering editors, by-demonstration macro builders, and spreadsheet environments. Despite this, there has been only a little research on finding ways to help these programmers with the dependability of the software they create....
We consider the problem of tactical assault planning in real-time strategy games where a team of friendly agents must launch an assault on an enemy. This problem offers many challenges including a highly dynamic and uncertain environment, multiple agents, durative actions, numeric attributes, and different optimization objectives. While the dynamics...
Automated recognition of object categories in images is a critical step for many real-world computer vision applications. Interest region detectors and region descriptors have been widely employed to tackle the variability of objects in pose, scale, lighting, texture, color, and so on. Different types of object recognition problems usually require...
Knowledge workers are struggling in the information flood. There is a growing interest in intelligent desktop environments that help knowledge workers organize their daily life. Intelligent desktop environments allow the desktop user to define a set of “activities” that characterize the user’s desktop work. These environments then attempt to identify...
Recently, the concept of performance-based design has become popular for many types of structures, including port facilities under seismic loading. For the case of pile-supported wharves, the level of performance is generally estimated using the displacement capacity of the structure. Therefore, understanding soil-pile interaction is one of the most...
Ocean wave energy is a new and developing field of renewable energy with great potential. The energy contained in one meter of an average wave off the coast of Newport, Oregon could supply dozens of homes with electricity. However, ocean waves are usually quite irregular, which leads to large bursts...
The problem of document classification has been widely studied in machine learning and data mining. In document classification, most of the popular algorithms are based on the bag-of-words representation. Due to the high dimensionality of the bag-of-words representation, significant research has been conducted to reduce the dimensionality via different approaches....
This dissertation explores the idea of applying machine learning technologies to help computer users find information and better organize electronic resources, by presenting the research work conducted in the following three applications: FolderPredictor, Stacking Recommendation Engines, and Integrating Learning and Reasoning.
FolderPredictor is an intelligent desktop software tool that helps...
With edge rates of high-speed digital devices pushing into the sub-nanosecond
range, interconnections with the associated packages play a major role in determining
the speed, size and performance of digital circuits and systems. The purpose of this
study is to develop experimental techniques based on time domain peeling...
In wood-based composites, the glue-line (interface) between wood-strands affects the stress transfer from one member to the next. The glue-line properties determine the rate of load transfer between phases and these properties depend on wood species, surface preparation, glue properties, glue penetration into wood cells, and moisture content of the...
Monte-Carlo planning algorithms such as UCT make decisions at each step by
intelligently expanding a single search tree given the available time and then
selecting the best root action. Recent work has provided evidence that it can be
advantageous to instead construct an ensemble of search trees and make a...
Customer requirements and engineering specifications will influence the direction of the design in any product, so it is crucial to make the requirements as stable as possible so quality designs can be produced on time within a budget. It would be optimal for requirements and specifications to remain constant but...
This dissertation explores algorithms for learning ranking functions to efficiently solve search problems, with application to automated planning. Specifically, we consider the frameworks of beam search, greedy search, and randomized search, which all aim to maintain tractability at the cost of guaranteeing neither completeness nor optimality. Our learning objective for...
A thesis presented on the determination of factors related to deposition, clearance, and dosimetry of the ICRP 30 lung model as related to smoking. Variations in calculations used to determine radiation doses leave a need to determine a standard method to determine absorbed dose from smoking. Standard parameters are used...
Since free riders in P2P networks reduce the system's performance, how to maintain and encourage the nodes' cooperation is an important aspect of P2P-related research. In this thesis, a P2P system is modeled based on two games: the stag hunt game and the snowdrift game. To relate the model to the...
The first edition of the Highway Safety Manual (HSM) provides a quantitative
approach to predict the safety of transportation facilities based on the recently
developed scientific methods. This approach, known as the predictive method, was
developed for several states in the United States. Due to differences in driver
population, weather...
Environmental and regulatory concerns are causing confined animal feeding operations (CAFOs) to account for phosphorus content when applying wastewater to agricultural fields for disposal. In most cases this requires more land to spread the waste onto so that the phosphorus needs of the crop are not exceeded. A mobile process...
Microchannel process technology (MPT) offers several advantages to the field of nanomanufacturing: 1) improved process control over very short time intervals owing to shorter diffusional distances; and 2) reduced reactor size due to high surface area to volume ratios and enhanced heat and mass transfer. The objective of this thesis...
The appropriate separation of concerns is a fundamental engineering principle. A concern, for software developers, is that which must be represented by code in a program; by extension, separation of concerns is the ability to represent a single concern in a single appropriate programming language construct. Advanced separation of concerns...
Voltage converters or charge pumps find their use in many circuits. They are extensively used in handheld devices such as cell phones, pagers, PDAs, and laptops. Some of the important issues relating to the design of voltage regulators for handheld devices are size, efficiency, and noise. Another important factor to be...
Image segmentation continues to be a fundamental problem in computer vision and image understanding. In this thesis, we present a Bayesian network that we use for object boundary detection in which the MPE (most probable explanation) before any evidence can produce multiple non-overlapping, non-self-intersecting closed contours and the MPE with...
We consider the problem of strategic adversarial planning in a Real-Time Strategy (RTS) game. Strategic adversarial planning is the generation of a network of high-level tasks to satisfy goals while anticipating an adversary's actions. In this thesis we describe an abstract state and action space used for planning in an...
In this work, I examine the problem of understanding American football in video. In particular, I present several mid-level computer vision algorithms that each accomplish a different sub-task within a larger system for annotating, interpreting, and analyzing collections of American football video. The analysis of football video is useful in...
Regression testing is an expensive software engineering activity intended to provide confidence that modifications to a software system have not introduced faults. Test case prioritization techniques help to reduce regression testing cost by ordering test cases in a way that better achieves testing objectives. In this thesis, we are interested...
Recent work has shown that AdaBoost can be viewed as an algorithm that maximizes the margin on the training data via functional gradient descent. Under this interpretation, the weight computed by AdaBoost, for each hypothesis generated, can be viewed as a step size parameter in a gradient descent search. Friedman...
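The step-size reading of the hypothesis weight can be illustrated with a minimal sketch. It assumes the standard discrete AdaBoost formulation with labels and predictions in {-1, +1}; the function names `hypothesis_weight` and `reweight` are hypothetical and are not taken from the thesis.

```python
import math


def hypothesis_weight(weighted_error: float) -> float:
    """Weight (step size) AdaBoost gives a hypothesis with this weighted error.

    Assumes the standard discrete AdaBoost rule: alpha = 0.5 * ln((1 - err) / err).
    """
    if not 0.0 < weighted_error < 1.0:
        raise ValueError("weighted_error must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - weighted_error) / weighted_error)


def reweight(weights, labels, predictions, alpha):
    """Multiply each example weight by exp(-alpha * y * h(x)), then renormalize."""
    updated = [
        w * math.exp(-alpha * y * h)
        for w, y, h in zip(weights, labels, predictions)
    ]
    total = sum(updated)
    return [w / total for w in updated]


# Four equally weighted examples; the hypothesis misclassifies only the last one.
weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
predictions = [1, 1, -1, 1]

alpha = hypothesis_weight(0.25)
new_weights = reweight(weights, labels, predictions, alpha)
```

Under this rule a lower weighted error yields a larger step, and the update shifts weight toward the misclassified example so that the next hypothesis focuses on it.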
Regression testing is an expensive testing process used to validate changes made to previously tested software. Different regression testing techniques can have different impacts on the cost-effectiveness of testing. This cost-effectiveness can also vary with different characteristics of test suites. One such characteristic, test suite granularity, reflects the way in...
We have developed a web-based software tool for evaluating the potential risk of pesticides on the surrounding environment. The software tool is called Web-based Pesticide Screening Tool (Web-PST). It uses the formulas and standards specified by the Soil/Pesticide Interaction Screening Procedure Version II (SPISP II). Web-PST closely models the stand-alone...
Revised Universal Soil Loss Equation (RUSLE) is a standard for estimating soil loss caused by rainfall and overland flow. The current software tool that implements this standard is RUSLE 1.06b, which is a stand-alone DOS application. In this project, we converted the DOS application to a Web application, which we...
"Collaborative filtering algorithms’ performances have been evaluated using a variety of metrics.
These metrics, such as Mean Absolute Error and Precision, have often ignored recommendations for
which they do not have data. Ignoring these recommendations has provided numbers which do not
accurately represent the user experience. Qualitatively we have seen...
Remote sensing is the most practical way to acquire large amounts of land cover data for monitoring and understanding environmental change, so it is important to be able to map land cover from imagery. Maps defining land cover patches as polygons rather than pixels greatly improve processing efficiency in models...
In spite of widespread adoption of internet technology in teaching and learning, little research exists on the use and the effect of web incorporation in teaching and learning. Immediate clarification of how, when, and for whom web integration in education is beneficial is needed. This qualitative study...
Bayesian Optimization (BO) methods are often used to optimize an unknown function f(•) that is costly to evaluate. They typically work in an iterative manner. In each iteration, given a set of observation points, BO algorithms select k ≥ 1 points to be evaluated. The results of those points are...
Microstructure changes in uranium and uranium/metal alloys due to radiation damage are of great interest in nuclear science and engineering. Titanium has attracted attention because of its similarity to Zr. It has been proposed for use in the second generation of fusion reactors due to its resistance to radiation-induced swelling....
Many parallel machines, both commercial and experimental, have been/are being designed with toroidal interconnection networks. For a given number of nodes, the torus has a relatively larger diameter, but better cost/performance tradeoffs, such as higher channel bandwidth, and lower node degree, when compared to the hypercube. Thus, the torus is...
In this work, we first introduce a novel approach to long-term irrigation scheduling
using Genetic Algorithms (GAs). We explore the effectiveness of GAs in the context of
optimizing nonlinear crop models and describe application requirements and implementation of
the technique. GAs were found to converge quickly to near-optimal...
A public key cryptosystem allows two or more parties to securely communicate
over an insecure channel without establishing a physically secure channel for key
exchange. The RSA cryptosystem is the most popular public key cryptosystem ever
invented. It is based on the difficulty of factoring large composite numbers. Once the...
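The link between RSA and factoring can be seen in a toy sketch. The primes, exponent, and function names below are illustrative assumptions, not values from the thesis, and textbook RSA without padding is not secure in practice. The private exponent is derived from p and q, so anyone who can factor n can recover it.

```python
from math import gcd


def make_keypair(p: int, q: int, e: int):
    """Build a toy RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)  # computing phi requires knowing the factors of n
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1) * (q - 1)")
    d = pow(e, -1, phi)  # modular inverse; needs Python 3.8 or later
    return (n, e), (n, d)


def encrypt(message: int, public_key) -> int:
    """Encrypt an integer message smaller than n."""
    n, e = public_key
    return pow(message, e, n)


def decrypt(ciphertext: int, private_key) -> int:
    """Decrypt a ciphertext produced by encrypt."""
    n, d = private_key
    return pow(ciphertext, d, n)


# Tiny illustrative primes; real keys use primes hundreds of digits long.
public, private = make_keypair(61, 53, 17)
recovered = decrypt(encrypt(65, public), private)
```

Only `n` and `e` are published; the two primes stay secret, which is why the scheme needs no physically secure channel for key exchange.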
In this thesis, a nonlinear methodology for the control of the highly
maneuverable, high performance aircraft HARV (F-18) is studied by using sliding-mode
control (SMC). This control law, which takes a continuous function when the
input constraints are not considered, satisfies the reachability condition by which
concerned states are driven...
Reasoning about any realistic domain always involves a degree of uncertainty.
Probabilistic inference in belief networks is one effective way of reasoning under
uncertainty. Efficiency is critical in applying this technique, and many researchers
have been working on this topic. This thesis is the report of our research in this...
Knowledge compilation improves search-intensive problem-solvers that are easily specified but inefficient. One promising approach improves efficiency by constructing a database of problem-instance/best-action pairs that replace problem-solving search with efficient lookup. The database is constructed by reverse enumeration: expanding the complete search space backwards, from the terminal problem instances. This approach...
Many important application problems in engineering can be formalized as nonlinear
optimization tasks. However, numerical methods for solving such problems
are brittle and do not scale well. For example, these methods depend critically
on choosing a good starting point from which to perform the optimization search.
In high-dimensional spaces, numerical...
In real-time control systems, the value of a control decision depends not
only on the correctness of the decision but also on the time when that decision
is available. Recent work in real-time decision making has used machine learning
techniques to automatically construct reactive controllers, that is, controllers
with little...
Semi-supervised clustering aims to improve clustering performance by considering user supervision in the form of pairwise constraints. In this paper, we study the active learning problem of selecting pairwise must-link and cannot-link constraints for semi-supervised clustering. We consider active learning in an iterative manner where in each iteration queries are...
Air traffic flow management over the U.S. airspace is a difficult problem. Current management approaches lead to hundreds of thousands of hours of delay, costing billions of dollars annually. Weather and airport conditions may instigate this delay, but routing decisions balancing delay with congestion contribute significantly to the propagation of...
The Markov Decision Process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment which is characterized by a time-invariant transition probability matrix. However, in many real-world scenarios, this assumption is not justified, so the optimal strategy might not...
Physical activity recognition using accelerometer data is a rapidly emerging field with many real-world applications. Much of the previous work in this area has assumed that the accelerometer data has already been segmented into pure activities, and the activity recognition task has been to classify these segments. In reality, activity...
Citizen Science is a paradigm in which volunteers from the general public participate in scientific studies, often by performing data collection. This paradigm is especially useful if the scope of the study is too broad to be performed by a limited number of trained scientists. Although citizen scientists can contribute...
Novelty detection plays an important role in machine learning and signal processing. This
project studies novelty detection in a new setting where the data object is represented as
a bag of instances and associated with multiple class labels, referred to as multi-instance
multi-label (MIML) learning. Contrary to the common assumption...
Auctions are used to solve resource allocation problems among many agents and many items in real-world settings. Unfortunately, in most cases, it is possible for selfish agents to manipulate the system for their own interest at the expense of the social welfare. Such manipulation can be prevented using the Vickrey-Clarke-Groves...
Easy-first, a search-based structured prediction approach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action scoring function) to make easy decisions first, which constrains the remaining decisions and makes them easier. This thesis studies the problem of learning...
This thesis considers the problem in which a teacher is interested in teaching action policies to computer agents for sequential decision making. The vast majority of policy
learning algorithms offer teachers little flexibility in how policies are taught. In particular,
one of two learning modes is typically considered: 1)...
Tensegrity structures are composed of pure compressional elements that are connected via a network of pure tensional elements. The concept of tensegrity promises numerous advantages to the field of robotics. Tensegrity robots are, however, notoriously difficult to control due to their oscillatory nature and nonlinear interaction between the components. Multiagent...
In real networks, identifying dense regions is of great importance. For example, in a network that represents academic collaboration, authors within the densest component of the graph tend to be the most prolific. Dense subgraphs often identify communities in social networks. And dense subgraphs can be used to discover regulatory...
The purpose of this article is to explore student reasoning with regard to problems in logic, particularly those related to the Principle of Mathematical Induction (PMI). The five case studies presented build off of work done by other researchers, most notably Dubinsky and Harel, who both looked at how students'...
Pedestrian distraction at roadway crossings has been correlated with a higher risk of pedestrian-vehicle collisions due to the pedestrian's cognitive, visual, and motor attention being drawn to a wide variety of secondary tasks.
This study is different from previous field studies of pedestrian midblock crossings in that the geometric layout...
Constructing a panorama from a set of videos is a long-standing problem in computer vision. A panorama represents an enhanced still-image representation of an entire scene captured in a set of videos, where each video shows only a part of the scene. Importantly, a panorama shows only the scene background,...
We investigate the data collection problem in sensor networks. The network consists of a number of stationary sensors deployed at different sites for sensing and storing data locally. A mobile element moves from site to site to collect data from the sensors periodically. There are different costs associated with the...
Writing a program that performs well in a complex environment is a challenging task. In such problems, a method of deterministic programming combined with reinforcement learning (RL) can be helpful. However, current systems either force developers to encode knowledge in very specific forms (e.g., state-action features), or assume advanced RL...
Automatic analysis of American football videos can help teams develop strategies and extract patterns with less human effort. In this work, we focus on the problem of automatically determining which team is on offense/defense, which is an important subproblem for higher-level analysis. While seemingly mundane, this problem is quite challenging...
Monte-Carlo Tree Search (MCTS) is an online-planning algorithm for decision-theoretic planning in domains with stochastic and combinatorial structure. The general applicability of MCTS makes it an ideal first choice to investigate when developing planners for complex applications requiring automated control and planning. The first contribution of this thesis is to...
Direct fired Solid Oxide Fuel Cell (SOFC) Turbine hybrid plants have the potential to dramatically increase power plant efficiency, decrease emissions, and provide fast response to transient loads. The US Department of Energy's (DOE) Hybrid Performance Project is an experimental hybrid SOFC plant, built at the National
Energy Technology Laboratory...
The study of variational typing originated from the problem of type inference for variational programs, which encode numerous different but related plain programs. In this dissertation, I present a sound and complete type inference algorithm for inferring types of all plain programs encoded in variational programs. The proposed algorithm runs...
Through passive adaptation to incidental flow, flexible aerodynamic surfaces exploit effects of increased lift, delayed stall and disturbance rejection. Wings of birds, bats, and insects exhibit these passive effects, and at the same time through the use of structural state feedback sensed from the loads on the wing, active control...
The Open Modeling Environment (OME) is a tool developed to address some known shortcomings in ecological System Dynamics (SD) modeling research. OME provides a common set of methods for interacting directly with spatial information, reducing the need for modelers to create their own methods for doing so. The environment is...
Machine learning systems are generally trained offline using ground truth data that has been labeled by experts. However, these batch training methods are not a good fit for many applications, especially in the cases where complete ground truth data is not available for offline training. In addition, batch methods do...
We model the popular board game of Clue as an MDP and evaluate Monte-Carlo policy rollout in a simulated environment pitting different agents and policies against each other. We describe the choices we made in the representation, along with some of the problems we encountered along the way. We find...
Autonomous multiagent teams can be used in complex exploration tasks to both expedite the exploration and improve the efficiency. However, use of multiagent systems presents additional challenges. Specifically, in domains where the agents' actions are tightly coupled, coordinating multiple agents to achieve cooperative behavior at the group level is difficult....
In this thesis, we will study certain generalizations of the classical Shannon Sampling Theorem, which allows for the reconstruction of a pi-band-limited, square-integrable function from its samples on the integers. J. R. Higgins provided a generalization where the integers can be perturbed by less than 1/4, which includes nonuniform and...
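For reference, the classical theorem mentioned above can be written in its standard textbook form. The notation here, with \hat{f} for the Fourier transform, is an assumption and may differ from the notation used in the thesis.

```latex
% Classical Shannon Sampling Theorem for a pi-band-limited function:
% f is square-integrable and its Fourier transform vanishes outside [-pi, pi].
f(t) \;=\; \sum_{n \in \mathbb{Z}} f(n)\,
  \frac{\sin\bigl(\pi (t - n)\bigr)}{\pi (t - n)},
\qquad f \in L^{2}(\mathbb{R}),\quad
\operatorname{supp} \hat{f} \subseteq [-\pi, \pi].
```

The generalizations studied replace the sample points n with perturbed points, which in turn requires replacing the sinc kernel with a different family of reconstruction functions.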
Machine learning models for natural language processing have traditionally relied on large numbers of discrete features, built up from atomic categories such as word forms and part-of-speech labels, which are considered completely distinct from each other. Recently however, the advent of dense feature representations coupled with deep learning techniques has...
With the development of technologies in genome sequencing and variant detection, a huge number of variants are detected. Further analysis of these variants requires an efficient tool to annotate their functional effects. This project developed an efficient program to annotate the functional effect of variants...
This thesis focuses on the problem of object tracking. Given a video, the general objective of tracking is to track the location over time of one or more targets in the image sequence. This is a very challenging task as algorithms need to deal with problems such as appearance variations,...
Mutation testing is one of the effective approaches to measuring the test adequacy of test suites. It is widely used in both academia and industry. Unfortunately, the adoption and practical use of mutation testing for Python 2.x programs face three obstacles. First, useful mutation operators are limited. Existing mutation testing tools support very...
Software testing is the process of evaluating the accuracy and performance of software, and automated software testing allows programmers to develop software more efficiently by decreasing testing costs. We compared two advanced random test generators, a Feedback-Directed Random Test Generator (FDR) and a Feedback-Controlled Random Test Generator (FCR), for an...
Monte Carlo tree search (MCTS) is a class of online planning algorithms for Markov decision processes (MDPs) and related models that has found success in challenging applications. In the online planning approach, the agent makes a decision in the current state by performing a limited forward search over possible futures...
Most tasks in natural language processing (NLP) try to map structured input (e.g., sentence or word sequence) to some form of structured output (tag sequence, parse tree, semantic graph, translated/paraphrased/compressed sentence), a problem known as “structured prediction”. While various learning algorithms such as the perceptron, maximum entropy, and expectation-maximization have...
In the field of machine learning, clustering and classification are two fundamental tasks. Traditionally, clustering is an unsupervised method, where no supervision about the data is available for learning; classification is a supervised task, where fully-labeled data are collected for training a classifier. In some scenarios, however, we may not...
The thesis focuses on activity recognition from sensor data, which has spurred a great deal of interest due to its impact on health care and security. Previous work on activity recognition from multivariate time series data has mainly applied supervised learning techniques which require a high degree of annotation effort...
This work is inspired by problems in natural resource management centered on the challenge of invasive species. Computing optimal management policies for maintaining ecosystem sustainability is challenging. Many ecosystem management problems can be formulated as MDP (Markov Decision Process) planning problems. In a simulator-defined MDP, the Markovian dynamics and rewards...
Appropriate representations of variational software simplify the analysis of their properties. This thesis proposes tailored representations of two kinds of variational software: difference files of merge commits in Git and feature models. For the former, we use the Choice Edit Model, which is based on the choice calculus, to represent changes introduced...
Oxo-hydroxo Group 5 metal clusters are an untapped resource to study and advance aqueous solution processing of metal oxide thin films. The tetramethylammonium (TMA) hexatantalate salt (TMA6[H2Ta6O19]) yields dense Ta2O5 films (~95% of the bulk β-Ta2O5 density) with atomically smooth surfaces (<4 Å root mean square surface roughness). This same...
In Intense Pulsed Light (IPL) sintering, pulsed large-area visible light from a xenon lamp is absorbed by nanoparticle films or patterns and converted to heat, resulting in rapid sintering of the nanoparticles. This work experimentally characterizes IPL sintering of silver nanoparticle films. A newly observed turning point in the evolution...
There is growing commercial interest in the use of unmanned aerial vehicles (UAVs) in urban environments, specifically for package delivery applications. However, the size, complexity, and sheer numbers of expected UAVs make conventional air traffic management that relies on human air traffic controllers infeasible. To enable UAVs to safely and...
Society faces many complex management problems, particularly in the area of shared public resources such as ecosystems. Existing decision making processes are often guided by personal experience and political ideology rather than state-of-the-art scientific understanding. This dissertation envisions a future in which multiple stakeholders are provided with computational tools for...
Although machine learning systems are often effective in real-world applications, there are situations in which they can be even better when provided with some degree of end user feedback. This is especially true when the machine learning system needs to customize itself to the end user's preferences, such as in...
Automatic event extraction from natural text is an important and challenging task for natural language understanding. Traditional event detection methods rely heavily on manually engineered rich features. Recent deep learning approaches alleviate this problem by automatic feature engineering. But such efforts, like traditional methods, have so far only focused on...
The Rust programming language is a systems programming language with a strong static type system. A central feature of Rust’s type system is its unique concept of “ownership”, which enables Rust to give a user safe, low-level control over resources without the overhead of garbage collection. In Haskell, most data...
Changes in the global climate and forest management practices have given rise to increasing numbers and severity of wildfires. More than five million acres burned in the United States in 2017, while in Canada 7.4 million acres burned. In particular, an increasing amount of dead woody biomass is a key...
Recent advancements in joining operations and additive manufacturing now allow complex metal parts to be built up from raw materials, as opposed to being machined down from solid blocks. This not only opens up the design space, but also allows for much more efficient manufacturing. By decomposing a complex part...