Protein secondary structure prediction plays a pivotal role in predicting protein folding in three-dimensions. Its task is to assign each residue one of the three secondary structure classes helix, strand, or random coil. This is an instance of the problem of sequential supervised learning in machine learning. This thesis describes...
The thesis focuses on model-based approximation methods for reinforcement
learning with large scale applications such as combinatorial optimization problems.
First, the thesis proposes two new model-based methods to stablize the
value–function approximation for reinforcement learning. The first one is the
BFBP algorithm, a batch-like reinforcement learning process which iterates between...
Software maintenance accounts for a large portion of the software development cost, particularly the process of updating programs either to adapt for requirement change or to enhance design or efficiency. Currently, program updates are generally performed manually by programmers using text editors. This is an unreliable
method because syntax and...
Accessing information on the Web has become ingrained into our daily lives, and we seek information from many different sources, including conference and journal publications, personal web pages, and others. Increasingly, web-based information retrieval systems such as web-based search engines, library on-line catalog systems, and subscription-based federated search systems are...
Domain-independent automated planning is concerned with computing a sequence of actions that can transform an initial state into a desired goal state. Resource production domains form an interesting class of such problems, in that they typically require reasoning about concurrent durative-actions with continuous effects while minimizing some cost function. Although...
Image feature detection and matching are two critical processes for many computer vision tasks. Currently, intensity-based local interest region detectors and local feature-based matching methods are used widely in computer vision applications. But in some applications, such as biological object recognition tasks, within-class changes in pose, lighting, color, and texture...
Current ACI design provisions for shear in reinforced concrete (RC) beams strengthened
with externally bonded carbon fiber-reinforced polymer (CFRP) U-wraps may not adequately account for the effect of scale. This paper describes tests of 6 geometrically scaled beams to identify possible scale effects of such beams. Results indicated that effective...
This thesis addresses the problem of learning dynamic Bayesian network (DBN) models to support reinforcement learning. It focuses on learning regression tree models of the conditional probability distributions of the DBNs. Existing algorithms presume that the stochasticity in the domain can be modeled as a deterministic function with additive noise....
End users develop more software than any other group of programmers, using software authoring devices such as e-mail filtering editors, by-demonstration macro builders, and spreadsheet environments. Despite this, there has been only a little research on finding ways to help these programmers with the dependability of the software they create....
We consider the problem of tactical assault planning in real-time strategy games where a team of friendly agents must launch an assault on an enemy. This problem offers many challenges including a highly dynamic and uncertain environment, multiple agents, durative actions, numeric attributes, and different optimization objectives. While the dynamics...
Automated recognition of object categories in images is a critical step for many real-world computer vision applications. Interest region detectors and region descriptors have been widely employed to tackle the variability of objects in pose, scale, lighting, texture, color, and so on. Different types of object recognition problems usually require...
Knowledge workers are struggling in the information flood. There is a growing interest in intelligent desktop environments that help knowledge workers organize their daily life. Intelligent desktop environments allow the desktop user to define a set of “activities” that characterize the user’s desktop work. These environments then attempt to identify...
Recently, the concept of performance based design has become popular for many types of structures, including port facilities under seismic loading. For the case of pile-supported wharves, the level of performance is generally estimated using the displacement capacity of the structure. Therefore, understanding soil-pile interaction is one of the most...
Ocean wave energy is a new and developing field of renewable energy with great potential. The energy contained in one meter of an average wave off the coast of Newport Oregon could supply dozens of homes with electricity. However, ocean waves are usually quite irregular which leads to large bursts...
The problem of document classification has been widely studied in machine learning and data mining. In document classification, most of the popular algorithms are based on the bag-of-words representation. Due to the high dimensionality of the bag-of-words representation, significant research has been conducted to reduce the dimensionality via different approaches....
This dissertation explores the idea of applying machine learning technologies to help computer users find information and better organize electronic resources, by presenting the research work conducted in the following three applications: FolderPredictor, Stacking Recommendation Engines, and Integrating Learning and Reasoning.
FolderPredictor is an intelligent desktop software tool that helps...
In wood-based composites, the glue-line (interface) between wood-strands affects the stress transfer from one member to the next. The glue-line properties determine the rate of load transfer between phases and these properties depend on wood species, surface preparation, glue properties, glue penetration into wood cells, and moisture content of the...
Monte-Carlo planning algorithms such as UCT make decisions at each step by
intelligently expanding a single search tree given the available time and then
selecting the best root action. Recent work has provided evidence that it can be
advantageous to instead construct an ensemble of search trees and make a...
Customer requirements and engineering specifications will influence the direction of the design in any product, so it is crucial to make the requirements as stable as possible so quality designs can be produced on time within a budget. It would be optimal for requirements and specifications to remain constant but...
This dissertation explores algorithms for learning ranking functions to efficiently solve search problems, with application to automated planning. Specifically, we consider the frameworks of beam search, greedy search, and randomized search, which all aim to maintain tractability at the cost of not guaranteeing completeness nor optimality. Our learning objective for...
A thesis presented on the determination of factors related to deposition, clearance, and dosimetry of the ICRP 30 lung model as related to smoking. Variations in calculations used to determine radiation doses leave a need to determine a standard method to determine absorbed dose from smoking. Standard parameters are used...
Since free riders in P2P network reduce the system's performance, how to maintain and encourage the nodes' cooperation is an important aspect of P2P related research. In this thesis, a P2P system is modeled based on two games: stag hunt game and snowdrift game. To relate the model to the...
The first edition of the Highway Safety Manual (HSM) provides a quantitative
approach to predict the safety of transportation facilities based on the recently
developed scientific methods. This approach, known as the predictive method, was
developed for several states in the United States. Due to differences in driver
population, weather...
Environmental and regulatory concerns are causing confined animal feeding operations (CAFO's) to account for phosphorous content when applying wastewater to agricultural fields for disposal. In most cases this requires more land to spread the waste onto so that the phosphorous needs of the crop are not exceeded. A mobile process...
Microchannel process technology (MPT) offers several advantages to the field of nanomanufacturing: 1) improved process control over very short time intervals owing to shorter diffusional distances; and 2) reduced reactor size due to high surface area to volume ratios and enhanced heat and mass transfer. The objective of this thesis...
The appropriate separation of concerns is a fundamental engineering principle. A concern, for software developers, is that which must be represented by code in a program; by extension, separation of concerns is the ability to represent a single concern in a single appropriate programming language construct. Advanced separation of concerns...
Image segmentation continues to be a fundamental problem in computer vision and image understanding. In this thesis, we present a Bayesian network that we use for object boundary detection in which the MPE (most probable explanation) before any evidence can produce multiple non-overlapping, non-self-intersecting closed contours and the MPE with...
We consider the problem of strategic adversarial planning in a Real-Time Strategy (RTS) game. Strategic adversarial planning is the generation of a network of high-level tasks to satisfy goals while anticipating an adversary's actions. In this thesis we describe an abstract state and action space used for planning in an...
In this work, I examine the problem of understanding American football in video. In particular, I present several mid-level computer vision algorithms that each accomplish a different sub-task within a larger system for annotating, interpreting, and analyzing collections of American football video. The analysis of football video is useful in...
Regression testing is an expensive software engineering activity intended to provide confidence that modifications to a software system have not introduced faults. Test case prioritization techniques help to reduce regression testing cost by ordering test cases in a way that better achieves testing objectives. In this thesis, we are interested...
Recent work has shown that AdaBoost can be viewed as an algorithm that maximizes the margin on the training data via functional gradient descent. Under this interpretation, the weight computed by AdaBoost, for each hypothesis generated, can be viewed as a step size parameter in a gradient descent search. Friedman...
Regression testing is an expensive testing process used to validate changes made to previously tested software. Different regression testing techniques can have different impacts on the cost-effectiveness of testing. This cost-effectiveness can also vary with different characteristics of test suites. One such characteristic, test suite granularity, reflects the way in...
We have developed a web-based software tool for evaluating the potential risk of pesticides on the surrounding environment. The software tool is called Web-based Pesticide Screening Tool (Web-PST). It uses the formulas and standards specified by the Soil/Pesticide Interaction Screening Procedure Version II (SPISP II). Web-PST closely models the stand-alone...
Revised Universal Soil Loss Equation (RUSLE) is a standard for estimating soil loss caused by rainfall and overland flow. The current software tool that implements this standard is RUSLE 1.06b, which is a stand-alone DOS application. In this project, we converted the DOS application to a Web application, which we...
"Collaborative filtering algorithms’ performances have been evaluated using a variety of metrics.
These metrics, such as Mean Absolute Error and Precision, have often ignored recommendations for
which they do not have data. Ignoring these recommendations has provided numbers which do not
accurately represent the user experience. Qualitatively we have seen...
Remote sensing is the most practical way to acquire large amounts of land cover data for monitoring and understanding environmental change, so it is important to be able to map land cover from imagery. Maps defining land cover patches as polygons rather than pixels greatly improve processing efficiency in models...
Bayesian Optimization (BO) methods are often used to optimize an unknown function f(•) that is costly to evaluate. They typically work in an iterative manner. In each iteration, given a set of observation points, BO algorithms select k ≥ 1 points to be evaluated. The results of those points are...
Microstructure changes in uranium and uranium/metal alloys due to radiation damage are of great interest in nuclear science and engineering. Titanium has attracted attention because of its similarity to Zr. It has been proposed for use in the second generation of fusion reactors due to its resistance to radiation-induced swelling....
Semi-supervised clustering aims to improve clustering performance by considering user supervision in the form of pairwise constraints. In this paper, we study the active learning problem of selecting pairwise must-link and cannot-link constraints for semisupervised clustering. We consider active learning in an iterative manner where in each iteration queries are...
Air traffic flow management over the U.S. airpsace is a difficult problem. Current management approaches lead to hundreds of thousands of hours of delay, costing billions of dollars annually. Weather and airport conditions may instigate this delay, but routing decisions balancing delay with congestion contribute significantly to the propagation of...
Markov Decision Process (MDP) is a well-known framework for devising the optimal decision making strategies under uncertainty. Typically, the decision maker assumes a stationary environment which is characterized by a time-invariant transition probability matrix. However, in many real-world scenarios, this assumption is not justified, thus the optimal strategy might not...
Physical activity recognition using accelerometer data is a rapidly emerging field with many real-world applications. Much of the previous work in this area has assumed that the accelerometer data has already been segmented into pure activities, and the activity recognition task has been to classify these segments. In reality, activity...
Citizen Science is a paradigm in which volunteers from the general public participate in scientific studies, often by performing data collection. This paradigm is especially useful if the scope of the study is too broad to be performed by a limited number of trained scientists. Although citizen scientists can contribute...
Novelty detection plays an important role in machine learning and signal processing. This
project studies novelty detection in a new setting where the data object is represented as
a bag of instances and associated with multiple class labels, referred to as multi-instance
multi-label (MIML) learning. Contrary to the common assumption...
Auctions are used to solve resource allocation problem between many agents and many items in real-world settings. Unfortunately, in most cases, it is possible for selfish agents to manipulate the system for their own interest at the expense of the social welfare. Such manipulation can be prevented using the Vickrey-Clarke-Groves...
Easy-first, a search-based structured prediction approach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action scoring function) to make easy decisions first, which constrains the remaining decisions and makes them easier. This thesis studies the problem of learning...
This thesis considers the problem in which a teacher is interested in teaching action policies to computer agents for sequential decision making. The vast majority of policy
learning algorithms o er teachers little flexibility in how policies are taught. In particular,
one of two learning modes is typically considered: 1)...
Tensegrity structures are composed of pure compressional elements that are connected via a network of pure tensional elements. The concept of tensegrity promises numerous advantages to the field of robotics. Tensegrity robots are, however, notoriously difficult to control due to their oscillatory nature and nonlinear interaction between the components. Multiagent...
In real networks, identifying dense regions is of great importance. For example, in a network that represents academic collaboration, authors within the densest component of the graph tend to be the most prolific. Dense subgraphs often identify communities in social networks. And dense subgraphs can be used to discover regulatory...
The purpose of this article is to explore student reasoning with regard to problems in logic, particularly those related to the Principle of Mathematical Induction (PMI). The five case studies presented build off of work done by other researchers, most notably Dubinsky and Harel, who both looked at how students'...