Probabilistic inference using Bayesian networks is now a well-established approach for reasoning under uncertainty. Among many e ciency-driven tech- niques which have been developed, the Optimal Factoring Problem (OFP) is distinguished for presenting a combinatorial optimization point of view on the problem. The contribution of this thesis is to extend...
Practical parallel programming demands that the details of distributing data to processors and inter- processor communication be managed by the compiler. These tasks quickly become too di cult for a programmer to do by hand for all but the simplest parallel programs. Yet, many parallel languages still require the programmer...
Distance-based algorithms are machine learning algorithms that classify queries
by computing distances between these queries and a number of internally stored
exemplars. Exemplars that are closest to the query have the largest in
uence on
the classi cation assigned to the query. Two speci c distance-based algorithms, the
nearest neighbor...
We developed and investigated machine learning methods that require
minimal preprocessing of the input data, use few training examples, run fast, and
still obtain high levels of accuracy.
Most approaches to designing machine learning programs are based on the
supervised learning paradigm – training examples are chosen randomly and given...
As the volume of genetic sequence data increases due to improved sequencing techniques and increased interest, the computational tools available to analyze the data are becoming inadequate. This thesis seeks to improve a few of the computational methods available to access and analyze data in the genetic sequence databases. The...
Multiparadigm programming languages are a recent development in the realm of programming languages. A multiparadigm programming language allows the use of multiple, differing programming paradigms without departing from a single, unified linguistic framework. Multiparadigm programming languages are claimed to have benefits to both pedagogy and complex application creation. The beneficial...
In this research, we have captured, in pattern form, key elements of programming and design in four programming paradigms (imperative, object-oriented, functional and logical) as well as multiparadigm programming. These pattern sets have formed a foundation upon which we were able to build a deeper understanding of multiparadigm programming and...
Learning easily understandable decision rules from examples is one of the classic problems in machine learning. Most learning algorithms for this problem employ some variation of a greedy separate-and-conquer algorithm. In this paper, we describe a system called LERILS that learns highly accurate and comprehensible rules from examples using a...
Remote Event Listener(REL) is designed to glue remote events and remote listeners dynamically, and dispatch remote events efficiently and transparently
for distributed object-oriented systems. Components can be independently developed and remotely interconnected with REL, and software reusability can
be improved. Remote Event Listener along with Remote Method Invocation
makes distributed...
The visual programming language Forms/3 currently uses a graphical user interface implemented in Garnet. Garnet was developed by the User Interface Software Group in the Human Computer Interaction Institute at Carnegie Mellon University, but is no longer supported. This paper presents an implementation of a user interface for Forms/3 written...
This paper describes the design and performance of a distributed, multi-tier architecture for scientific information management and data exploration. A novel aspect of this framework is its integration of Java IDL, the CORBA distributed object computing middleware with JavaBeans, the Java Component model to provide a flexible, interactive framework for...
Although standard tools have been used for lexical and syntactic analysis since the late 1970's, no
standard tools exist for the remaining parts of a compiler. Part of the reason for this de ciency is due to
the di culty of producing elegant tools capable of handling the large amount...
Programming parallel machines has been a difficult and unrewarding task. The short lifespan of parallel machines and their incompatibility have made it difficult to utilize them. Our goal here is to create an environment for parallel computing which allows users to take advantage of parallel computers without writing parallel programs....
This thesis presents a case study of applying machine learning tools to build a predictive
model of annual infestations of grasshoppers in Eastern Oregon. The purpose of the
study was two-fold. First, we wanted to develop a predictive model. Second, we wanted
to explore the capabilities of existing machine learning...
The task of mapping spelled English words into strings of phonemes and stresses ("reading aloud") has many practical applications. Several commercial systems perform this task by applying a knowledge base of expert-supplied letter-to-sound
rules. This dissertation presents a set of machine learning methods for automatically constructing letter-to-sound rules by analyzing...
Prioritization techniques are used to schedule test cases to execute in a specific order to maximize some objective function. There are a variety of possible objective functions, such as a function that measures how quickly faults can be detected within the testing process, or a function that measures how fast...
Software maintenance is an expensive part of the software lifecycle: estimates put its cost at up to two-thirds of the entire cost of software. Regression testing, which tests software after it has been modified to help assess and increase its reliability, is responsible for a large part of this cost....
This research explores the hypothesis that methods from decision theory and machine learning can be combined to provide practical solutions to current manufacturing control problems. This hypothesis is explored by developing an integrated approach to solving one manufacturing problem - the optimization of die-level functional test.
An integrated circuit (IC)...
Djang et al. [1998] introduced a new model of types for declarative visual programming languages (VPLs). Implicit static typing is used in their type model, in order to eliminate the programming mechanisms associated with type declarations, provide immediate visual feedback with respect to type errors and guarantee type safe programs....
Bayesian networks are used for building intelligent agents that act under uncertainty. They are a compact representation of agents' probabilistic knowledge. A Bayesian network can be viewed as representing a factorization of a full joint probability distribution into the multiplication of a set of conditional probability distributions. Independence of causal...
Serializability is unnecessarily strict for real-time systems because most transactions in such systems occur periodically and changes among data values over a few consecutive periods are often insignificant. Hence, data values produced within a short interval can be treated as if they are "similar" and interchangeable. This notion of similarity...
Parallel computers are classified into: Multiprocessors, and multicomputers. A multiprocessor system usually has a shared memory through which its processors can communicate. On the other hand, the processors of a multicomputer system communicate by message passing through an interconnection network. A widely used class of interconnection networks is the toroidal...
Biologists need tools to see the structural relationships encoded in biological sequences (strings). The Walking Tree heuristics calculate some of these relationships. I have designed and implemented graphic presentations which allow the biologist (user) to see these relations. This thesis contains background information on the biological sequences and some background...
Exception handling is a programming language feature that can help increase the reliability of programs. However, not much work has been done on exception handling in visual programming languages. We present an approach for improving the exception handling mechanism in Forms/3, a declarative visual programming language based on the spreadsheet...
Visual programming languages employ visual representation to make programming easier and make programs more reliable and more accessible. Visual program testing becomes increasingly important as more and more visual programming languages and visual programming environments come into real use. In this work, we focus on one important class of visual...
Declarative visual programming languages (VPLs), including spreadsheets, make up a large portion of both research and commercial VPLs. Spreadsheets in particular enjoy a wide audience, including end users. Unfortunately, spreadsheets and most other declarative VPLs still suffer from some of the problems that have been solved in other languages, such...
I introduce a compositional approach to application software development. In this approach, an extended entity-relationship diagram (EERD), which represents the component types and the relationship types within an application domain, is used as a template of executable programs in that application domain. As we use structural active objects as the...
Many different types of distributed batch scheduling systems have been developed in the last decade to take advantage of the decentralization of computers and the enormous investments that many companies and educational institutions have in desktop workstations. Based on the premise that the majority of desktop workstations are significantly underutilized,...
In a constant weight code, each code word contains a constant number of 1's. If this number is equal to half the length of the code word then the code is called balanced. These codes find many applications in computer and communication systems noise reduction in VLSI systems, fault masking...
Many parallel machines, both commercial and experimental, have been/are being designed with toroidal interconnection networks. For a given number of nodes, the torus has a relatively larger diameter, but better cost/performance tradeoffs, such as higher channel bandwidth, and lower node degree, when compared to the hypercube. Thus, the torus is...
A graph may he drawn in many different ways. We investigate how to draw a graph nicely, in the sense of being visually pleasing. We discuss the history of this field, and look at several algorithms for drawing graphs.
For planar graphs this problem has been algorithmically solved: that is,...
As the World Wide Web continues to grow, people clearly want to do much more
with it than just publish static pages of text and graphics. While such increased interactivity
has traditionally been accomplished through the use of server-side CGI scripts,
much recent research on Web browsers has been on...
Until now, attempts to extend the one-way constraint evaluation model of the spreadsheet paradigm to support complex objects, such as colored circles or user-defined types, have led to approaches featuring either a direct way of creating objects graphically or strong compatibility with the spreadsheet paradigm, but not both. This inability...
There is a software gap in parallel processing. The short lifespan and small installation base of parallel architectures have made it economically infeasible to develop platform-specific parallel programming environments that deliver performance and programmability. One obvious solution is to build architecture-independent programming environments. But the architecture independence usually comes at...
We believe concreteness, direct manipulation and responsiveness in a visual programming language increase its usefulness. However, these characteristics present a challenge in generalizing programs for reuse, especially when concrete examples are used as one way of achieving concreteness. In this thesis, we present a technique to solve this problem by...
Existing parallel file systems are proving inadequate in two important arenas:
programmability and performance. Both of these inadequacies can largely be traced
to the fact that nearly all parallel file systems evolved from Unix and rely on a Unix-oriented,
single-stream, block-at-a-time approach to file I/O. This one-size-fits-all
approach to parallel...
Scientists in the biological sciences need to retrieve information from a variety of data collections, traditionally maintained in SQL databases, in order to conduct research. Because current assistant tools are designed primarily for business and financial users, scientists have been forced to use the notoriously difficult command-line SQL interface, supplied...
Although much effort has been invested to build applications that support group work, collaborative applications have not found easy success. The cost of adopting and maintaining collaborative applications has prevented their widespread use, especially among small distributed groups. Application developers have had difficulties recognizing the extra effort required by groups...
Parallel computing on heterogeneous workstation clusters has proved to be a very efficient use of available resources, increasing their overall utilization. However, for it to be a viable alternative to expensive, dedicated parallel machines, a number of key issues need to be resolved. One of the major challenges of heterogeneous...
Reinforcement Learning (RL) is the study of learning agents that improve
their performance from rewards and punishments. Most reinforcement learning
methods optimize the discounted total reward received by an agent, while, in many
domains, the natural criterion is to optimize the average reward per time step. In this
thesis, we...
Supervised learning programs, such as decision tree learners and neural networks, often must learn Boolean functions. The concept being learned may not easily be expressed in terms of the atomic features given. Constructive induction automatically produces higher level features (combinations of the atomic features), which can improve learning performance. The...
Distance learning enables students to study effectively from remote locations. With
televised lectures now commonplace, many disciplines can provide effective distance
learning. While the lecture format serves many disciplines, it does not serve those that
require laboratory work. Currently, there is little distance learning support for laboratory
work.
Second Best...
Culprit Tracking is a technique to make lazy evaluation in a programming language even lazier. We sought to develop such a technique after noting poorly-distributed performance characteristics of graphical user interfaces (GUIs) programmed in lazy languages. A characteristic aspect of GUI programs is the intensive screen I/O. These programs are...
Multiparadigm languages are languages that are designed to support more than one style of programming. Leda is a strongly-typed multiparadigm programming language that supports imperative, functional, object-oriented, and logic programming. The constraint programming paradigm is a declarative style of programming where the programmer is able to state relationships among some...
The most important part of parallel computation is communication. Except in the most embarassingly parallel examples, processors cannot work cooperatively to solve a problem unless they can communicate. One way to solve the problem of communication is to use an interconnection network. Processors are located at nodes of the network,...
How might capabilities for algorithm animation be seamlessly integrated into a programming language that is both visual and declarative? Until now, visual programming language researchers have not attempted to answer that question, making the fruits of algorithm animation available only to users of textual programming languages. Users of visual programming...
Two prevalent models of parallel programming are data parallelism and task parallelism. Data parallelism is the simultaneous application of a single operation to a data set. This model fits best with regular computations. Task parallelism is the simultaneous application of possibly different operations to possibly different data sets. This fits...
Knowledge-Based Model Construction (KBMC) has generated a lot of attention
due to its importance as a technique for generating probabilistic or decision-theoretic
models whose range of applicability in AI has been vastly increased. However, no
one has tried to analyze the essential issues in KBMC, to determine if there exists...
Parallel loops are one of the main sources of parallelism in scientific applications, and many parallel loops do not have a uniform iteration execution time. To achieve good performance for such applications on a parallel computer, iterations of a parallel loop have to be assigned to processors in such a...
Reasoning about any realistic domain always involves a degree of uncertainty.
Probabilistic inference in belief networks is one effective way of reasoning under
uncertainty. Efficiency is critical in applying this technique, and many researchers
have been working on this topic. This thesis is the report of our research in this...
First, we study hard real-time scheduling problems where each task is defined by a four tuple (r, c, p, d): r being its release time, c computation time, p period, and d deadline. The question is whether all tasks can meet their deadlines on one processor. If not, how many...
An important problem in molecular biology is to predict the secondary
structure of proteins from their primary structure. The primary structure of a
protein is the sequence of amino acid residues. The secondary structure is an
abstract description of the shape of the folded protein, with regions identified
as alpha...
The coverage of a learning algorithm is the number of concepts that can be learned by that algorithm from samples of a given size for given accuracy and confidence parameters. This thesis begins by asking whether good learning algorithms can be designed by maximizing their coverage. There are three questions...
The goal of Inductive Learning is to produce general rules from a set of seen examples, which can then be applied to other unseen examples. ID3 is an inductive learning algorithm that can be used for the classification task. The input to the algorithm is a set of tuples of...
Guidelines for using style to improve computer program comprehension
have often been proposed without empirical testing. This thesis reports on the
results of three controlled experiments that investigated ways program style may be
used to aid comprehension of source code listings.
Experiments 1 and 2 were conducted using advanced computer...
Knowledge compilation improves search-intensive problem-solvers that are easily specified but inefficient. One promising approach improves efficiency by constructing a database of problem-instance/best-action pairs that replace problem-solving search with efficient lookup. The database is constructed by reverse enumeration- expanding the complete search space backwards, from the terminal problem instances. This approach...
The general problem of application development of interactive GUI applications has been addressed by toolkits, libraries, user interface management systems, and more recently domain-specific application frameworks. However, the most sophisticated solution offered by frameworks still lacks a number of features which are addressed by this research: 1) limited functionality --...
An active object system is a transition-based object-oriented system suitable for the design of various concurrent systems. An AOS consists of a collection of interacting objects, where the behavior of each object is determined by the transition statements provided in the class of that object. A transition statement is a...
Many important application problems in engineering can be formalized as nonlinear
optimization tasks. However, numerical methods for solving such problems
are brittle and do not scale well. For example, these methods depend critically
on choosing a good starting point from which to perform the optimization search.
In high-dimensional spaces, numerical...
This paper investigates the feasibility of compiling the functionality of a decision
theoretic problem solving engine into a set of rules or functionally similar construct.
The decision theoretic engine runs in exponential time, while the rule set runs in
linear time at worst. The main question that will determine the...
The data used for the construction of genome maps is imperfect, therefore the mapping of a physically linear structure must take place in a very uneven feature space. As the number of genes to be ordered grows, it appears to be impractical to use exhaustive search techniques to find the...
The practice of literate programming is not widespread because existing literate programming systems have some undesirable characteristics such as programming language and text processor dependence and lack of flexible tools for viewing and manipulation of the source file. This dissertation describes the literate programming system AOPS (Abstraction Oriented Programming System)...
Index size savings from three techniques are measured. The three techniques are: 1) eliminating common, low information words found in a "stop list" (such as: of, the, at, etc.), 2) truncating terms by eliminating word stems (such as: -s, -ed, -ing, etc.), and 3) simple data compression. Savings are measured...
In real-time control systems, the value of a control decision depends not
only on the correctness of the decision but also on the time when that decision
is available. Recent work in real-time decision making has used machine learning
techniques to automatically construct reactive controllers, that is, controllers
with little...
Polyhedra are geometric representations of linear systems of equations and
inequalities. Since polyhedra are used to represent the iteration domains of nested
loop programs, procedures for operating on polyhedra can be used for doing loop
transformations and other program restructuring transformations which are needed
in parallelizing compilers. Thus a need...
This thesis compares two methods for studying the problem-solving processes of
mechanical design engineers. The first method, verbal protocol analysis, was applied
by L. Stauffer to construct a problem-solving model of mechanical design. The
second method, timing analysis, measures the time intervals separating drawing or
speaking actions during the design...
Generalized Radial Basis Functions were used to construct networks
that learn input-output mappings from given data. They are
developed out of a theoretical framework for approximation based
on regularization techniques and represent a class of three-layer
networks similar to backpropagation networks with one hidden
layer.
A network using Gaussian base...
There are three families of exact methods used for probabilistic inference in
belief nets. It is necessary to compare them and analyze the advantages and
the disadvantages of each algorithm, and know the time cost of making
inferences in a given belief network. This paper discusses the factors that
influence...
A key aspect of how we understand the world revolves around an ability to
manipulate our surroundings to experiment. From the scientific method through
theories of child development, the ability to experiment is deemed critical; however,
few studies have been performed to understand the strengths and weaknesses of
different experimental...
The problem of handling dependent evidence is an important practical issue for applications of reasoning with uncertainty in artificial intelligence. The existing solutions to the problem are not satisfactory because of their ad hoc nature, complexities, or limitations. In this dissertation, we develop a general framework that can be used...
Balanced codes, in which each codeword contains equally many 1's and 0's, are useful in such applications as in optical transmission and optical recording. When balanced codes are used, the same number of 1's and 0's pass through the channel after the transmission of every word, so the channel is...
Artificial Intelligence (AI) planning techniques have been central to automating a gamut of tasks from the mundane route planning and beer production to the ethereal image processing of space-ship images. Of all the planning techniques, hierarchical- decomposition planning has been the technique most employed in industrial-strength planners. Hierarchical-decomposition planning is...
Error-correcting output coding (ECOC) is a method for converting a k-classsupervised learning problem into a large number L of two-class supervised learningproblems and then combining the results of these L evaluations. Previous researchhas shown that ECOC can dramatically improve the classi cation accuracy of supervisedlearning algorithms that learn to classify...
Multicomputers and multiprocessors, first introduced in the mid 1980s, employ large numbers of microprocessors working in parallel to achieve high performance at low cost. These parallel machines represent an exciting new generation in supercomputing, but their usefulness is currently limited by inadequate programming languages and environments. This problem is being...
There are lots of mailing systems available for Apple Macintosh computers. But when this research was started, there were no Voice Mail System for the Macintosh. However, a similar system was available for the NeXT machine. So the main goal of this research was to develop a Voice Mail System...
Software development has been characterized by severe schedule slippage, cost overrun and the inability of the developer to estimate with acceptable accuracy the resources and schedule required early in the requirements analysis and functional design phase when critical investment decisions must be made. This estimation difficulty has emerged as one...
Software applications that successfully employ a grap ica user interface to display and manipulate their data are recognized as easy to use and difficult to implement. While the complexity of such applications has increased dramatically, the tools available to application designers have not advanced at the same pace. Current tools...
The conventional way of debugging is to examine the program's state at various points during execution. With the addition of message passing as a means of communication and synchronization in message passing programs, the state of pending message operations also needs to be examined. Current parallel debuggers or analysis tools...
Proxy Pipe is designed to be a bus in the transport layer of the communication between the two different processes. Its major function is to transfer bytes. When the client process tries to send a command to the server process, it will talk to a proxy as if it were...
In this paper we outline an implementation of Linda on a network of Unix workstations. A literature survey was done to gain a better perspective on state of the art and to learn from the experiences of other implementations. The tuple space which is central to the Linda system is...
Capacity Planning is a gradually recognized need in UNIX environments. However, the effort to address this problem has been more or less event driven and is often discarded after crises are over. The main cause to this situation is that the tools to support the CP function in UNIX environments...
Forms are an easy-to-use interface to access a database, including a remote database on the Internet. An entity-relationship (ER} diagram, which is a pictorial representation of a database schema, is widely used in designing a database. A class diagram, which shows set of classes, relationships among them, and associations, are...
We have developed SearchPak, a machine independent parallel searching tool on shared and distributed memory machines. It can be used for combinatorial optimization problems and OR-parallel computations as well. Both depth-first and best-first search of the state space can be performed using the SearchPak. With SearchPak, a user just provides...
Hewlett-Packard (HP) is one of the world's largest computer companies and the foremost producer of test and measurement instruments. In Corvallis, Oregon, HP manufactures several precision products on high speed, automated assembly lines. The alignment process of a cap to a base part is one of the essential processes in...
Oregon SpeedCode Universe (OSU) is a rapid prototyping system which automates the production of applications from a description of the user interface. GraphLab is a domain-specific tool which deals with several major problems concerning OSU: (1) limited functionality, that is, lacking graphical capabilities in generated applications, (2) indirect approach to...
We present a model for a distributed virtual market place that can be constructed on the Internet to support selling and buying requests, such as those found as classified advertisements. One requirement for a transaction to take place in the virtual market place is that a sell request and a...
In this project we have implemented an Object-Oriented simulation model of a Digital Switching System. We analyzed various software components of a Digital Switching System. The call processing module is the principal component and the most complex of all. The features of Object-Oriented principles were investigated and found to fit...