We developed and investigated machine learning methods that require
minimal preprocessing of the input data, use few training examples, run fast, and
still obtain high levels of accuracy.
Most approaches to designing machine learning programs are based on the
supervised learning paradigm – training examples are chosen randomly and given...
This report addresses the design and implementation of an internet-based grading tool for the "Translators" course. The motivation is to avoid exposing the instructor's Java byte-code to possible reverse-engineering tools and enable students to submit their homework virtually from any machine across the internet. This tool is intended to replace...
This thesis presents a case study of applying machine learning tools to build a predictive
model of annual infestations of grasshoppers in Eastern Oregon. The purpose of the
study was two-fold. First, we wanted to develop a predictive model. Second, we wanted
to explore the capabilities of existing machine learning...
Learning easily understandable decision rules from examples is one of the classic problems in machine learning. Most learning algorithms for this problem employ some variation of a greedy separate-and-conquer algorithm. In this paper, we describe a system called LERILS that learns highly accurate and comprehensible rules from examples using a...
In a typical method of database design, ER diagrams are used to represent the conceptual schema of the database. Use-cases are now widely used to capture requirements in designing the basic architecture of an object-oriented system. Use-cases are highly effective in designing forms required for data insertion, retrieval, update, and...
Forms are an easy-to-use interface to access a database, including a remote database on the Internet. An entity-relationship (ER} diagram, which is a pictorial representation of a database schema, is widely used in designing a database. A class diagram, which shows set of classes, relationships among them, and associations, are...
Proxy Pipe is designed to be a bus in the transport layer of the communication between the two different processes. Its major function is to transfer bytes. When the client process tries to send a command to the server process, it will talk to a proxy as if it were...
Djang et al. [1998] introduced a new model of types for declarative visual programming languages (VPLs). Implicit static typing is used in their type model, in order to eliminate the programming mechanisms associated with type declarations, provide immediate visual feedback with respect to type errors and guarantee type safe programs....
Hewlett-Packard (HP) is one of the world's largest computer companies and the foremost producer of test and measurement instruments. In Corvallis, Oregon, HP manufactures several precision products on high speed, automated assembly lines. The alignment process of a cap to a base part is one of the essential processes in...
Cryptographic systems are used for secure communications in the government, in industry, and by individuals. Many cryptographic systems base their security on our inability to factor quickly. For this reason, in the last few decades several fast factoring algorithms have been developed. The Quadratic Sieve is one of the best...
Software maintenance is an expensive part of the software lifecycle: estimates put its cost at up to two-thirds of the entire cost of software. Regression testing, which tests software after it has been modified to help assess and increase its reliability, is responsible for a large part of this cost....
The project aims at building an application that would simulate a transparency that can either be overlaid on top of the graphical display of another application or used as a stand-alone by accepting input from a keyboard or a mouse to enhance a presentation. The project provides an environment to...
We present a model for a distributed virtual market place that can be constructed on the Internet to support selling and buying requests, such as those found as classified advertisements. One requirement for a transaction to take place in the virtual market place is that a sell request and a...
Experimental test beds to study Real Time agents are common topic in AI. A principal contribution of this paper is to develop a framework for such experimental test beds. Design and implementation of the test bed and the extension of the framework for different equipment and agents are discussed. Results...
Prioritization techniques are used to schedule test cases to execute in a specific order to maximize some objective function. There are a variety of possible objective functions, such as a function that measures how quickly faults can be detected within the testing process, or a function that measures how fast...
Probabilistic inference using Bayesian networks is now a well-established approach for reasoning under uncertainty. Among many e ciency-driven tech- niques which have been developed, the Optimal Factoring Problem (OFP) is distinguished for presenting a combinatorial optimization point of view on the problem. The contribution of this thesis is to extend...
Data, buffering is an important mechanism to enhance performance of a database system. In this project, we studied the data buffering mechanism of the Oracle database system and created an Oracle data buffering simulation program in Java. The data obtained from the simulation program w re compared with the performance...
Serializability is unnecessarily strict for real-time systems because most transactions in such systems occur periodically and changes among data values over a few consecutive periods are often insignificant. Hence, data values produced within a short interval can be treated as if they are "similar" and interchangeable. This notion of similarity...
Visualization is a useful method to teach classes in algorithms and data structures. Many animation systems of algorithms and data structures have been implemented. In these systems, students can set initial conditions, run algorithms and view the result graphically. In this project, a small visualization tool library is implemented using...
Declarative visual programming languages (VPLs), including spreadsheets, make up a large portion of both research and commercial VPLs. Spreadsheets in particular enjoy a wide audience, including end users. Unfortunately, spreadsheets and most other declarative VPLs still suffer from some of the problems that have been solved in other languages, such...
Bayesian networks are used for building intelligent agents that act under uncertainty. They are a compact representation of agents' probabilistic knowledge. A Bayesian network can be viewed as representing a factorization of a full joint probability distribution into the multiplication of a set of conditional probability distributions. Independence of causal...
Multiparadigm programming languages are a recent development in the realm of programming languages. A multiparadigm programming language allows the use of multiple, differing programming paradigms without departing from a single, unified linguistic framework. Multiparadigm programming languages are claimed to have benefits to both pedagogy and complex application creation. The beneficial...
Many performance tuning tools for parallel software use visual representations of trace data to guide a developer towards code improvements. Most widely used visualization schemes, however, either omit useful information about time dependence of processor use, or include that information but do not scale well to long run times or...
With the strong shift toward network-centric cluster-computing workload management has become increasingly important in both industrial and academic environments. To fully harness the distributed computing resources of the 32 PC's in our department, I developed a workload monitoring system together with a few utilities for users to submit either sequential...
This research explores the hypothesis that methods from decision theory and machine learning can be combined to provide practical solutions to current manufacturing control problems. This hypothesis is explored by developing an integrated approach to solving one manufacturing problem - the optimization of die-level functional test.
An integrated circuit (IC)...
The visual programming language Forms/3 currently uses a graphical user interface implemented in Garnet. Garnet was developed by the User Interface Software Group in the Human Computer Interaction Institute at Carnegie Mellon University, but is no longer supported. This paper presents an implementation of a user interface for Forms/3 written...
The Distributed Testing Coordinator (DTC) is a system for control and coordination of the software testing processes at Rogue Wave. Some important features in this product include automatic resource configuration, centralized job scheduling, concurrent and distributed testing, failure recovery, and automatic event logging and test reporting.
DTC consists of a...
Like other scientific fields, computer science is in great need of experimentation to support, improve, disprove and even establish theories. One area in which experimental results are highly desired involves regression testing, which is performed on modified software to provide confidence that the software behaves correctly and that modifications have...
This paper describes the design and performance of a distributed, multi-tier architecture for scientific information management and data exploration. A novel aspect of this framework is its integration of Java IDL, the CORBA distributed object computing middleware with JavaBeans, the Java Component model to provide a flexible, interactive framework for...
Artificial Intelligence (AI) planning techniques have been central to automating a gamut of tasks from the mundane route planning and beer production to the ethereal image processing of space-ship images. Of all the planning techniques, hierarchical- decomposition planning has been the technique most employed in industrial-strength planners. Hierarchical-decomposition planning is...
In this report., we address the issues of translating MATLAB scripts into SPMD-style C programs. The resulting programs, when linked with our run-time library are suitable for execution on parallel computers. We describe the design of the compiler and improvements made to it in the current version. We also describe...
System dependence graphs, an interprocedural extension to program dependence graphs, are an important tool in the analysis of program source code, especially for performing program slicing upon that code. Given the complexity of most programs, and the difficulty of creating program slices, it is vital to create system dependence graphs...
Programming parallel machines has been a difficult and unrewarding task. The short lifespan of parallel machines and their incompatibility have made it difficult to utilize them. Our goal here is to create an environment for parallel computing which allows users to take advantage of parallel computers without writing parallel programs....
Remote Event Listener(REL) is designed to glue remote events and remote listeners dynamically, and dispatch remote events efficiently and transparently
for distributed object-oriented systems. Components can be independently developed and remotely interconnected with REL, and software reusability can
be improved. Remote Event Listener along with Remote Method Invocation
makes distributed...
In this research, we have captured, in pattern form, key elements of programming and design in four programming paradigms (imperative, object-oriented, functional and logical) as well as multiparadigm programming. These pattern sets have formed a foundation upon which we were able to build a deeper understanding of multiparadigm programming and...
While much work has been done in estimating software reliability, little attention is paid to predict reliability as early as at the design time. In this report, we present our initial research results of building an early stage software reliability prediction model.
In Part I, we will first investigate and...
We developed a scientific information management system to facilitate remote access and analysis of earth and space science data, using the Component Model of software development provided by the Java language. The data sets are part of the Earth Observing. System project, being carried out at the College of Oceanic...
Existing graphics systems are too large for students to study in an introductory computer graphics course. We have implemented a lightweight, object-oriented graphics system called OGS for instruction. OGS is written in Java. It demonstrates how a graphics system is implemented from scratch and is intended to help students understand...
The Java programming language is a recent entry in the family of object-oriented languages and like all new languages has yet to achieve widespread standardization of coding styles for the unique aspects it possesses. This paper and project address this issue by presenting a complete and rationalized style guide for...
I introduce a compositional approach to application software development. In this approach, an extended entity-relationship diagram (EERD), which represents the component types and the relationship types within an application domain, is used as a template of executable programs in that application domain. As we use structural active objects as the...
Parallel computers are classified into: Multiprocessors, and multicomputers. A multiprocessor system usually has a shared memory through which its processors can communicate. On the other hand, the processors of a multicomputer system communicate by message passing through an interconnection network. A widely used class of interconnection networks is the toroidal...
Visual programming languages employ visual representation to make programming easier and make programs more reliable and more accessible. Visual program testing becomes increasingly important as more and more visual programming languages and visual programming environments come into real use. In this work, we focus on one important class of visual...
Debanoir is a distributed, collaborative editor for building Bayesian networks. It is composed of a front-end editor client and back-end server programs. The editor client is a Java applet that can be invoked from any Internet browser that supports Java. The server, as of now, runs on the machine from...
The goal of this project was to investigate the addition of parameterized types to the Java programming language. Two different parametric polymorphism mechanisms were developed and compared. The first was a preprocessor and the second was a compiler.
Parameterized types allow a programmer to create generic programs. Much as a...
Biologists need tools to see the structural relationships encoded in biological sequences (strings). The Walking Tree heuristics calculate some of these relationships. I have designed and implemented graphic presentations which allow the biologist (user) to see these relations. This thesis contains background information on the biological sequences and some background...
Exception handling is a programming language feature that can help increase the reliability of programs. However, not much work has been done on exception handling in visual programming languages. We present an approach for improving the exception handling mechanism in Forms/3, a declarative visual programming language based on the spreadsheet...
As the World Wide Web continues to grow, people clearly want to do much more
with it than just publish static pages of text and graphics. While such increased interactivity
has traditionally been accomplished through the use of server-side CGI scripts,
much recent research on Web browsers has been on...
Little Smalltalk is a small, reasonably fast, easy-to-understand, easy-to-modify Smalltalk system. The system was originally developed in 1984 as a part of an implementation project to develop a minimal Smalltalk system, closely resembling Smalltalk-80. The system was developed in C. The current project is an experiment to port the Little...
Since its introduction in 1995 by Sun Microsystems, the Java programming language has been widely accepted by the software development community. Besides being a natural fit for Internet and World Wide Web (WWW) based applications, Java is also being used in other diverse application areas due to its simplicity, reduced...
CHARM is a parallel programming language that was originally implemented for a network of workstations each of which has only one processor. In this project, we ported CHARM for a network of workstations each of which has more than one processor (multi-computer) using multithreading to exploit the multiple processors.
Network...
Until now, attempts to extend the one-way constraint evaluation model of the spreadsheet paradigm to support complex objects, such as colored circles or user-defined types, have led to approaches featuring either a direct way of creating objects graphically or strong compatibility with the spreadsheet paradigm, but not both. This inability...
Many different types of distributed batch scheduling systems have been developed in the last decade to take advantage of the decentralization of computers and the enormous investments that many companies and educational institutions have in desktop workstations. Based on the premise that the majority of desktop workstations are significantly underutilized,...
Although much effort has been invested to build applications that support group work, collaborative applications have not found easy success. The cost of adopting and maintaining collaborative applications has prevented their widespread use, especially among small distributed groups. Application developers have had difficulties recognizing the extra effort required by groups...
Scientists in the biological sciences need to retrieve information from a variety of data collections, traditionally maintained in SQL databases, in order to conduct research. Because current assistant tools are designed primarily for business and financial users, scientists have been forced to use the notoriously difficult command-line SQL interface, supplied...
In a constant weight code, each code word contains a constant number of 1's. If this number is equal to half the length of the code word then the code is called balanced. These codes find many applications in computer and communication systems noise reduction in VLSI systems, fault masking...
Many parallel machines, both commercial and experimental, have been/are being designed with toroidal interconnection networks. For a given number of nodes, the torus has a relatively larger diameter, but better cost/performance tradeoffs, such as higher channel bandwidth, and lower node degree, when compared to the hypercube. Thus, the torus is...
We believe concreteness, direct manipulation and responsiveness in a visual programming language increase its usefulness. However, these characteristics present a challenge in generalizing programs for reuse, especially when concrete examples are used as one way of achieving concreteness. In this thesis, we present a technique to solve this problem by...
Data structures are essential for any programming task. Most of modem programming languages have a library of reusable data structures. In this project, Smalltalk-like collection classes have been implemented as a Java package. This package, collections, contains several useful data structures such as binary tree, B-Tree, bag, list, hash table...
The conventional way of debugging is to examine the program's state at various points during execution. With the addition of message passing as a means of communication and synchronization in message passing programs, the state of pending message operations also needs to be examined. Current parallel debuggers or analysis tools...
There is a software gap in parallel processing. The short lifespan and small installation base of parallel architectures have made it economically infeasible to develop platform-specific parallel programming environments that deliver performance and programmability. One obvious solution is to build architecture-independent programming environments. But the architecture independence usually comes at...
Existing parallel file systems are proving inadequate in two important arenas:
programmability and performance. Both of these inadequacies can largely be traced
to the fact that nearly all parallel file systems evolved from Unix and rely on a Unix-oriented,
single-stream, block-at-a-time approach to file I/O. This one-size-fits-all
approach to parallel...
Reinforcement Learning (RL) is the study of learning agents that improve
their performance from rewards and punishments. Most reinforcement learning
methods optimize the discounted total reward received by an agent, while, in many
domains, the natural criterion is to optimize the average reward per time step. In this
thesis, we...
Distance learning enables students to study effectively from remote locations. With
televised lectures now commonplace, many disciplines can provide effective distance
learning. While the lecture format serves many disciplines, it does not serve those that
require laboratory work. Currently, there is little distance learning support for laboratory
work.
Second Best...
Culprit Tracking is a technique to make lazy evaluation in a programming language even lazier. We sought to develop such a technique after noting poorly-distributed performance characteristics of graphical user interfaces (GUIs) programmed in lazy languages. A characteristic aspect of GUI programs is the intensive screen I/O. These programs are...
Various ecological and hydrological models require estimates of the amount and spatial distribution of monthly and annual precipitation. PRISM is an analytical model that distributes point measurements of monthly, seasonal and annual precipitation to a geographic grid. In order to use this model effectively, good graphical user interfaces were needed....
The potential of the Internet-especially, the World Wide Web- as a medium for instruction has been realized. Numerous web-based courses have been developed in the recent past. However, most of these courses are nothing but glorified texts. The reasons being (a) lack of understanding of the pedagogic challenges of web-based...
In this paper, we describe a software mechanism, a software channel, that allows a group of distributed objects to communicate with each other automatically once they are connected to it. Software channels and predesigned distributed objects that are connected to them encapsulate the communication protocol and the network topology to...
Two types of parallel computers commonly used or solving large scientific problems are clusters of workstations and distributed-memory multicomputers. Each system has strengths and weaknesses for this task. Workstation clusters have a high performance to cost ratio and the advantage of the latest processors. Workstations are commonly under-utilized and can...
Supervised learning programs, such as decision tree learners and neural networks, often must learn Boolean functions. The concept being learned may not easily be expressed in terms of the atomic features given. Constructive induction automatically produces higher level features (combinations of the atomic features), which can improve learning performance. The...
When changes are made to programs - to enhance their functionality, to improve performance, to migrate from one language or machine to another, or to convert a serial program to parallel form - there is no easy way to verify that the changes maintain program correctness. Checking the final result...
Practical parallel programming demands that the details of distributing data to processors and inter- processor communication be managed by the compiler. These tasks quickly become too di cult for a programmer to do by hand for all but the simplest parallel programs. Yet, many parallel languages still require the programmer...
The most important part of parallel computation is communication. Except in the most embarassingly parallel examples, processors cannot work cooperatively to solve a problem unless they can communicate. One way to solve the problem of communication is to use an interconnection network. Processors are located at nodes of the network,...
A graph may he drawn in many different ways. We investigate how to draw a graph nicely, in the sense of being visually pleasing. We discuss the history of this field, and look at several algorithms for drawing graphs.
For planar graphs this problem has been algorithmically solved: that is,...
How might capabilities for algorithm animation be seamlessly integrated into a programming language that is both visual and declarative? Until now, visual programming language researchers have not attempted to answer that question, making the fruits of algorithm animation available only to users of textual programming languages. Users of visual programming...
Parallel computing on heterogeneous workstation clusters has proved to be a very efficient use of available resources, increasing their overall utilization. However, for it to be a viable alternative to expensive, dedicated parallel machines, a number of key issues need to be resolved. One of the major challenges of heterogeneous...
Error-correcting output coding (ECOC) is a method for converting a k-classsupervised learning problem into a large number L of two-class supervised learningproblems and then combining the results of these L evaluations. Previous researchhas shown that ECOC can dramatically improve the classi cation accuracy of supervisedlearning algorithms that learn to classify...
Exception handling is a general purpose tool found in many modern programming language, but most work to date is based on textual languages. We present an approach to exception handling in a new context, the visual programming language Forms/3. The visual nature of Forms/3 supports an effective approach to exception...
Preparing code for parallelization is one of the most time consuming programming tasks; approximately 24% of the program development activity is spend in this phase. In message passing systems, this is usually achieved by the addition of run-time message-passing library calls. The primitives provided by different message-passing libraries are similar,...
Multiparadigm languages are languages that are designed to support more than one style of programming. Leda is a strongly-typed multiparadigm programming language that supports imperative, functional, object-oriented, and logic programming. The constraint programming paradigm is a declarative style of programming where the programmer is able to state relationships among some...
Two prevalent models of parallel programming are data parallelism and task parallelism. Data parallelism is the simultaneous application of a single operation to a data set. This model fits best with regular computations. Task parallelism is the simultaneous application of possibly different operations to possibly different data sets. This fits...
Distance-based algorithms are machine learning algorithms that classify queries
by computing distances between these queries and a number of internally stored
exemplars. Exemplars that are closest to the query have the largest in
uence on
the classi cation assigned to the query. Two speci c distance-based algorithms, the
nearest neighbor...
Knowledge-Based Model Construction (KBMC) has generated a lot of attention
due to its importance as a technique for generating probabilistic or decision-theoretic
models whose range of applicability in AI has been vastly increased. However, no
one has tried to analyze the essential issues in KBMC, to determine if there exists...
The problem addressed in this paper is that of the extension of multimedia mechanism into Objex, an object oriented framework.
Several multimedia productions have been developed recently, such as QuickTime™, HyperCard, MacSpeaker, Multi-Ad Creator 3.0, etc. [Heid 91]. The common problems with them, however, are that they are all implemented...
Dataparallel C has been designed for various kinds of architectures and was developed jointly at University of New Hampshire and Oregon State University.
This project dealt with porting the libraries to the nCUBE 2 system and helping a user compile his programs on nCUBE 2 directly from a Sun workstation...
One of the main problems in debugging parallel programs is in determining how far a parallel program has executed when it crashes or stops. Even if the program dumps core files for all the parallel processes, the information is at such a low level that it is difficult to interpret...
In this paper we describe a fuzzy logic based control system implemented on a PC architecture. The rule based inference engine of this expert system is easily configured through an ascii text file and is demonstrated to be capable of controlling various simulations, including an inverted pendulum. The controller is...
A Kalman filter was developed to recover information on short period variations of ocean circulation from the satellite altimeter signals for the North Pacific Ocean. The ocean circulation at each grid point is specified by three variables: sea level height h, zonal velocity u, and meridional velocity v. In our...
Polyhedra are geometric representations of linear systems of equations and
inequalities. Since polyhedra are used to represent the iteration domains of nested
loop programs, procedures for operating on polyhedra can be used for doing loop
transformations and other program restructuring transformations which are needed
in parallelizing compilers. Thus a need...
Probabilistic inference in belief networks provides an effective way of reasoning under uncertainty. Efficiency is critical in applying this technique and many algorithms have been developed by many researchers. This is to report the object oriented design and implementation in C++ of such a probabilistic inference system using efficient algorithms.
We describe a CASE tool for designing hard real-time applications, called HaRTS. The design tool supports a hierarchical design diagram which combines the control and data flow of a hard real-time application. The design hierarchy separates a design into self-contained subdesigns. Yet, the design can be flatten to give you...