Published January 1931. Facts and recommendations in this publication may no longer be valid. Please look for up-to-date information in the OSU Extension Catalog: http://extension.oregonstate.edu/catalog
Published April 1963. Facts and recommendations in this publication may no longer be valid. Please look for up-to-date information in the OSU Extension Catalog: http://extension.oregonstate.edu/catalog
Summary: Four proposed metrics:
[1] average relative reduction in training time (sample size, number of training experiences)
[2] jumpstart (initial advantage of transfer algorithm)
[3] handicap (how long it takes the no-transfer algorithm to overcome the jumpstart)
[4] asymptotic advantage (how much better the transfer learning algorithm does in the...
Collaborative filtering (CF) algorithms are used in a wide range of internet applications. However the chief objective of using CF algorithms across most of these applications is to discover items that might be of interest to its users. CF algorithms work by obtaining feedback from users on the items that...
This paper surveys the activity recognition task from a machine learning perspective. I give a definition of this problem, and I classify different activity recognition problems into two categories. I show the activities can be hierarchical, and based on such hierarchies I synthesize a language to describe activities. I give...
In 1995 my students and I developed Leda, a multiparadigm language based on the Pascal model. Leda allowed programmers to create abstractions in an object-oriented, functional, or logic programming style. More recently we have been interested in recreating this work, but this time using Java as the language basis. The...
A diagnostic policy species what test to perform next based on the results of previous tests and when to stop and make a diagnosis. Cost-sensitive diagnostic policies perform tradeoffs between (a) the costs of tests and (b) the costs of misdiagnoses. An optimal diagnostic policy minimizes the expected total cost....
What is the relationship between learning and reasoning? Much recent work in machine learning has been criticized for focusing on learning and ignoring reasoning. This paper attempts to describe the various ways in which machine learning research has (and has not) incorporated reasoning. The paper argues that there are important...
"The specifi c problem addressed in this proposal is the development of
good approximation algorithms for solving problems that have partial observability. The model we propose associates costs with obtaining information about the current state. We want to predict when and how much it is necessary to observe. We want...
"Fifty years after the pioneering work of McCulloch and Pitts, the study of neural nets is alive and active. In this paper, I have discussed some of the work that is of current interest to me and my co-workers. I would, perhaps, be remiss if I failed to mention some...
Brief descriptions of the I/O requirements for four production oceanography programs running at Oregon State University are presented. The applications all rely exclusively on array-oriented, sequential file operations. Persistent files are used for checkpointing and movie making, while temporary files are used to store out-of-core data.
This note briefly discusses some of the classical results of McCulloch and Pitts. It then deals with some current research in neural nets. Several questions about neural nets are shown to be computationally difficult by showing that they are NP-Complete or worse. The size of neural nets necessary to compute...
The lambda calculus has frequently been used as an intermediate representation for programming languages, particularly for functional programming language systems. We introduce two simple extensions to the lambda calculus that describe potentially parallel computations. These extensions permit us to use the lambda calculus as an intermediate form for languages that...
Multiparadigm programming is a term used to describe a style of software development that makes use of facilities originally designed in support of a number of different programming language paradigms. In this paper we illustrate our conception of multiparadigm programming, by describing how various data structures can be implemented in...
A generalized arithmetic package allows a programmer to think of numbers as abstract quantities, without regard to whether they are represented as integers, floating point values, fractions, complex numbers, or even polynomials. In this paper we describe two implementation techniques that can be used to produce a generalized arithmetic package...
This paper describes features of a new strongly typed, compiled programming language, called LEDA. LEDA attempts to combine aspects of both imperative (value-oriented) and logical (relation-oriented) styles of programming. Logical behavior is introduced by means of a new programming structure, called a relation. Relations have similarities to functions and procedures,...
Functional programming languages, such as Backus' FP, and high level expression oriented languages, such as APL, are examples of programming languages in which the primary method of program construction is the process of composition. In this paper we describe an approach to generating code for languages based on compositions. The...
In an earlier paper we developed an intermediate representation for languages based on composition, and showed how the representation could facilitate generating code for functional languages, such as FP. In this paper we follow the same philosophical approach, using instead the applicative language APL. Further, we show how this intermediate...
Object oriented programming languages are noted for their ability to allow users to quickly construct large software systems. They achieve this ability by allowing the programmer to concentrate on what it is they want to do, ignoring details of how that functionality is achieved. Such characteristics should make the object...
This paper addresses the problem of interactively modeling large street networks. We introduce a modeling framework that uses tensor fields to guide the generation of a street graph. A user can interactively edit a street graph by either modifying the underlying tensor field or by changing the graph directly. This...
Fluid simulation on interacting deformable surfaces is a challenging problem that has many applications. In this paper, we present a framework in which artistic as well as physically realistic flows can be generated on surfaces during deformation and collision. Our simulation system provides comprehensive control over the motion and deformation...
This paper studies cluster ensembles for high dimensional data clustering. We examine three different approaches to constructing cluster ensembles. To address high dimensionality, we focus on ensemble construction methods that build on two popular dimension reduction techniques, random projection and principal component analysis (PCA). We present evidence showing that ensembles...
We introduce a visual specification language for spreadsheets that allows the definition of spreadsheet templates. These templates are used by a spreadsheet generator to create Excel spreadsheets that are probably free from a large class of errors, such as reference, omission, and type errors. We demonstrate how spreadsheets can be...
A huge discrepancy between theory and practice exists in one popular application area of functional programming--spreadsheets. Although spreadsheets are the most frequently used (functional) programs, few formal models of computation and type systems exist that would provide the foundation for creating reliable spreadsheets. Consequently, existing spreadsheets contain many errors, some...
The field of machine learning has made major strides over the last 20 years. This document summarizes the major problem formulations that the discipline has studied, then reviews three tasks in cognitive networking and briefly discusses how aspects of those tasks fit these formulations. After this, it discusses challenges for...
This paper studies the problem of learning diagnostic policies from training examples. A diagnostic policy is a complete description of the decision-making actions of a diagnostician (i.e., tests followed by a diagnostic decision) for all possible combinations of test results. An optimal diagnostic policy is one that minimizes the expected...
The standard model of supervised learning assumes that training and test data are drawn from the same underlying distribution. This paper explores an application in which a second, auxiliary, source of data is available drawn from a different distribution. This auxiliary data is more plentiful, but of significantly lower quality,...
Attracting, educating and retaining new engineering students is a challenge. The creative aspirations and "can do" attitude spawned by the space race, Heathkits, and homemade crystal radios have been replaced with the passive satisfaction of video games, cell phones and throwaway electronic appliances. At Oregon State University we have made...
In this paper, based on coding theory concepts, new time scheduling algorithms for multihop packet radio networks are described. Each mobile host is assigned a word from an appropriate constant weight code of length n, distance d and weight w. The host can send a message at the j[superscipt th]...
A fault-tolerant routing method that can tolerate solid faults using only two virtual channels is presented. The proposed routing algorithm not only uses a fewer number of virtual channels but also tolerates f-chains in the meshes. It is shown that the proposed algorithm is deadlock-free and livelock-free in meshes when...
The language G is an expression based language designed around the concept of the stream; where a stream is a sequence of values, perhaps infinite in length. There are no statements in G, only expressions. All expressions are evaluated on a demand driven basis, no computation takes place that does...
The WYSIWYT (What You See is What You Test) methodology applies formal analysis and testing techniques to the spreadsheet paradigm. So far the methodology has been applied to a research spreadsheet prototype, Forms/3. However, this prototype lacks the mathematical libraries, referential functions, ranges, and macros of commercial spreadsheets like Excel...
Although spreadsheets can be argued to be the most widely-used end-user programming languages today, they are very limited compared to other programming languages, supporting only a few built-in types and offering only primitive support for code reuse. The inheritance mechanisms of object-oriented programming might seem to offer help for the...
Although spreadsheets can be argued to be the most widely-used visual programming languages (VPLs) today, most are very limited compared to other VPLs, supporting only p few built-in types and offering only primitive support for code reuse. The inheritance mechanisms of object-oriented programming might seem to offer help for the...
How might capabilities for algorithm animation be seamlessly integrated into a programming language that is both visual and declarative? Until now, visual programming language researchers have not attempted to answer that question, making the fruits of algorithm animation available only to users of textual progranming languages. Users of visual programming...
This paper addresses cost-sensitive classification in the setting where there are costs for measuring each attribute as well as costs for misclassification errors. We show how to formulate this as a Markov Decision Process in which the transition model is learned from the training data. Specifically we assume a set...
A common heuristic for solving Partially Observable Markov Decision Problems POMDPs is to first solve the underlying Markov Decision Process MDP and then construct a POMDP policy by performing a fixed depth lookahead search in the POMDP and evaluating the leaf nodes using the MDP value function. A problem with...
This paper introduces the even-odd POMDP an approximation to POMDPs Partially Observable Markov Decision Problems in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent...
The structure of monadic functional programs allows the integration of many different features into such programs by just changing the definition of the monad and not the program, which is a desirable feature from a software engineering and maintenance point of view. We describe an algorithm for the automatic transformation...
This paper focuses on mining the strategies of problem solving software users by observing their actions. Our application domain is an HCI study aimed at discovering general strategies employed by software users and understanding how such strategies relate to gender and success. We cast this problem as a sequential pattern...
This paper presents an empirical approach for measuring and characterizing the responsiveness of a character to changes in goal. Our approach is based on keeping track of the character's progress towards a frequently changing goal. A "distance-to-goal" function is defined to measure the progress. We then calculate an asymptotic proportion...
Spreadsheet languages, which include commercial spreadsheets and various research systems, have proven to be flexible tools in many settings. Research shows, however, that spreadsheets often contain faults. This thesis presents an integrated testing and fault localization methodology for spreadsheets. This methodology allows spreadsheet developers to engage in modeless development, testing...
End-user programmers are writing an unprecedented number of programs, due in large part to the significant effort put forth to bring programming power to end users. Unfortunately, this effort has not been supplemented by a comparable effort to increase the correctness of these often faulty programs. To address this need,...
Approximate string matching is commonly used to align genetic sequences (DNA or RNA) to determine their shared
characteristics. Most genetic string matching methods are based on the edit-distance model, which does not provide
alignments for inversions and translocations. Recently, a heuristic called the Walking Tree Method [2, 3] has been...
We consider learning tree patterns from queries extending
our preceding work [Amoth, Cull, & Tadepalli,
1998]. The instances in this paper are unordered
trees with nodes labeled by constant identifiers.
The concepts are tree patterns and unions
of tree patterns (unordered forests) with leaves labeled
with constants or variables. A...
Many machine learning applications require classifiers that minimize an asymmetric loss function rather than the raw misclassification rate. We study methods for modifying C4.5 to incorporate
arbitrary loss matrices. One way to incorporate loss information
into C4.5 is to manipulate the weights assigned to the examples
from different classes. For...
Forms/3 is a declarative visual programming language that aims to provide general purpose programming language capabilities in a simple, form-based approach. This report provides an example-driven introduction to programming in Forms/3.
This paper surveys sequential and parallel implementation techniques for functional programming languages, as well as optimizations that can improve their performance. Sequential implementations have evolved from simple interpreters to sophisticated super-combinator-based compilers, while most parallel implementations have explored a broad range of techniques. We analyze the purpose and function of...
Multiparadigm programming languages have been envisioned as a vehicle for constructing large and complex heterogeneous systems, such as a stock market exchange or a telecommunications network. General-purpose multiparadigm languages, as opposed to hybrid multiparadigm languages, embody several prevalent programming paradigms without being motivated by a single problem. One such language...
Augmented term rewriting (ATR) is a simple, uniform, and extensible computational model for constraint programming. Unfortunately, ATR cannot solve combinatorial constraint satisfaction problems (CCSPs). To enable solution of CCSPs, we introduce (don't know) nondeterminism into ATR via the choice expression, which identifies a set of possible values that may satisfy...
In this paper, we introduce a model-based reinforcement learning method called H-learning, which optimizes undiscounted average reward. We compare it with three other reinforcement learning methods in the domain of scheduling Automatic Guided Vehicles, transportation robots used in modern manufacturing plants and facilities. The four methods differ along two dimensions....
The N-Queens problem is commonly used to teach the programming technique of backtrack search. The N-Queens problem may also be used to illustrate the important concept of isomorphism. Here we show how the N-Queens problem can be used as a vehicle to teach the concepts of isomorphism, transformation groups or...
CLEDA is a new programming language descended from the multiparadigm, strongly typed, compiled programming language LEDA. In addition to the four paradigms supported by LEDA, which are imperative, functional, object-oriented, and relational, CLEDA supports the constraint logic programming paradigm. CLEDA is intended to be used to write applications that involve...
"This report describes the revised definition of the multiparadigm programming language Leda. The first section provides an introduction to Leda and a history of its development. Section 2 covers Leda preliminaries, section 3 details the overall structure of Leda programs. Declarations are discussed in section 4, expressions are the topic...
ExcelForms is a front end Excel-based application that supports Forms/3, a research
language based on the spreadsheet paradigm, end-user software engineering features.
The old implementation of ExcelForms performed poorly, and was considered
unstable, not robust, and not scalable enough for our users' needs. This project
addresses these issues by implementing...
A human-centric issue that has not been considered in
the design of end-user programming environments is
whether gender differences exist that are important to the
design of these environments. Ignoring this issue would
miss the opportunity of enhancing the effectiveness of
end-user programmers by incorporating appropriate
mechanisms to support gender-associated...
End users develop more software than any other group of programmers, using software authoring devices such as email filtering editors, by-demonstration macro builders, and spreadsheet environments. Despite this, there has been only a little research on finding ways to help these programmers with the dependability of the software they create....
How can rigorous forms of testing be supported in a way that is both compatible with the visual aspect of visual programming languages, and usable by the audiences using those languages -- even when the audience has no background in software engineering? Visual programs are likely to contain at least...
Can collaborative filtering be successfully applied to digital
libraries in a manner that improves the effectiveness of the
library? Collaborative filtering systems remove the limitation of
traditional content-based search interfaces by using individuals to
evaluate and recommend information. We introduce an approach
where a digital library user specifies their need...
We describe the design of a domain-specific language (DSL) for the specification of generic ocean modeling tools, and we describe the
implementation of its compiler. The goal of the DSL is to allow the specification of widely usable tools for ocean modeling once, and to allow its translation into different...
Many software maintenance problems are caused by using text editors to change programs. A more systematic and reliable way of performing program updates is to express changes with an update language. In particular, updates should preserve the syntax- and type-correctness of the transformed object programs. We describe an update calculus...
The meaning of biological sequences is a central problem
of modern biology. Although string matching is well understood
in the edit-distance model, biological strings
with transpositions and inversions violate this model's
assumptions. To align biologically reasonable strings, we
proposed the Walking Tree Method [4,5,6,7,8], an
approximate string alignment method that...
It is well known that the Fibonacci numbers can be expressed in the
form Round {1/√5 λ₀[superscript n]} where λ₀ = (1 + √5)/2. [Knu75] We look at integer sequences which are solutions to non-negative difference equations and show that if the equation is 1-Bounded then the solution can be...
We present a meta-model as a process maturity framework that can be effectively used to
identify the key practices to initiate and sustain a software process improvement effort
focused on a single process area. Our approach moves away from the overall software
development process and computation of maturity levels and...
Many machine learning applications require
classifiers that minimize an asymmetric cost
function rather than the misclassification
rate, and several recent papers have addressed
this problem. However, these papers
have either applied no statistical testing
or have applied statistical methods that are
not appropriate for the cost-sensitive setting.
Without good statistical...
This paper introduces the even-odd POMDP, an approximation to POMDPs in which the world is assumed to be fully observable every other time step. The even-odd POMDP can be converted into an equivalent MDP, the
2MDP, whose value function, V*[subscript 2MDP], can be combined online with a 2-step lookahead search...
One dimensional nonlinear difference equations have been used to model population growth. The standard biological models have the interesting characteristic that
they display global stability if they display local stability. Various researchers have sought
a simple explanation for this agreement of local and global stability. Here, we show that
enveloping...
Basic biological information is stored in strings of nucleic acids (DNA, RNA) or amino acids
(proteins). Teasing out the meaning of these strings is a central problem of modern biology. Matching and
aligning strings brings out their shared characteristics. Although string matching is well-understood in the
edit-distance model, biological strings...
The set of codewords for a standard error-correcting code can be viewed
as subset of the vertices of a hypercube. Two vertices are adjacent in a hypercube exactly when their Hamming distance is 1. A code is a perfect-error-correcting code if no two codewords are adjacent and every non-codeword is...
Existing topology-based vector field analysis techniques rely on the ability to extract the individual trajectories such as fixed points, periodic orbits and separatrices which are sensitive to noise and errors introduced by simulation and interpolation. This can make such vector field analysis unsuitable for rigorous interpretations. We advocate the use...
Vector field analysis plays a crucial role in many engineering applications, such as weather prediction, tsunami and hurricane study, and airplane and automotive design. Existing vector field analysis techniques focus on individual trajectories such as fixed points, periodic orbits and separatrices which are sensitive to noise and errors introduced by...
If basic assumptions about how knowledge workers conceptualize and use work units are wrong, then any solutions resting on those assumptions are unlikely to be successful since, instead of decreasing costs, they will lead to increasing them. This paper reports on how knowledge workers understand, use and switch between units...
End users create software when they use spreadsheet systems, web authoring tools and graphical languages, and when they create educational simulations, macros-by-demonstration, and dynamic e-business web applications and mash-ups. Some end-user developers, such as accountants or teachers, may have no formal training at all in programming. Others, such as scientists...
Parallel languages rarely specify parallel I/O constructs, and existing commercial systems provide the programmer with a low-level I/O interface. We present design principles for integrating I/O into languages and show how these principles are applied to a virtual-processor-oriented language. We show how machine-independent modes are used to support both high...
An important issue in engineering education is the availability of laboratory resources for student use. Using a computer network to link geographically distant students with laboratory teaching resources makes expensive and innovative equipment available to more students. At Oregon State University, we provide a working environment where remotely-located students can...
Although there have been a number of studies of
end-user software development tasks, few of them have
considered gender issues for real end-user developers
in real-world environments for end-user programming.
In order to be trusted, the results of such laboratory
studies must always be re-evaluated with fewer controls,
more closely...
"Code coverage visualizations using block coverage neither guided developers
toward productive testing strategies, nor did these visualizations motivate
developers to write more tests or help them find more faults than the control
group. Nevertheless, code coverage visualizations did influence developers in a
few important ways. Code coverage visualizations led developers...
This paper presents qualitative results from interviews with knowledge workers about their recovery strategies after interruptions. Special focus is given to when these strategies fail due to the nature of the interruption and existing computer support. Potential solutions offered by participants to overcome some of these problems are presented. These...
This paper reports on an empirical study involving end users that addresses the question of whether it is possible to provide the benefits of formal testing within the informal spreadsheet paradigm. We have developed a "What You See Is What You Test" (WYSIWYT) methodology that brings the benefits of formal...
Knowledge of the pelagic organisms in vast areas of the open ocean is very limited. This is particularly true of the small nekton or swimming forms such as fishes, squid, prawns and euphausiids, which are important as intermediate animals in the food chain and are preyed upon by species such...
We present velocity observations from a shipboard acoustic Doppler current profiler (ADCP) on
R/V Wecoma during cruise W0301b (19 January - 3 February 2003). The cruise was a component
(Survey III) of the Coastal Ocean Advances in Shelf Transport (COAST) experiment. The
ADCP was an RD Instruments hull-mounted 153-kHz narrowband...
We present velocity observations from a shipboard acoustic Doppler current profiler (ADCP) on
R/V Wecoma during cruise W0301b (19 January - 3 February 2003). The cruise was a component
(Survey III) of the Coastal Ocean Advances in Shelf Transport (COAST) experiment. The
ADCP was an RD Instruments hull-mounted 153-kHz narrowband...
Full Text:
February 2003
S. D. Pierce
J. A. Barth
College of Oceanic and Atmospheric Sciences
OregonState