Summary: Four proposed metrics:
[1] average relative reduction in training time (sample size, number of training experiences)
[2] jumpstart (initial advantage of transfer algorithm)
[3] handicap (how long it takes the no-transfer algorithm to overcome the jumpstart)
[4] asymptotic advantage (how much better the transfer learning algorithm does in the...
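As a minimal sketch, the four metrics could be computed from two learning curves (performance per training step, with and without transfer) roughly as follows. The function names, the list-based curve representation, the `tail` window used for the asymptotic estimate, and the threshold-based reading of "training time" are all assumptions made for illustration; the summary above does not specify them.

```python
from typing import Optional, Sequence


def steps_to_reach(curve: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first point at or above threshold, or None if never reached."""
    for step, score in enumerate(curve):
        if score >= threshold:
            return step
    return None


def jumpstart(transfer: Sequence[float], baseline: Sequence[float]) -> float:
    """Initial advantage: gap between the two curves before any training."""
    return transfer[0] - baseline[0]


def handicap(transfer: Sequence[float], baseline: Sequence[float]) -> Optional[int]:
    """Steps the no-transfer learner needs to match the transfer learner's start."""
    return steps_to_reach(baseline, transfer[0])


def asymptotic_advantage(transfer: Sequence[float], baseline: Sequence[float],
                         tail: int = 2) -> float:
    """Gap between the mean of the last `tail` points of each curve (assumed estimate)."""
    t_tail, b_tail = transfer[-tail:], baseline[-tail:]
    return sum(t_tail) / len(t_tail) - sum(b_tail) / len(b_tail)


def relative_time_reduction(transfer: Sequence[float], baseline: Sequence[float],
                            thresholds: Sequence[float]) -> float:
    """Average of (n_base - n_transfer) / n_base over performance thresholds."""
    ratios = []
    for threshold in thresholds:
        n_base = steps_to_reach(baseline, threshold)
        n_transfer = steps_to_reach(transfer, threshold)
        # Skip thresholds that are undefined or would divide by zero.
        if not n_base or n_transfer is None:
            continue
        ratios.append((n_base - n_transfer) / n_base)
    if not ratios:
        raise ValueError("no threshold was reached by both learners")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up curves, purely for illustration.
    baseline = [0.2, 0.4, 0.6, 0.7, 0.8, 0.8]
    transfer = [0.5, 0.7, 0.8, 0.9, 0.9, 0.9]
    print(jumpstart(transfer, baseline))
    print(handicap(transfer, baseline))
    print(asymptotic_advantage(transfer, baseline))
    print(relative_time_reduction(transfer, baseline, [0.6, 0.8]))
```

Averaging the relative reduction over several thresholds is one way to avoid tying the metric to a single target performance level; other definitions are equally plausible.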
The potential for machine learning systems to improve via a mutually beneficial exchange of information with users has yet to be explored in much detail. Previously, we found that users were willing to provide a generous amount of rich feedback to machine learning systems, and that the types of some...
Although machine learning is becoming commonly used in today's software, there has been little research into how end users might interact with machine learning systems, beyond communicating simple "right/wrong" judgments. If the users themselves could somehow work hand-in-hand with machine learning systems, the accuracy of learning systems could be improved...
Existing topology-based vector field analysis techniques rely on the ability to extract individual trajectories, such as fixed points, periodic orbits and separatrices, which are sensitive to noise and errors introduced by simulation and interpolation. This can make such vector field analysis unsuitable for rigorous interpretations. We advocate the use...
This paper addresses the problem of interactively modeling large street networks. We introduce a modeling framework that uses tensor fields to guide the generation of a street graph. A user can interactively edit a street graph by either modifying the underlying tensor field or by changing the graph directly. This...
Vector field analysis plays a crucial role in many engineering applications, such as weather prediction, tsunami and hurricane study, and airplane and automotive design. Existing vector field analysis techniques focus on individual trajectories, such as fixed points, periodic orbits and separatrices, which are sensitive to noise and errors introduced by...
Fluid simulation on interacting deformable surfaces is a challenging problem that has many applications. In this paper, we present a framework in which artistic as well as physically realistic flows can be generated on surfaces during deformation and collision. Our simulation system provides comprehensive control over the motion and deformation...
If basic assumptions about how knowledge workers conceptualize and use work units are wrong, then any solutions resting on those assumptions are unlikely to be successful since, instead of decreasing costs, they will lead to increasing them. This paper reports on how knowledge workers understand, use and switch between units...
This paper studies cluster ensembles for high-dimensional data clustering. We examine three different approaches to constructing cluster ensembles. To address high dimensionality, we focus on ensemble construction methods that build on two popular dimension reduction techniques, random projection and principal component analysis (PCA). We present evidence showing that ensembles...
We introduce a visual specification language for spreadsheets that allows the definition of spreadsheet templates. These templates are used by a spreadsheet generator to create Excel spreadsheets that are provably free from a large class of errors, such as reference, omission, and type errors. We demonstrate how spreadsheets can be...
Collaborative filtering (CF) algorithms are used in a wide range of internet applications. However, the chief objective of using CF algorithms across most of these applications is to discover items that might be of interest to their users. CF algorithms work by obtaining feedback from users on the items that...
A huge discrepancy between theory and practice exists in one popular application area of functional programming: spreadsheets. Although spreadsheets are the most frequently used (functional) programs, few formal models of computation and type systems exist that would provide the foundation for creating reliable spreadsheets. Consequently, existing spreadsheets contain many errors, some...
This paper surveys the activity recognition task from a machine learning perspective. I give a definition of this problem, and I classify different activity recognition problems into two categories. I show that activities can be hierarchical, and based on such hierarchies I synthesize a language to describe activities. I give...
The field of machine learning has made major strides over the last 20 years. This document summarizes the major problem formulations that the discipline has studied, then reviews three tasks in cognitive networking and briefly discusses how aspects of those tasks fit these formulations. After this, it discusses challenges for...
This paper studies the problem of learning diagnostic policies from training examples. A diagnostic policy is a complete description of the decision-making actions of a diagnostician (i.e., tests followed by a diagnostic decision) for all possible combinations of test results. An optimal diagnostic policy is one that minimizes the expected...
The standard model of supervised learning assumes that training and test data are drawn from the same underlying distribution. This paper explores an application in which a second, auxiliary source of data is available, drawn from a different distribution. This auxiliary data is more plentiful, but of significantly lower quality,...
Attracting, educating and retaining new engineering students is a challenge. The creative aspirations and "can do" attitude spawned by the space race, Heathkits, and homemade crystal radios have been replaced with the passive satisfaction of video games, cell phones and throwaway electronic appliances. At Oregon State University we have made...
In this paper, based on coding theory concepts, new time scheduling algorithms for multihop packet radio networks are described. Each mobile host is assigned a word from an appropriate constant weight code of length n, distance d and weight w. The host can send a message at the jth...
A fault-tolerant routing method that can tolerate solid faults using only two virtual channels is presented. The proposed routing algorithm not only uses fewer virtual channels but also tolerates f-chains in the meshes. It is shown that the proposed algorithm is deadlock-free and livelock-free in meshes when...
The WYSIWYT (What You See is What You Test) methodology applies formal analysis and testing techniques to the spreadsheet paradigm. So far the methodology has been applied to a research spreadsheet prototype, Forms/3. However, this prototype lacks the mathematical libraries, referential functions, ranges, and macros of commercial spreadsheets like Excel...
Although spreadsheets can be argued to be the most widely used end-user programming languages today, they are very limited compared to other programming languages, supporting only a few built-in types and offering only primitive support for code reuse. The inheritance mechanisms of object-oriented programming might seem to offer help for the...
Although spreadsheets can be argued to be the most widely used visual programming languages (VPLs) today, most are very limited compared to other VPLs, supporting only a few built-in types and offering only primitive support for code reuse. The inheritance mechanisms of object-oriented programming might seem to offer help for the...
How might capabilities for algorithm animation be seamlessly integrated into a programming language that is both visual and declarative? Until now, visual programming language researchers have not attempted to answer that question, making the fruits of algorithm animation available only to users of textual programming languages. Users of visual programming...
A diagnostic policy specifies what test to perform next based on the results of previous tests and when to stop and make a diagnosis. Cost-sensitive diagnostic policies perform tradeoffs between (a) the costs of tests and (b) the costs of misdiagnoses. An optimal diagnostic policy minimizes the expected total cost....
This paper addresses cost-sensitive classification in the setting where there are costs for measuring each attribute as well as costs for misclassification errors. We show how to formulate this as a Markov Decision Process in which the transition model is learned from the training data. Specifically, we assume a set...
A common heuristic for solving Partially Observable Markov Decision Problems (POMDPs) is to first solve the underlying Markov Decision Process (MDP) and then construct a POMDP policy by performing a fixed-depth lookahead search in the POMDP and evaluating the leaf nodes using the MDP value function. A problem with...
This paper introduces the even-odd POMDP, an approximation to POMDPs (Partially Observable Markov Decision Problems) in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent...
What is the relationship between learning and reasoning? Much recent work in machine learning has been criticized for focusing on learning and ignoring reasoning. This paper attempts to describe the various ways in which machine learning research has (and has not) incorporated reasoning. The paper argues that there are important...
The structure of monadic functional programs allows the integration of many different features into such programs by just changing the definition of the monad and not the program, which is a desirable feature from a software engineering and maintenance point of view. We describe an algorithm for the automatic transformation...
This paper focuses on mining the strategies of users of problem-solving software by observing their actions. Our application domain is an HCI study aimed at discovering general strategies employed by software users and understanding how such strategies relate to gender and success. We cast this problem as a sequential pattern...
End users create software when they use spreadsheet systems, web authoring tools and graphical languages, and when they create educational simulations, macros-by-demonstration, and dynamic e-business web applications and mash-ups. Some end-user developers, such as accountants or teachers, may have no formal training at all in programming. Others, such as scientists...
There has been little research into how end-user programming environments can provide explanations that could fill a critical information gap for end-user debuggers: help with debugging strategy. To address this need, we designed and prototyped a video-based approach for explaining debugging strategy, and accompanied it with a text-only approach....
This paper presents an empirical approach for measuring and characterizing the responsiveness of a character to changes in goal. Our approach is based on keeping track of the character's progress towards a frequently changing goal. A "distance-to-goal" function is defined to measure the progress. We then calculate an asymptotic proportion...
Spreadsheet languages, which include commercial spreadsheets and various research systems, have proven to be flexible tools in many settings. Research shows, however, that spreadsheets often contain faults. This thesis presents an integrated testing and fault localization methodology for spreadsheets. This methodology allows spreadsheet developers to engage in modeless development, testing...
End-user programmers are writing an unprecedented number of programs, due in large part to the significant effort put forth to bring programming power to end users. Unfortunately, this effort has not been supplemented by a comparable effort to increase the correctness of these often faulty programs. To address this need,...
Approximate string matching is commonly used to align genetic sequences (DNA or RNA) to determine their shared characteristics. Most genetic string matching methods are based on the edit-distance model, which does not provide alignments for inversions and translocations. Recently, a heuristic called the Walking Tree Method [2, 3] has been...
We consider learning tree patterns from queries, extending our preceding work [Amoth, Cull, & Tadepalli, 1998]. The instances in this paper are unordered trees with nodes labeled by constant identifiers. The concepts are tree patterns and unions of tree patterns (unordered forests) with leaves labeled with constants or variables. A...
Many machine learning applications require classifiers that minimize an asymmetric loss function rather than the raw misclassification rate. We study methods for modifying C4.5 to incorporate arbitrary loss matrices. One way to incorporate loss information into C4.5 is to manipulate the weights assigned to the examples from different classes. For...
"The specifi c problem addressed in this proposal is the development of
good approximation algorithms for solving problems that have partial observability. The model we propose associates costs with obtaining information about the current state. We want to predict when and how much it is necessary to observe. We want...
"Fifty years after the pioneering work of McCulloch and Pitts, the study of neural nets is alive and active. In this paper, I have discussed some of the work that is of current interest to me and my co-workers. I would, perhaps, be remiss if I failed to mention some...
Parallel languages rarely specify parallel I/O constructs, and existing commercial systems provide the programmer with a low-level I/O interface. We present design principles for integrating I/O into languages and show how these principles are applied to a virtual-processor-oriented language. We show how machine-independent modes are used to support both high...
Despite years of research into human-computer interaction (HCI), the environments programmers must use for problem-solving today (with separate modes and tools for writing, compiling, testing, visualizing, and debugging) derive their basic structure from historical accident, and take little advantage of HCI research into the cognitive issues of programming. Neglecting these...
An important issue in engineering education is the availability of laboratory resources for student use. Using a computer network to link geographically distant students with laboratory teaching resources makes expensive and innovative equipment available to more students. At Oregon State University, we provide a working environment where remotely located students can...
Forms/3 is a declarative visual programming language that aims to provide general purpose programming language capabilities in a simple, form-based approach. This report provides an example-driven introduction to programming in Forms/3.
This paper surveys sequential and parallel implementation techniques for functional programming languages, as well as optimizations that can improve their performance. Sequential implementations have evolved from simple interpreters to sophisticated super-combinator-based compilers, while most parallel implementations have explored a broad range of techniques. We analyze the purpose and function of...
Multiparadigm programming languages have been envisioned as a vehicle for constructing large and complex heterogeneous systems, such as a stock market exchange or a telecommunications network. General-purpose multiparadigm languages, as opposed to hybrid multiparadigm languages, embody several prevalent programming paradigms without being motivated by a single problem. One such language...
Brief descriptions of the I/O requirements for four production oceanography programs running at Oregon State University are presented. The applications all rely exclusively on array-oriented, sequential file operations. Persistent files are used for checkpointing and movie making, while temporary files are used to store out-of-core data.
This note briefly discusses some of the classical results of McCulloch and Pitts. It then deals with some current research in neural nets. Several questions about neural nets are shown to be computationally difficult by showing that they are NP-Complete or worse. The size of neural nets necessary to compute...