This paper introduces the even-odd POMDP, an approximation to POMDPs in which the world is assumed to be fully observable every other time step. The even-odd POMDP can be converted into an equivalent MDP, the
2MDP, whose value function, V*_{2MDP}, can be combined online with a 2-step lookahead search...
Many machine learning applications require
classifiers that minimize an asymmetric cost
function rather than the misclassification
rate, and several recent papers have addressed
this problem. However, these papers
have either applied no statistical testing
or have applied statistical methods that are
not appropriate for the cost-sensitive setting.
Without good statistical...
This paper reports on an empirical study involving end users that addresses the question of whether it is possible to provide the benefits of formal testing within the informal spreadsheet paradigm. We have developed a "What You See Is What You Test" (WYSIWYT) methodology that brings the benefits of formal...
In 1995 my students and I developed Leda, a multiparadigm language based on the Pascal model. Leda allowed programmers to create abstractions in an object-oriented, functional, or logic programming style. More recently we have been interested in recreating this work, but this time using Java as the language basis. The...
We present a meta-model as a process maturity framework that can be effectively used to
identify the key practices to initiate and sustain a software process improvement effort
focused on a single process area. Our approach moves away from the overall software
development process and computation of maturity levels and...
The meaning of biological sequences is a central problem
of modern biology. Although string matching is well understood
in the edit-distance model, biological strings
with transpositions and inversions violate this model's
assumptions. To align biologically reasonable strings, we
proposed the Walking Tree Method [4,5,6,7,8], an
approximate string alignment method that...
A fault-tolerant routing method that can tolerate solid faults using only two virtual channels is presented. The proposed routing algorithm not only uses fewer virtual channels but also tolerates f-chains in meshes. It is shown that the proposed algorithm is deadlock-free and livelock-free in meshes when...
In this paper, based on coding theory concepts, new time scheduling algorithms for multihop packet radio networks are described. Each mobile host is assigned a word from an appropriate constant weight code of length n, distance d and weight w. The host can send a message at the j-th...
It is well known that the Fibonacci numbers can be expressed in the
form Round{(1/√5) λ₀ⁿ}, where λ₀ = (1 + √5)/2 [Knu75]. We look at integer sequences which are solutions to non-negative difference equations and show that if the equation is 1-Bounded then the solution can be...
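The closed form quoted in this abstract can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the comparison is limited to small n because double-precision arithmetic eventually loses the accuracy that rounding needs.

```python
import math


def fib_closed_form(n: int) -> int:
    """Fibonacci number via Round{(1/sqrt(5)) * lambda0**n} (hypothetical helper)."""
    lam0 = (1 + math.sqrt(5)) / 2  # the golden ratio, lambda0 in the abstract
    return round(lam0 ** n / math.sqrt(5))


def fib_iterative(n: int) -> int:
    """Reference implementation using the recurrence F(n) = F(n-1) + F(n-2)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    # Compare both definitions for small n only (floating point limits the range).
    for n in range(40):
        assert fib_closed_form(n) == fib_iterative(n)
    print(fib_closed_form(10))
```

The rounding works because the second root of the characteristic equation has absolute value below 1, so its contribution to the exact solution is always smaller than 1/2.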
The structure of monadic functional programs allows the integration of many different features into such programs by just changing the definition of the monad and not the program, which is a desirable feature from a software engineering and maintenance point of view. We describe an algorithm for the automatic transformation...
We describe the design of a domain-specific language (DSL) for the specification of generic ocean modeling tools, and we describe the
implementation of its compiler. The goal of the DSL is to allow the specification of widely usable tools for ocean modeling once, and to allow its translation into different...
Many software maintenance problems are caused by using text editors to change programs. A more systematic and reliable way of performing program updates is to express changes with an update language. In particular, updates should preserve the syntax- and type-correctness of the transformed object programs. We describe an update calculus...
Attracting, educating and retaining new engineering students is a challenge. The creative aspirations and "can do" attitude spawned by the space race, Heathkits, and homemade crystal radios have been replaced with the passive satisfaction of video games, cell phones and throwaway electronic appliances. At Oregon State University we have made...
Can collaborative filtering be successfully applied to digital
libraries in a manner that improves the effectiveness of the
library? Collaborative filtering systems remove the limitation of
traditional content-based search interfaces by using individuals to
evaluate and recommend information. We introduce an approach
where a digital library user specifies their need...
How can rigorous forms of testing be supported in a way that is both compatible with the visual aspect of visual programming languages, and usable by the audiences using those languages -- even when the audience has no background in software engineering? Visual programs are likely to contain at least...
The field of machine learning has made major strides over the last 20 years. This document summarizes the major problem formulations that the discipline has studied, then reviews three tasks in cognitive networking and briefly discusses how aspects of those tasks fit these formulations. After this, it discusses challenges for...
What is the relationship between learning and reasoning? Much recent work in machine learning has been criticized for focusing on learning and ignoring reasoning. This paper attempts to describe the various ways in which machine learning research has (and has not) incorporated reasoning. The paper argues that there are important...
This paper surveys the activity recognition task from a machine learning perspective. I give a definition of this problem, and I classify different activity recognition problems into two categories. I show the activities can be hierarchical, and based on such hierarchies I synthesize a language to describe activities. I give...
This paper addresses cost-sensitive classification in the setting where there are costs for measuring each attribute as well as costs for misclassification errors. We show how to formulate this as a Markov Decision Process in which the transition model is learned from the training data. Specifically we assume a set...
This paper studies the problem of learning diagnostic policies from training examples. A diagnostic policy is a complete description of the decision-making actions of a diagnostician (i.e., tests followed by a diagnostic decision) for all possible combinations of test results. An optimal diagnostic policy is one that minimizes the expected...
A common heuristic for solving Partially Observable Markov Decision Problems (POMDPs) is to first solve the underlying Markov Decision Process (MDP) and then construct a POMDP policy by performing a fixed-depth lookahead search in the POMDP and evaluating the leaf nodes using the MDP value function. A problem with...
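The heuristic described in this abstract can be illustrated in its simplest form, a depth-one lookahead. The sketch below is an assumption-laden toy, not the paper's algorithm: the function names, the nested-list model layout (`T[s][a][s2]`, `R[s][a]`), and the two-state example are all hypothetical. At depth one the observation model drops out of the computation, so it is omitted here; deeper lookahead would need it.

```python
def value_iteration(T, R, gamma, iters=500):
    """Solve the underlying MDP. T[s][a][s2] is a transition probability, R[s][a] a reward."""
    n_states, n_actions = len(T), len(T[0])
    V = [0.0] * n_states
    for _ in range(iters):
        V = [
            max(
                R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return V


def lookahead_action(belief, T, R, gamma, V):
    """Depth-one lookahead from a belief state, with leaves scored by the MDP values V."""
    n_states, n_actions = len(T), len(T[0])
    q = [
        sum(
            belief[s]
            * (R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n_states)))
            for s in range(n_states)
        )
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: q[a])


# Hypothetical toy problem: two absorbing states, action i pays 1 only in state i.
T = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
R = [[1.0, 0.0], [0.0, 1.0]]
GAMMA = 0.9
V_MDP = value_iteration(T, R, GAMMA)

if __name__ == "__main__":
    print(lookahead_action([0.8, 0.2], T, R, GAMMA, V_MDP))
    print(lookahead_action([0.1, 0.9], T, R, GAMMA, V_MDP))
```

Because the leaves are scored as if the state became fully observable after one step, a policy of this kind never assigns value to actions whose only benefit is gathering information.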
End-user programmers are writing an unprecedented number of programs, due in large part to the significant effort put forth to bring programming power to end users. Unfortunately, this effort has not been supplemented by a comparable effort to increase the correctness of these often faulty programs. To address this need,...
A diagnostic policy specifies what test to perform next based on the results of previous tests and when to stop and make a diagnosis. Cost-sensitive diagnostic policies perform tradeoffs between (a) the costs of tests and (b) the costs of misdiagnoses. An optimal diagnostic policy minimizes the expected total cost....
The standard model of supervised learning assumes that training and test data are drawn from the same underlying distribution. This paper explores an application in which a second, auxiliary, source of data is available drawn from a different distribution. This auxiliary data is more plentiful, but of significantly lower quality,...
Spreadsheet languages, which include commercial spreadsheets and various research systems, have proven to be flexible tools in many settings. Research shows, however, that spreadsheets often contain faults. This thesis presents an integrated testing and fault localization methodology for spreadsheets. This methodology allows spreadsheet developers to engage in modeless development, testing...
End-user programmers are writing an unprecedented number of programs, primarily using languages and environments that incorporate a number of interactive and visual programming techniques. To help these users debug these programs, we have developed an entirely visual, interactive approach to fault localization. This paper presents the approach. We also present...
A human-centric issue that has not been considered in
the design of end-user programming environments is
whether gender differences exist that are important to the
design of these environments. Ignoring this issue would
miss the opportunity of enhancing the effectiveness of
end-user programmers by incorporating appropriate
mechanisms to support gender-associated...
End users develop more software than any other group of programmers, using software authoring devices such as email filtering editors, by-demonstration macro builders, and spreadsheet environments. Despite this, there has been only a little research on finding ways to help these programmers with the dependability of the software they create....
This paper introduces the even-odd POMDP, an approximation to POMDPs (Partially Observable Markov Decision Problems) in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent...
The WYSIWYT (What You See is What You Test) methodology applies formal analysis and testing techniques to the spreadsheet paradigm. So far the methodology has been applied to a research spreadsheet prototype, Forms/3. However, this prototype lacks the mathematical libraries, referential functions, ranges, and macros of commercial spreadsheets like Excel...
Although gender differences in a technological world are receiving significant research attention, much of the research and practice has aimed at how society and education can impact the successes and retention of female computer science professionals—but the possibility of gender issues within software has received almost no attention. If gender...
A huge discrepancy between theory and practice exists in one popular application area of functional programming--spreadsheets. Although spreadsheets are the most frequently used (functional) programs, few formal models of computation and type systems exist that would provide the foundation for creating reliable spreadsheets. Consequently, existing spreadsheets contain many errors, some...
This paper presents an empirical approach for measuring and characterizing the responsiveness of a character to changes in goal. Our approach is based on keeping track of the character's progress towards a frequently changing goal. A "distance-to-goal" function is defined to measure the progress. We then calculate an asymptotic proportion...
Collaborative filtering (CF) algorithms are used in a wide range of internet applications. However, the chief objective of using CF algorithms across most of these applications is to discover items that might be of interest to their users. CF algorithms work by obtaining feedback from users on the items that...
This paper presents qualitative results from interviews with knowledge workers about their recovery strategies after interruptions. Special focus is given to when these strategies fail due to the nature of the interruption and existing computer support. Potential solutions offered by participants to overcome some of these problems are presented. These...
We introduce a visual specification language for spreadsheets that allows the definition of spreadsheet templates. These templates are used by a spreadsheet generator to create Excel spreadsheets that are provably free from a large class of errors, such as reference, omission, and type errors. We demonstrate how spreadsheets can be...
"Code coverage visualizations using block coverage neither guided developers
toward productive testing strategies, nor did these visualizations motivate
developers to write more tests or help them find more faults than the control
group. Nevertheless, code coverage visualizations did influence developers in a
few important ways. Code coverage visualizations led developers...
This paper studies cluster ensembles for high dimensional data clustering. We examine three different approaches to constructing cluster ensembles. To address high dimensionality, we focus on ensemble construction methods that build on two popular dimension reduction techniques, random projection and principal component analysis (PCA). We present evidence showing that ensembles...
If basic assumptions about how knowledge workers conceptualize and use work units are wrong, then any solutions resting on those assumptions are unlikely to be successful since, instead of decreasing costs, they will lead to increasing them. This paper reports on how knowledge workers understand, use and switch between units...
Although researchers have begun to explicitly support end-user programmers' debugging by providing information to help them find bugs, there is little research addressing the right content to communicate to these users. The specific semantic content of these debugging communications matters because, if the users are not actually seeking the information...
Summary: Four proposed metrics:
[1] average relative reduction in training time (sample size, number of training experiences)
[2] jumpstart (initial advantage of transfer algorithm)
[3] handicap (how long it takes the no-transfer algorithm to overcome the jumpstart)
[4] asymptotic advantage (how much better the transfer learning algorithm does in the...
ExcelForms is an Excel-based front-end application that supports the end-user software
engineering features of Forms/3, a research language based on the spreadsheet paradigm.
The old implementation of ExcelForms performed poorly, and was considered
unstable, not robust, and not scalable enough for our users' needs. This project
addresses these issues by implementing...
This paper focuses on mining the strategies of problem-solving software users by observing their actions. Our application domain is an HCI study aimed at discovering general strategies employed by software users and understanding how such strategies relate to gender and success. We cast this problem as a sequential pattern...
Vector field analysis plays a crucial role in many engineering applications, such as weather prediction, tsunami and hurricane study, and airplane and automotive design. Existing vector field analysis techniques focus on individual trajectories such as fixed points, periodic orbits and separatrices, which are sensitive to noise and errors introduced by...
Although machine learning is becoming commonly used in today's software, there has been little research into how end users might interact with machine learning systems, beyond communicating simple "right/wrong" judgments. If the users themselves could somehow work hand-in-hand with machine learning systems, the accuracy of learning systems could be improved...
The potential for machine learning systems to improve via a mutually beneficial exchange of information with users has yet to be explored in much detail. Previously, we found that users were willing to provide a generous amount of rich feedback to machine learning systems, and that the types of some...
This paper addresses the problem of interactively modeling large street networks. We introduce a modeling framework that uses tensor fields to guide the generation of a street graph. A user can interactively edit a street graph by either modifying the underlying tensor field or by changing the graph directly. This...
Existing topology-based vector field analysis techniques rely on the ability to extract individual trajectories such as fixed points, periodic orbits and separatrices, which are sensitive to noise and errors introduced by simulation and interpolation. This can make such vector field analysis unsuitable for rigorous interpretations. We advocate the use...