Virtual environments and simulations are increasingly used both to visualize and understand data and to create scenarios for training and analysis. In this paper, we are interested in the use of simulation and visualization of interactive virtual agents to create realistic motions for training scenarios. We...
Linear transformation for dimension reduction is a well-established problem in the field of machine learning. Because of the large number of observable parameters and the volume of data, processing the data in its raw form is computationally complex, and the data is difficult to visualize. Dimension reduction by means of feature extraction offers a strong...
The result of machine learning from user behavior can be thought of as a program, and like all programs, it may need to be debugged. Providing ways for the user to debug it matters because without the ability to fix errors, users may find that the learned program’s errors...
This thesis presents a progression of novel planning algorithms that culminates in a new family of diverse Monte-Carlo methods for probabilistic planning domains. We provide a proof for performance guarantees and analyze how these algorithms can resolve some of the shortcomings of traditional probabilistic planning methods. The direct policy search...
End users' programs are fraught with errors, costing companies millions of dollars. One reason may be that researchers and tool designers have not yet focused on end-user debugging strategies. To investigate this possibility, this dissertation presents eight empirical studies and a new strategy-based end-user debugging tool for Excel, called StratCel....
Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity or "outcome space" explosion. Multiagent domains are particularly susceptible to these problems. This thesis describes ways to mitigate these curses in several different multiagent domains, including real-time delivery of products...
Communicating dynamic motion content, such as exercise, with a static medium, such as paper, is difficult. The technology exists for presenting 3D animated exercise content to patients; however, the tools for allowing exercise domain experts to effectively author the content do not exist. We conducted two formative studies with exercise...
We took the back-propagation algorithms of Werbos for recurrent and feed-forward neural networks and implemented them on machines with graphics processing units (GPUs). The parallelism of these units gave our implementations a 10- to 100-fold increase in speed. For nets with fewer than 20 neurons the machine performed faster...
This dissertation explores the idea of applying machine learning technologies to help computer users find information and better organize electronic resources, by presenting the research work conducted in the following three applications: FolderPredictor, Stacking Recommendation Engines, and Integrating Learning and Reasoning.
FolderPredictor is an intelligent desktop software tool that helps...
Sequential supervised learning problems arise in many real applications. This dissertation focuses on two important research directions in sequential supervised learning: efficient training and feature induction.
In the direction of efficient training, we study the training of conditional random fields (CRFs), which provide a flexible and powerful model for sequential...
This project aims to implement indexing for Web 2.0 applications. Ajax applications consist of a set of states that the user generates through events such as click, focus, and blur. By saving these DOM states we can index information obtained from dynamically generated web content. To prevent...
Analysis, visualization, and design of vector fields on surfaces have a wide variety of major applications in both scientific visualization and computer graphics. On the one hand, analysis and visualization of vector fields provide critical insights to the flow data produced from simulation or experiments of various engineering processes. On...
Programmers spend a substantial fraction of their debugging time navigating through source code, yet little is known about how programmers navigate. With the continuing growth in size and complexity of software, this fraction of time is likely to increase, which presents challenges to those seeking both to understand and...
The problem of document classification has been widely studied in machine learning and data mining. In document classification, most of the popular algorithms are based on the bag-of-words representation. Due to the high dimensionality of the bag-of-words representation, significant research has been conducted to reduce the dimensionality via different approaches....
In the field of Human-Computer Interaction, provenance refers to the complete history and genealogy of a document. Provenance can be useful in identifying related resources, such as different versions of the same document or resources used in the creation of a new document. Though methods of provenance collection and applications...
An interdisciplinary study into the theory of design decisions has yielded a model for tracking design changes in hardware/software systems, but it still needs to be applied to a larger system to test its efficiency at tracking important data. This thesis creates an implementation of PLEXIL, a language in development...
The purpose of this project is to load test and fine-tune the loan search functionality of the Broker Blueprint web application, an innovative Business-to-Business (B2B) online service aiding mortgage lenders and brokers in today's highly competitive mortgage market.
Broker Blueprint enables brokers to search for suitable mortgage loans across...
Interconnection networks play important roles in designing high performance computers. Recently two new classes of interconnection networks based on the concept of Gaussian and Eisenstein-Jacobi integers were introduced. In this research, efficient routing and broadcasting algorithms for these networks are developed. Furthermore, constructing edge disjoint Hamiltonian cycles in Gaussian networks...
A financial processor is the most important component of a credit union’s IT infrastructure. A database storing member demographic information, account balances, and transaction history, it performs financial calculations, such as interest, dividends, and maturities. It also provides a user interface, allowing tellers and financial service representatives to manage accounts...
This paper examines how six online multiclass text classification algorithms perform in the domain of email tagging within the TaskTracer system. TaskTracer is a project-oriented user interface for the desktop knowledge worker. TaskTracer attempts to tag all documents, web pages, and email messages with the projects to which they are...
Automatic painterly rendering systems have been proposed, but they select a single style to generate paintings from images and therefore cannot creatively use multiple styles to emphasize important objects and deemphasize unimportant parts of a scene. We provide a multi-style painting framework to address this issue...
Open source software has become a powerful force in the world of computing. While once confined to the domain of technical specialists, people of all types have begun to adopt this software – from the casual web-surfer who uses Firefox, to the professional web developer who codes in PHP or...
Ensuring correctness of real-world software applications is a challenging task. Testing can be used to find many bugs, but is typically not sufficient for proving correctness or even eliminating entire classes of bugs. However, formal proof and verification techniques tend to be very heavyweight and are simply not available...
Knowledge workers are struggling in the information flood. There is a growing interest in intelligent desktop environments that help knowledge workers organize their daily life. Intelligent desktop environments allow the desktop user to define a set of “activities” that characterize the user’s desktop work. These environments then attempt to identify...
The push towards higher-performing and more sensitive mixed-signal circuitry has required the parallel development of increasingly complex and sensitive test and calibration harnesses. Current off-chip methods of test and calibration may require higher pin counts or induce unwanted parasitic interference.
In this thesis, the design of a...
Markov models are commonly used for joint inference of label sequences. Unfortunately, inference scales quadratically in the number of labels, which is problematic for training methods where inference is repeatedly performed and is the primary computational bottleneck for large label sets. Recent work has used output coding to address this...
We consider the problem of tactical assault planning in real-time strategy games where a team of friendly agents must launch an assault on an enemy. This problem offers many challenges including a highly dynamic and uncertain environment, multiple agents, durative actions, numeric attributes, and different optimization objectives. While the dynamics...
In this dissertation, we present a user-in-the-loop method for the design of an interactive motion data structure that benefits from the advantages of both motion graphs and blend-based techniques. Our novel approach automatically analyzes a traditional motion graph built from labeled motion clips. The result is a more condensed, coarser...
Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning...
With the increase in demand for streaming media capabilities across the Internet, the focus has shifted from traditional client-server to peer-to-peer approaches. Content Distribution Networks (CDNs) have also recently moved from web acceleration to media streaming. P2P CDNs can be used both as a delivery mechanism and as an independent...
Network coding is a transmission paradigm that is known to achieve better network throughput in certain multicast topologies; however, the practicality of network coding has been questioned due to its high computational complexity. One drawback of network coding is its long decoding time; this is mainly due...
Motion capture data is a digital representation of the complex temporal structure of human motion. Motion capture is widely used for data-driven animation in sports, medicine, and entertainment because of its ability to capture complex and realistic motions. Due to its efficiency and cost, methods for reusing collections of motion capture...
Transportation infrastructure provides a vital service for the functionality of a city. The efficient design of road networks poses an interesting topic in computer science for digital content developers. For civil engineers, visualizing analysis results on infrastructure efficiently and intuitively is crucial. The following contributions are made...
Recent efforts in user control of data-driven characters have focused on designing a high-level graph data structure that we call a Behavior Finite State Machine (BFSM). A BFSM is an interactive data structure that benefits from the advantages of both motion graphs and blend-based techniques for generating animated motion. Each node in a BFSM...
Motivations for cloud computing applications are listed. A Cloud Computing framework – MapReduce – is implemented. A document indexing application is built as an example MapReduce application on this framework. Focus is given to ease of job submission and scalability of the underlying network.
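As a rough illustration of what a document indexing job looks like in the MapReduce style, the following minimal sketch builds an inverted index in a single process. It is not the framework or application described in the abstract: the function names (`map_document`, `shuffle`, `reduce_postings`, `build_index`), the tokenization rule, and the sample documents are all hypothetical, and the shuffle step a real framework would perform across the network is simulated with a dictionary.

```python
import re
from collections import defaultdict


def map_document(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in a document."""
    # Assumed tokenization: lowercase runs of letters and digits.
    terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [(term, doc_id) for term in terms]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in a real job)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_postings(term, doc_ids):
    """Reduce step: collapse the document ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(documents):
    """Run map, shuffle, and reduce in sequence over a dict of doc_id -> text."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_document(doc_id, text))
    grouped = shuffle(intermediate)
    return dict(reduce_postings(term, ids) for term, ids in grouped.items())


# Hypothetical sample input.
docs = {
    "d1": "Cloud computing scales",
    "d2": "MapReduce scales on the cloud",
}
index = build_index(docs)
```

Here `index["cloud"]` lists both documents and `index["mapreduce"]` lists only `d2`. Because each map call touches one document and each reduce call touches one term, both phases could in principle be distributed across workers, which is the property a MapReduce framework exploits.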
In diversity combining automatic repeat request (ARQ), erroneous packets are combined together forming a single, more reliable, packet. In this thesis, we give a diversity combining scheme for the m-ary unidirectional channel. A system using the given scheme with a t-unidirectional error detecting code is able to correct up to...
This thesis addresses the problem of learning dynamic Bayesian network (DBN) models to support reinforcement learning. It focuses on learning regression tree models of the conditional probability distributions of the DBNs. Existing algorithms presume that the stochasticity in the domain can be modeled as a deterministic function with additive noise....
Fluid simulation is an interesting research problem with a wide range of applications including mechanical engineering, special effects in movies and games, and scientific simulation. Due to the complex nature of typical fluid flow equations, there are circumstances where a full volumetric fluid simulation may not be necessary to generate...
Finding an efficient way of distributing content in Peer-to-Peer (P2P) networks has become important with the growing popularity of media streaming applications. Video multicast applications rely on the efficiency of content distribution from a single source to multiple receivers where one source streams a video to a large number of...
Factorization of integers is an important aspect of cryptography since it can be used as an attack against some of the common cryptographic methods in use. There are numerous methods in existence for factoring integers. Some of these are faster than others for general numbers, while others work best on...
DiskGrapher is a graphical visualization tool designed to help users better manage the space on their hard drives. The main goal of DiskGrapher is to provide a different visualization technique for displaying information, one that gives a more intuitive understanding of the directory structure of the disk than...
Forward Error Correction and retransmission are two approaches used to reliably broadcast data in a network with poor quality of service. Under certain assumptions, it has been suggested that a retransmission-based reliable broadcasting scheme using network coding should in theory provide an increase in bandwidth efficiency by combining packets...
Web applications are popular attack targets. Misuse detection systems use signature databases to detect known attacks. However, it is difficult to keep the database up to date with the rate at which vulnerabilities are discovered, and these systems cannot detect zero-day attacks. By contrast, anomaly detection systems learn the normal behavior of...
Active participation and collaboration of community members are crucial to the continuation and expansion of open source software projects. Researchers have recognized the value of community in open source development and studied various aspects of it including structure of communities, motivations for participation, and collaboration among members. However, the majority...
WebGen is a software tool for generating Web scripts automatically for a Web-based database application. In this project, access control, AJAX support, and editable-and-insertable table mechanisms were added to WebGen. With our access control mechanism, an access-control level can be specified for each table. In access control level 1, for...
Building intelligent computer assistants has been a long-cherished goal of AI. Many intelligent assistant systems were built and fine-tuned to specific application domains. In this work, we develop a general model of assistance that combines three powerful ideas: decision theory, hierarchical task models and probabilistic relational languages. We use the...
Protecting end users' privacy and building trust are the two most important factors needed to support the growth of e-commerce. The increased dependence on the Internet for a wide variety of daily transactions causes a corresponding loss in privacy for many users, as virtually all websites collect data from users directly...
Remote sensors are becoming the standard for observing and recording ecological data in the field. Such sensors can record data at fine temporal resolutions, and they can operate under extreme conditions prohibitive to human access. Unfortunately, sensor data streams exhibit many kinds of errors ranging from corrupt communications to partial...
Nowadays, sports events are a significant part of everyday entertainment, with local, national, and international championships. Broadcasting companies invest a lot of money to attract new and more viewers, acquire broadcasting rights, or send entire crews on site to cover such events. Journalists are among the few...
There has been little prior research reporting strategy usage in end-user problem solving, and even less using gender as a factor. Without this type of information, end-user programming systems cannot know the “target” at which to aim if they are to support male and female end-user programmers’ debugging. As a...