The use of mobile devices is increasing and these devices run on batteries. Therefore, it becomes important to save power in these devices. To do so, we need to have a mechanism that estimates the power consumption in transmission as well as power used by the CPU while processing a...
Currently, a popular approach to image classification uses the deep Transformer architecture. In a Transformer, the attention mechanism enables the model to learn efficiently with fewer computational resources than the convolutional neural networks (CNNs). In this thesis, we study the sparse attention mechanism widely used in the Transformers developed specifically...
Visual programming languages employ visual representation to make programming easier and make programs more reliable and more accessible. Visual program testing becomes increasingly important as more and more visual programming languages and visual programming environments come into real use. In this work, we focus on one important class of visual...
The WEPP (Water Erosion Prediction Project) application computes soil loss and sediment yield from a field based on the data on crops, management practices, and operations. In order to make WEPP, which is a Windows-based application, easily accessible, Web WEPP (Web-based WEPP) was developed by our research group.
Web WEPP...
Web-Site Generator 3 (WebSiteGen3) is a rapid application development (RAD) tool that generates ASP.NET forms to insert, query, update and delete data stored in the user tables in a SQL Server 2000 database. WebSiteGen3 uses a graphical user interface to show the user tables in a hierarchical tree based on...
In this project we implemented WebSiteGen2, which is a software tool that automatically generates HTML pages and server-side scripts for a Web-based database application. A user of WebSiteGen2 can select the tables and columns for which HTML pages and server-side scripts are generated. The menus for this selection process are...
Revised Universal Soil Loss Equation (RUSLE) is a standard for estimating soil loss caused by rainfall and overland flow. The current software tool that implements this standard is RUSLE 1.06b, which is a stand-alone DOS application. In this project, we converted the DOS application to a Web application, which we...
WebGen 5 is a software tool for automatically generating Web scripts that display Web forms and operate on data in a database. WebGen 5 is implemented as a collection of templates. Each template, combined with a corresponding configuration file, generates one of the following six types of Web scripts: search,...
We have developed a framework for Web-based GIS/database applications which allow users to insert, update, delete, and query data with a map interface displayed by Web browsers. The framework was designed so that a Web-based GIS application that uses ArcIMS as a map server can be easily created, customized, and...
We have developed a framework for Web-based GIS/database (WebGD) applications
that allow users to insert, query, and delete data with map interfaces displayed by Web
browsers. The framework uses such open source software packages as Minnesota
MapServer, PostGIS, and PostgreSQL. With this framework, we can create the map
interface of...
WEPP (Water Erosion Prediction Project) is a stand-alone Windows application for predicting water erosion from overland flow on a hillslope. This application was developed by the National Soil Erosion Research Laboratory (NSERL). To make the WEPP application more accessible and easy to use, we created a web version of WEPP,...
Web-based Pesticide Screening Tool (Web-PST) is a web-based software application for evaluating the potential risk of pesticides on the surrounding environment. It uses the formulas and standards specified by the Soil/Pesticide Interaction Screening Procedure Version II (SPISP II). Web-PST closely models the stand-alone Windows application Windows Pesticide Screening Tool (WIN-PST)....
We have developed a web-based software tool for evaluating the potential risk of pesticides on the surrounding environment. The software tool is called Web-based Pesticide Screening Tool (Web-PST). It uses the formulas and standards specified by the Soil/Pesticide Interaction Screening Procedure Version II (SPISP II). Web-PST closely models the stand-alone...
We developed a Web-based GIS/database application designed to help motel businesses reach target customers effectively. By using an interactive map along with a relational database, users can view any particular area of the map and perform some basic operations as their group's privileges permit. With different types of privileges, users...
Deep learning and neural networks have been widely used in research; deep learning has empowered many tasks such as point cloud segmentation and shape recognition. One of the main advantages of deep interaction point cloud segmentation is that it allows feature extraction to be learned through neural network based...
GEM-GIS is a prototype of a web-based GIS/Database application for managing a germplasm collection. This application includes a database, a map interface, a set of web forms for database access, and an analysis module. The analysis module performs statistical analysis for the accessions of a species selected by the user...
The thesis focuses on activity recognition from sensor data, which has spurred a great deal of interest due to its impact on health care and security. Previous work on activity recognition from multivariate time series data has mainly applied supervised learning techniques which require a high degree of annotation effort...
Algorithms and MPI parallel C programs are developed for constructing wavelet expansions of long-range potential functions with O(n) time complexity. A new high frequency correction algorithm is introduced. The emphasis is on the common potential expansion encountered in physics that behaves as 1/r in three dimensions. The central B-Splines are used...
Approximate string matching is commonly used to align genetic sequences (DNA
or RNA) to determine their shared characteristics. In contrast with the standard
dynamic programming methods which use local edit distance models, the Walking
Tree heuristic method was created to handle non-local changes, e.g., translocations,
inversions, and duplications, altogether and...
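As a point of reference for the standard dynamic programming baseline mentioned in this abstract, the sketch below computes a unit-cost edit distance between two sequences. It is an illustration under stated assumptions, not the Walking Tree heuristic: the function name `edit_distance` and the unit costs for insertion, deletion, and substitution are chosen here for illustration and do not come from the thesis.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost (Levenshtein) edit distance via dynamic programming."""
    # prev[j] is the distance between the prefix of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # transforming a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1  # illustrative unit substitution cost
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

Because each cell depends only on its immediate neighbours, a model of this kind charges a translocated or inverted segment as many separate edits rather than as one event, which is the kind of non-local change the abstract says the Walking Tree method was created to handle.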
Realistic (ideally photorealistic) real-time rendering has remained an elusive goal in computer graphics. While photorealistic rendering has certainly been achieved at the expense of tremendous computational resources and corresponding rendering times, real-time rendering typically must accept a great number of compromises to achieve adequate performance, such as aliasing artifacts, the...
Oftentimes in visualization, the goal of using volume datasets is not just to visualize them but also to analyze and compare them. In order to compare the two volumes, we cannot take all the voxels into consideration. The size of a typical volume data set is quite large (maybe a...
The types and rates of label noise in real-world data sets present a challenge to machine learning projects. In this thesis, we propose a novel approach to address this issue. Our method combines a noise modelling technique for correcting label noise across the entire data set with a robust loss...
3D datasets acquire great importance in the context of medical imaging. In this thesis we survey and enhance solutions to problems inherently associated with 3D datasets: processing time, noise, and visualization. Efforts include development of a tool kit to provide a multi-threaded processing platform to cut processing time, produce real time visualization...
Uploading everyday information about food intake, sleep, number of steps and then generating consolidated peer visual reports for participants in large-scale health studies, often divided into multiple treatment groups, can be challenging.
This challenge is even bigger if subjects are young teenagers between the ages of 14 and 19 active in sports,...
A cluster of computers can be used to render large amounts of data at very high resolutions. One application of clusters is scientific visualization, which often involves the display of large data sets. This paper describes a prototype system that allows oceanographers to interactively view oceanographic data using a cluster...
Graphics hardware in mobile devices has become more powerful, allowing rendering techniques such as ray-cast volume rendering to be done at interactive rates. This increase of performance provides desktop capabilities combined with the portability of a tablet. Volumes can demand a high amount of memory in order to be loaded...
Open Source software gives users the freedom to copy, modify and redistribute source code without legal entanglements. The evolution of these software communities usually depends a lot on how the participating developers and users interact and co-operate with each other. Over the past few years, open source software has become...
Biologists need tools to see the structural relationships encoded in biological sequences (strings). The Walking Tree heuristics calculate some of these relationships. I have designed and implemented graphic presentations which allow the biologist (user) to see these relations. This thesis contains background information on the biological sequences and some background...
N-ary relationships, which relate N entities where N is not necessarily two, are omnipresent in real life. In this thesis, we develop a visualization technique for N-ary relationships.
First, we propose a visual metaphor that utilizes vertices and polygons to represent entities and N-ary relationships. Based on this visual metaphor,...
Transportation infrastructure provides a vital service for the functionality of a
city. The efficient design of road networks poses an interesting topic in computer
science for digital content developers. For civil engineers, the visualization of
analysis results on infrastructure both efficiently and intuitively is crucial. The
following contributions are made...
Cold air pools are spatiotemporal phenomena that occur when cold air from higher elevations rolls down the slope to accumulate in lower elevations. Behaviors like this lead to microclimate anomalies such as the city of Corvallis (Oregon) experiencing persistent cold weather even on a sunny day. We analyze multivariate temperature...
Structured Query Language (SQL) is widely used to access data stored in relational database systems. Although a powerful and flexible language, SQL can also be complex and hard to learn. For most new SQL users, it's easy to write SQL statements by following SQL grammar and syntax rules, but it's...
Learning to recognize objects is a fundamental and essential step in human perception and understanding of the world. Accordingly, research of object discovery across diverse modalities plays a pivotal role in the context of computer vision. This field not only contributes significantly to enhancing our understanding of visual information but...
While individual portfolio diversity analysis is a well-studied problem in visualization, the visual analysis of individual or groups of portfolios, over time, has received little attention. Such analysis, however, is important to researchers who are interested in better understanding portfolio management behavior of experts as well as novices. We conducted...
This report describes "VIGRAM" (Visual Programming), which is a program understanding and complexity metric analysis tool for Pascal programs. VIGRAM is implemented on the Macintosh as one part of the "O.S.U." (Oregon Speedcode Universe) project. With VIGRAM, the source code of a Pascal procedure can be displayed as a visual...
This paper discusses the merits of providing users variational views when editing variational code. I provide a plugin for the popular Atom Integrated Development Environment (IDE) which replaces #ifdef annotations commonly used by the C PreProcessor (CPP) with colored backgrounds, thus reducing code clutter and attempting to help programmers quickly...
In this dissertation we consider the problem of
automating the design of access structures for relational
database systems. The main considerations are effective
and rigorous utilization of the users' usage patterns,
global treatment of the whole design and utilizing most of
the commonly known access structures.
We represent the usage...
With the increase in demand for streaming media capabilities across the Internet, the focus has shifted from traditional client-server to peer-to-peer approaches. Content Distribution Networks (CDNs) have also recently moved from web acceleration to media streaming. P2P CDNs can be used both as a delivery mechanism and as an independent...
This report presents an efficient method for semi-supervised video object segmentation – the problem of identifying foreground pixels occupied by a target object. The target is specified by the ground-truth mask in the first video frame. While the state of the art achieves a segmentation accuracy greater than 80%, it...
This thesis consists of two major components. The first part is concerned with video object instance segmentation (VOS), which is the task of assigning per-pixel labels per frame of a video sequence to indicate foreground object instance membership, given the first frame ground truth mask. VOS has myriad applications, from video...
Software history and version control systems (VCS) are an important source of information for developers. This entails the need for a principled understanding of developers’ information seeking in VCS, both for improving existing tools as well as understanding requirements for new tools. However, it is only recently that researchers have...
Summary of results. After implementing the plain ID3 algorithm, I experimented with
various modifications. Two improvements of the process of finding a legal phoneme/stress
could be made by using statistical information about the letter to phoneme/stress-mapping
in the training set.
Adding the CHI-SQUARE test to the ID3 algorithm was...
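One common way to combine a chi-square test with ID3 is to check whether a candidate split separates the classes more than chance would predict, and to stop splitting when it does not. The sketch below computes only the Pearson chi-square statistic for a contingency table of split branches against class counts. Whether the report applied the test in exactly this way is an assumption, and the name `chi_square_statistic` is hypothetical.

```python
from typing import Sequence


def chi_square_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a contingency table.

    Rows are the branches of a candidate split and columns are class
    counts (an illustrative layout, not taken from the report).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0:
        return 0.0  # empty table: no evidence for or against the split
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if branch and class were independent.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat
```

The statistic would then be compared against a critical value for (rows - 1) x (columns - 1) degrees of freedom; a value below the threshold suggests the split is not informative.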
Information Foraging Theory (IFT) has successfully explained how people seek information in various domains, in turn, informing the design of several tools and information-intensive environments. However, prior research has not explored foraging in the presence of several, very similar variants of the same artifact. Such variants are commonplace in several...
The study of variational typing originated from the problem of type inference for variational programs, which encode numerous different but related plain programs. In this dissertation, I present a sound and complete type inference algorithm for inferring types of all plain programs encoded in variational programs. The proposed algorithm runs...
Over the last two decades, satisfiability and satisfiability-modulo theory (SAT/SMT) solvers have grown powerful enough to be general purpose reasoning engines throughout software engineering and computer science. However, most practical use cases of SAT/SMT solvers require not just solving a single SAT/SMT problem, but solving sets of related SAT/SMT problems....
With the development of technologies in genome sequencing and variant detection, a huge number of variants are detected. Further analysis of the variants requires an efficient tool to annotate the functional effect of variants. This project managed to develop an efficient program to annotate the functional effect of variants...
Desktop widget engines have emerged as an alternative for completing simple tasks without the need for a full-blown application or constant user interaction. Widgets can simply display data in a compact and visually appealing manner (such as stock tickers, weather forecasts, and news notifications), or go so far as to...
We developed a scientific information management system to facilitate remote access and analysis of earth and space science data, using the Component Model of software development provided by the Java language. The data sets are part of the Earth Observing System project, being carried out at the College of Oceanic...
Guidelines for using style to improve computer program comprehension
have often been proposed without empirical testing. This thesis reports on the
results of three controlled experiments that investigated ways program style may be
used to aid comprehension of source code listings.
Experiments 1 and 2 were conducted using advanced computer...
Air traffic flow management over the U.S. airspace is a difficult problem. Current management approaches lead to hundreds of thousands of hours of delay, costing billions of dollars annually. Weather and airport conditions may instigate this delay, but routing decisions balancing delay with congestion contribute significantly to the propagation of...
In the field of Human-Computer Interaction, provenance refers to the complete history and genealogy of a document. Provenance can be useful in identifying related resources, such as different versions of the same document or resources used in the creation of a new document. Though methods of provenance collection and applications...
Appropriate representations of variational software simplify the analysis of their properties. This thesis proposes tailored representations of two kinds of variational software: difference files of merge commits in Git and feature models. For the former, we use the Choice Edit Model, which is based on the choice calculus, to represent changes introduced...
Apple launched their first “tap-and-pay” mobile payment solution called “Apple Pay” in October 2014 in the United States. Quickly catching up with the popularity of Apple Pay, Google launched their own mobile “tap-and-pay” payment solution called “Android Pay”. Both companies claim that their tap-and-pay solutions are more convenient and more secure than...
Streaming media and interactive television viewing experiences are becoming more commonplace with the introduction of services such as Netflix Streaming, the Apple TV, and Google TV, aided by the increased adoption of broadband internet. As these services make their way into the living room, and developers struggle to accommodate more...
Expensive pricing for laboratory bioreactors proves to be the main barrier-to-entry for lab-scale academic research. Most laboratory bioreactors are priced at $100K+ and require significant training to use, thereby limiting their accessibility. The IDEAL (Intuitive, Developmental And Research Oriented, Easy To Use, Affordable, And Low Volume) Bioreactor aims to solve...
Forms are an easy-to-use interface to access a database, including a remote database on the Internet. An entity-relationship (ER) diagram, which is a pictorial representation of a database schema, is widely used in designing a database. A class diagram, which shows a set of classes, relationships among them, and associations, is...
Scientists and engineers have to analyze and query multiple large databases. Analysis over databases created by phasor measurement units can provide insight into the health of the grid, thereby improving control over operations. Realizing this data-driven control, however, requires validating, processing and storing massive amounts of PMU data efficiently, which...
Software maintenance accounts for a large portion of the software development cost, particularly the process of updating programs either to adapt for requirement change or to enhance design or efficiency. Currently, program updates are generally performed manually by programmers using text editors. This is an unreliable
method because syntax and...
An extensive theory of symmetric error control coding has been developed in the last few decades. The recently developed VLSI circuits, ROM, and RAM memories have given an impetus to the extension of error control coding to include asymmetric and unidirectional types of error control. The maximal numbers of unidirectional...
Enabled by a rich ecosystem of Machine Learning (ML) libraries, programming using learned models, i.e., Software-2.0, has gained substantial adoption. However, we do not know what challenges developers encounter when they use ML libraries. With this knowledge gap, researchers miss opportunities to contribute to new research directions, tool builders do...
A bad software development process leads to wasted effort and inferior products. In order to improve a software process, it must be first understood. In this work I focus on understanding software processes.
The first process we seek to understand is Continuous Integration (CI). CI systems automate the compilation, building,...
End users' programs are fraught with errors, costing companies millions of dollars. One reason may be that researchers and tool designers have not yet focused on end-user debugging strategies. To investigate this possibility, this dissertation presents eight empirical studies and a new strategy-based end-user debugging tool for Excel, called StratCel....
Error-correcting output coding (ECOC) is a method for converting a k-class supervised learning problem into a large number L of two-class supervised learning problems and then combining the results of these L evaluations. Previous research has shown that ECOC can dramatically improve the classification accuracy of supervised learning algorithms that learn to classify...
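The combining step described in this abstract, turning the L two-class outputs back into one k-class decision, is often done by picking the class whose codeword is nearest in Hamming distance. The sketch below assumes that decoding rule and a small hypothetical code matrix (k = 4, L = 7); neither the matrix nor the names `CODE_MATRIX`, `hamming_distance`, and `ecoc_decode` are taken from the thesis.

```python
from typing import List, Sequence

# Hypothetical code matrix: one 7-bit codeword per class (k = 4, L = 7).
# Every pair of rows differs in 4 positions, so any single wrong bit
# among the 7 two-class predictions can still be decoded correctly.
CODE_MATRIX: List[List[int]] = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(code_matrix: Sequence[Sequence[int]],
                predicted_bits: Sequence[int]) -> int:
    """Return the index of the class whose codeword is closest to the
    bit string produced by the L two-class learners."""
    distances = [hamming_distance(row, predicted_bits) for row in code_matrix]
    return min(range(len(distances)), key=distances.__getitem__)
```

For example, `ecoc_decode(CODE_MATRIX, [1, 0, 0, 0, 1, 1, 1])` still selects class index 1 even though the first two-class learner's output is flipped relative to that class's codeword.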
The ability to extract uncertainties from predictions is crucial for the adoption of deep learning systems to safety-critical applications. Uncertainty estimates can be used as a failure signal, which is necessary for automating complex tasks where safety is a concern. Furthermore, current deep learning systems do not provide uncertainty estimates,...
We consider the problem of tactical assault planning in real-time strategy games where a team of friendly agents must launch an assault on an enemy. This problem offers many challenges including a highly dynamic and uncertain environment, multiple agents, durative actions, numeric attributes, and different optimization objectives. While the dynamics...
Ensuring correctness of real-world software applications is a challenging task. Testing can be used to find many bugs, but is typically not sufficient for proving correctness or even eliminating entire classes of bugs. However, formal proof and verification techniques tend to be very heavy weight and are simply not available...
The emergence of highly accurate Convolutional Neural Networks (CNNs) with the capability to process large datasets has led to their popularity in many applications, including safety/security-sensitive ones (e.g., disease recognition, self-driving cars). Despite their high accuracy, convolutional neural networks have been found to be susceptible to adversarial noise added to...
Until now, most hypertext systems have been implemented on large scale computers. With improvements in microprocessors and development of graphical user interfaces, personal computers can run systems that previously needed the power of a mainframe. The low costs and widespread use of PCs will enable many people to use hypertext...
People like going on trips with friends and tend to plan their trips well in advance to have the best possible experience of a destination and get the most out of the places they visit and/or the activities they plan to partake in. Right now, the Internet provides a wealth...
In this report, we address the issues of translating MATLAB scripts into SPMD-style C programs. The resulting programs, when linked with our run-time library, are suitable for execution on parallel computers. We describe the design of the compiler and improvements made to it in the current version. We also describe...
As the link between human microbiomes and health has become more established, the interest in applying statistical approaches to microbiome data to understand the mechanisms behind these links has grown. However, microbiome data is often of unmanageable size, and consequently, producing quality lower dimensional representations of samples is a significant...
Automatic Music Transcription is a growing area of interest in Music Information Retrieval, and recent research has shown promise using onset detection on spectrograms. We propose a deep learning model that takes raw audio as input in order to transcribe a solo piano performance from audio to MIDI without complicated...
The project described herein presents the results of an investigation
in the relatively new and expanding field of computer-simulated
learning machines for use in pattern recognition.
A learning machine, one that benefits from its past experience,
was devised in computer program form. It may be described as a
Piecewise-Linear,...
Tracr is a modern browser-based user interface, designed to be used with languages that can generate customized explanations from execution traces. While Tracr is primarily designed for use with the Xtra language, Tracr defines a generalized interface that would allow it to be used with other languages as well. Explanations...
The Pacific Islander diaspora in the Pacific Northwest United States is a unique community characterized by diverse cultures, experiences, and motivations for migration. This study aimed to explore the narratives of individuals from different Pacific Islander backgrounds residing in Oregon, shedding light on their reasons for migration and the impact...
With the rapid advancement of educational technology and the need for personalized, engaging content to accommodate diverse learning needs, Virtual Reality (VR) holds promise for the present and future. However, VR applications suffer from challenges, including usability concerns, lack of pedagogical value, and evaluation standards. This thesis focuses on two...
Narratives are central to communication and the human experience. For a computer system to understand a narrative, it must be able to identify the key facts or plot elements that describe what happened or how the world has changed. These elements are called events; establishing a document’s events and the relationships...
The rapid population growth in large urban cities has led to an unprecedented increase in both the number and the diversity of wireless devices and applications with varying quality of service requirements in terms of latency and data rates. LinkNYC is an example of an urban communication network infrastructure, which...
Software maintenance tasks often require finding information within existing code, which is time-consuming and difficult even for professional programmers. For example, programmers may need to know what code implements certain functionality or what is the purpose of certain code. In response, researchers have developed tools to help programmers find information...
Simultaneous speech translation (SimulST) is widely useful in many cross-lingual communication scenarios, including multinational conferences and international traveling. Since text-based simultaneous machine translation (SimulMT) has achieved great success in recent years, the conventional cascaded approach for SimulST uses a pipeline of streaming ASR followed by simultaneous MT but suffers from...
Developers frequently change the type of a program element and update all its references for performance, security, concurrency, library migration, or better maintainability. Although type changes are a common program transformation, they are the least automated and the least studied. Manually performing type changes is tedious since the programmers have...
There is a software gap in parallel processing. The short lifespan and small installation base of parallel architectures have made it economically infeasible to develop platform-specific parallel programming environments that deliver performance and programmability. One obvious solution is to build architecture-independent programming environments. But the architecture independence usually comes at...
This algorithm presents the first steps towards a solution for novice database administrators that helps them transform a non-normalized relational database into a database in the third normal form. The algorithm uses relational algebra operations that apply principles from the third normal form. This provides the database administrator with an...
The results of a machine learning from user behavior can be thought of as a program, and like all programs, it may need to be debugged. Providing ways for the user to debug it matters because without the ability to fix errors, users may find that the learned program’s errors...
In this thesis, we introduce a novel Explanation Neural Network (XNN) to explain the predictions made by a deep network. The XNN works by embedding a high-dimensional activation vector of a deep network layer non-linearly into a low-dimensional explanation space while retaining faithfulness, i.e., the original deep learning predictions can...
In this work, I examine the problem of understanding American football in video. In particular, I present several mid-level computer vision algorithms that each accomplish a different sub-task within a larger system for annotating, interpreting, and analyzing collections of American football video. The analysis of football video is useful in...
The paper describes research on the representation of knowledge. The goal is to develop a formalism which can be used for the testing of hypotheses on the nature of human understanding and as a foundation for artificial intelligence programs. The ideas expressed herein are implemented in a program which converses...
Due to the rapid growth of wireless technology, there has been a growing interest in the capabilities of ad hoc networks connecting mobile phones, PDAs and laptop computers. The distributed and self-configurable capabilities of ad hoc networks make them very attractive for some applications such as tactical communication for military,...
Analysis, visualization, and design of vector fields on surfaces have a wide variety of major applications in both scientific visualization and computer graphics. On the one hand, analysis and visualization of vector fields provide critical insights to the flow data produced from simulation or experiments of various engineering processes. On...
Timing attacks enable an attacker to extract secret information from a cryptosystem by observing timing differences with respect to different inputs given to an encryption or decryption algorithm. Werner Schindler has proposed a timing attack on smart card devices. We implemented this attack based on the same approach for RSA...
We present a proof that the number of breakpoints in the arrival function between two terminals in graphs of treewidth ω is n^(O(log² ω)) when the edge arrival functions are piecewise linear. This is an improvement on the bound of n^(Θ(log n)) by Foschini, Hershberger, and Suri for graphs without any bound...
RNA structure prediction is a challenging problem, especially with pseudoknots. Recently, there has been a shift from the classical minimum free energy-based methods (MFE) to partition function-based ones that assemble structures based on base-pairing probabilities. Two typical examples of the latter group are the popular maximum expected accuracy (MEA) method...
High Performance Computing can find ubiquitous applications in the industry. HPC-applications are specifically designed to take advantage of the parallel nature of the computing systems which is often enabled by Multi-core/Many-core architectures. With this advent of Multi-processors in the mainstream systems, inter-core communication has been one of the major challenges...
Coarse resolution imagery, such as that produced by the MODIS instrument, poses the challenge of estimating sub-pixel proportions of different land cover types. This problem is difficult because of the variety and variability of vegetation within individual pixels. This thesis describes and compares two existing algorithms for estimating...
Modern cryptanalysis is generally based on mathematical theory. However, side-channel analysis has become increasingly popular recently. The benefit of side-channel cryptanalysis is that attackers can mount attacks at low cost in terms of time and equipment and are highly successful in extracting useful results. The...
In this thesis, I present the variational database management system, a formal framework and its implementation for representing variation in relational databases and managing variational information needs. A variational database is intended to support any kind of variation in a database. Specific kinds of variation in databases have already been...
There are lots of mailing systems available for Apple Macintosh computers. But when this research was started, there was no Voice Mail System for the Macintosh. However, a similar system was available for the NeXT machine. So the main goal of this research was to develop a Voice Mail System...