Although deep reinforcement learning agents have produced impressive results in many domains, their decision making is difficult to explain to humans. To address this problem, past work has mainly focused on explaining why an action was chosen in a given state. A different type of explanation that is useful is...
Newcomers’ seamless onboarding is important for open collaboration communities, particularly those that leverage outsiders’ contributions to remain sustainable. Nevertheless, previous work shows that OSS newcomers often face several barriers to contributing, which lead them to lose motivation and even give up on contributing. A well-known way to help newcomers...
As a general solution to the problem of managing structural and content variability in relational databases, in previous work we introduced the Variational Database Management System (VDBMS). VDBMS consists of a representation of a variational database (VDB) and a corresponding typed query language (v-query). However, since this is a...
RNA structure prediction is a challenging problem, especially with pseudoknots. Recently, there has been a shift from the classical minimum free energy-based methods (MFE) to partition function-based ones that assemble structures based on base-pairing probabilities. Two typical examples of the latter group are the popular maximum expected accuracy (MEA) method...
More and more people now incorporate GIFs into their messaging and often send a GIF as a reply. A GIF (Graphics Interchange Format) is a short animated image without sound. A GIF expressing a common emotion is easy to find, but if some iconic expression is...
The rapid population growth in large urban cities has led to an unprecedented increase in both the number and the diversity of wireless devices and applications with varying quality of service requirements in terms of latency and data rates. LinkNYC is an example of an urban communication network infrastructure, which...
How should reinforcement learning (RL) agents explain themselves to humans not trained in AI? To gain insights into this question, we conducted a 124-participant, four-treatment experiment to compare participants’ mental models of an RL agent in the context of a simple Real-Time Strategy (RTS) game. The four treatments isolated...
People like going on trips with friends and tend to plan their trips well in advance to have the best possible experience of a destination and get the most out of the places they visit and/or the activities they plan to partake in. Right now, the Internet provides a wealth...
New MS in CS students in the Electrical Engineering and Computer Science school at OSU are required to file their Program of Study by the end of their 2nd term. Many of them, especially international students, are in a totally new ecosystem, so they find it overwhelming to choose the...
In today’s world, we depend heavily on software systems and devices for almost every task in our day-to-day lives. Software system upgrades are released whenever necessary to accommodate users’ ever-changing needs. The devices we use to run the software systems might be of...
Most database users do not know formal query languages, such as SQL, and prefer to express their information needs using usable query languages, such as keyword queries. Keyword queries, however, are inherently ambiguous and challenging for the database systems to understand and answer effectively. We propose a novel approach to...
One goal of using robots in introductory computer science (CS) courses is to improve motivation among learners. In this study, we investigate the effectiveness of using Cozmo and LEGO Mindstorms robots to improve students’ motivation in a CS orientation course, and we describe our experience using these robots in...
Automatic music transcription (AMT) is the task of recovering, from an acoustic representation of music, a symbolic notation of the written notes expressed by the sound. Transcribing music with multiple notes sounding simultaneously is difficult for both humans and machines. Much existing work on AMT has focused on suitable acoustic...
The Jetson Artificial Intelligence Tool chain (JAI-TC) is a set of packages, APIs and libraries for Artificial Intelligence applications to be deployed on the NVIDIA SoC, the Jetson TX2. JAI-TC automates the installation of these items, allowing a wider set of users to leverage these technologies. Prior to this, the...
Spreadsheets are a pervasive technology in both personal and industrial use. Often, the user is not the author, which contributes to a lack of understanding of the purpose and functionality of a spreadsheet. Furthermore, this lack of understanding is a major cause of mistakes in the use and maintenance of spreadsheets....
The increased demand for environmentally friendly building materials, along with the latest advances in wood science and technology that exploit the fiber orientation of wood, has resulted in composite wood materials known as mass-timber products. To understand the effects the wood fiber orientation has on the dynamic...
This report presents an efficient method for semi-supervised video object segmentation – the problem of identifying foreground pixels occupied by a target object. The target is specified by the ground-truth mask in the first video frame. While the state of the art achieves a segmentation accuracy greater than 80%, it...
Although the need for gender-inclusivity in software itself is gaining attention among both SE researchers and SE practitioners, and methods have been published to help, little has been reported on how to make such methods work in real-world settings. For example, how do busy software practitioners use such methods in...
In this research, we address the problem of learning a single causal network structure from multiple datasets generated from different experiments. The experiments can be observational or interventional. We assume that each dataset is generated by an unknown causal network altered under different experimental conditions (interventions, manipulations, or perturbations). As...
Deep learning and neural networks have been widely used in research, and deep learning has empowered many tasks such as point cloud segmentation and shape recognition. One of the main advantages of deep interaction point cloud segmentation is that it allows feature extraction to be learned through neural network based...
As the number of wireless devices, the demand for high data rates, and the need for always-on connectivity grow and become more stringent with the evolution and emergence of 5G systems, network engineers and researchers are facing new and unique challenges that need to be addressed. Among many...
Question-answering forums like Reddit have been quite effective in improving social interaction and disseminating useful information. Community members ask a variety of questions related to a subject, which are answered by other community members. The answers are rated by other members. In this thesis we study the problem...
Heatmap regression has become one of the mainstream approaches to localizing facial landmarks. As Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) become popular for solving computer vision tasks, extensive research has been done on these architectures. However, the loss function for heatmap regression is rarely studied. In...
Deep neural networks currently form the backbone of many applications where safety is a critical concern, for example, autonomous driving and medical diagnostics. Unfortunately, these systems currently fail to detect out-of-distribution (OOD) inputs and can be prone to making dangerous errors when exposed to them. In addition, these same systems...
In open set recognition, a classifier must label instances of known classes while detecting instances of unknown classes not encountered during training. To detect unknown classes while still generalizing to new instances of existing classes, this thesis introduces a dataset augmentation technique called counterfactual image generation. This approach, based on...
Computer science is, at its core, about solving problems. The "Carry out the Plan" portion of problem solving is often examined and emphasized in CS 1 and CS 2, while the other important aspects of the problem-solving process receive little emphasis. This study focuses on the other problem-solving steps, which...
Variation is a commonly encountered element of modern software systems. Recent research into variation has led to increasing interest in the development of variational programming languages and corresponding variational models of execution. Variational imperative languages pose a particular challenge due to the complex interactions between side effects and variation. The...
Smart Manufacturing (SM) is envisioned to make manufacturing processes more efficient through automation and integration of networked information systems. Robotic arms are integral to this vision. However, the benefits of SM, enabled by automation and networking, also come with cyber risks.
In this work, we propose an anomaly detection framework...
This thesis consists of two major components. The first part is concerned with video object instance segmentation (VOS), which is the task of assigning per-pixel labels to each frame of a video sequence to indicate foreground object instance membership, given the ground-truth mask of the first frame. VOS has myriad applications, from video...
Geographical datasets are large, complex, and can be difficult for users to navigate and derive meaning from. These datasets, as well as the unique insights derived from them, provide tremendous opportunity for social change -- many of the global challenges humankind is currently facing can benefit from analytics or visualization...
Software-Defined Networking (SDN) is a new networking concept in which the data plane is decoupled from the control plane. Due to the huge demand for network resources and the rapidly increasing number of devices that need to be connected to the network, the SDN paradigm has emerged as a potential solution...
In histopathological image analysis, image classification as well as pattern detection play a crucial role in the diagnosis and treatment process since the goal is to not only differentiate cancer types but also identify cancerous manifestations. Fully supervised learning strategies tend to address these problems using manually annotated cancerous regions...
The importance of data visualization is increasingly substantial in the field of optimization and engineering design, where a carefully designed visualization of the data on decision parameters (i.e., the Decision Space) and performance functions (i.e., the Objective Space) is critical to the success of the decision-making process.
One of...
The Internet of Things (IoT) is an integral part of application domains such as smart homes, digital healthcare, smart grid systems and vehicular networks. Various standard public key cryptography techniques (e.g., key exchange, public key encryption, digital signatures) are available to provide fundamental security services for IoT. However, despite their pervasiveness and...
Tensor mathematics provides a powerful language to visualize and analyze physical phenomena. In the last three decades, tensors have been used in various application areas. The visualization and analysis of tensor fields have advanced considerably, in both 2D and 3D. However, the physical interpretations of the topological analysis are...
The high level of parallelism in throughput processors such as GPGPUs has resulted in significantly changed on-chip data traffic behaviors. This demands new research to identify and address the limiting factors of networks-on-chip (NoCs) in the context of throughput processors. In this work, we first quantitatively analyze the performance of...
The Rust programming language is a systems programming language with a strong static type system. A central feature of Rust’s type system is its unique concept of “ownership”, which enables Rust to give a user safe, low-level control over resources without the overhead of garbage collection. In Haskell, most data...
“Until relatively recently, mankind was not aware that there was a separable binocular depth sense. Through the ages, people like Euclid and Leonardo understood that we see different images of the world with each eye. But it was Wheatstone who in 1838 explained to the world, with his stereoscope and...
Data variations are prevalent in real-world applications. For example, software vendors have to handle numerous variations in the business requirements, conventions, and environmental settings of a software product. In database-backed software, the database of each version may have a different schema and content. As another example, data scientists often need...
Software history and version control systems (VCS) are an important source of information for developers. This entails the need for a principled understanding of developers’ information seeking in VCS, both for improving existing tools as well as understanding requirements for new tools. However, it is only recently that researchers have...
This thesis addresses the problem of temporal action segmentation in videos, where the goal is to label every video frame with the appropriate action class present. We focus on the domain of NFL football videos, where action classes represent common football play types. For action segmentation, we use a temporal...
The aim of this thesis is to study the past 10 years of security vulnerabilities reported against the Linux kernel and all existing mitigation techniques that prevent the exploitation of those vulnerabilities. To study the security vulnerabilities systematically, they were categorized into classes and sub-classes based on their type.
This thesis first...
The history of a software project plays a vital role in the software development process. Version control systems enable users of a software repository to look at the evolution of the source code, and see the changes that led to newer versions. Currently, version control systems provide commands that can...
Mixed-initiative programming entails collaboration between a computer system and a human to achieve some desired goal or set of goals. Often these goals change or are amended in real time during program execution. As such, the plans these programs are based on must adapt and evolve to...
Complex information environments are often organized as hierarchies. However, computational models of Information Foraging Theory (IFT) have almost entirely ignored this fact. Models and tools for predicting programmer navigation have ignored people’s foraging behavior across hierarchies, called hierarchical foraging. Without modeling hierarchical foraging, our ability to build tools to support...
Urban green space is associated with multiple physical and mental health outcomes. Several benefits of green space, such as stress reduction and attention restoration, are dependent on visual perception of green space exposures. However, traditional green space exposure measures do not capture street-level exposures. In this project, we apply deep...
Assessing and understanding intelligent agents can be a difficult task for users who may lack an artificial intelligence (AI) background. A relatively new area, called “explainable AI,” is emerging to help address this problem, but little is known about how to present and structure information that an explanation system might...
Severe weather in the United States frequently causes huge insured losses to crops and property. It has a major impact on, and elicits diverse responses from, the weather insurance industry. Events like hail, storms, and hurricanes are more likely to cause catastrophe losses. So it becomes crucial to collect and analyze these extreme...
The problem of supporting more advanced selective undo operations has received a lot of attention. However, selective undo is generally missing in commonly used editors. Moreover, partial selective undo, the ability of undoing just part of some edit so that other edits may be undone, is not supported at all....
Online survey data collection is becoming popular because it provides benefits in cost, ease of collecting and managing data, flexibility in format, and access to a diverse population. Surveys are often used for health studies such as Oregon State University’s WAVE Project, which utilizes the WavePipe system, a server enabling...
Many applications require not only representing variability in software and data, but also computing with it. To do so efficiently requires variational data structures that make variability explicit in the underlying data and the operations used to manipulate it. Variational data structures have been developed ad hoc for many applications, but there is little...
Uploading everyday information about food intake, sleep, number of steps and then generating consolidated peer visual reports for participants in large-scale health studies, often divided into multiple treatment groups, can be challenging.
This challenge is even bigger if subjects are young teenagers aged 14 to 19 who are active in sports,...
Automatic event extraction from natural text is an important and challenging task for natural language understanding. Traditional event detection methods rely heavily on manually engineered rich features. Recent deep learning approaches alleviate this problem through automatic feature engineering. But such efforts, like traditional methods, have so far only focused on...
This thesis considers the problem of training convolutional neural networks for online visual tracking. A major challenge for single object visual tracking is that most training sets with frame-level track annotations are quite small, due to the prohibitive cost of manual annotation. Current training approaches either supplement the annotations with...
Species distribution models (SDM), which quantify the correlation between the distribution of a species and environmental factors, are increasingly used to map and monitor animal and plant distributions in the context of growing awareness of environmental change and its ecological consequences. For perfect data, this is a straightforward classification problem from...
Two parties, Alice and Bob, hold inputs x and y respectively. They wish to compute a function f of their inputs. In an ideal world, f(x, y) could be calculated by sending the inputs to a trusted third party. In the absence of such a third party, Alice and Bob...
Reasoning about the 3D shape of objects is important for successful computer vision applications in robotics, 3D rendering and modeling. In this thesis, we address two problems. First, given an image, we generate the 3D shape of the foreground object that appears in the image. Second, we predict the class label of the input...
Gender issues have recently received increased attention in human robot interaction (HRI). Because robots are becoming part of our homes and daily lives, it is important to understand how different groups of people use them. To the best of our knowledge, almost no research has been done that investigates gender...
Apple launched its first “tap-and-pay” mobile payment solution, called “Apple Pay”, in October 2014 in the United States. Quickly catching up with the popularity of Apple Pay, Google launched its own mobile “tap-and-pay” payment solution, called “Android Pay”. Both companies claim that their tap-and-pay solutions are more convenient and more secure than...
Cold air pools are spatiotemporal phenomena that occur when cold air from higher elevations rolls down slopes and accumulates in lower elevations. Such behavior leads to microclimate anomalies, such as the city of Corvallis (Oregon) experiencing persistent cold weather even on sunny days. We analyze multivariate temperature...
Previous work introduced the GenderMag method, a software inspection method used to help software creators identify features within their software that are not gender-inclusive. Inclusiveness of software (gender or otherwise) matters because supporting diversity matters: it is well known that the more diverse a group of problem-solvers, the higher the quality of...
This thesis is about visual relationship detection, an important task in computer vision. The goal is to detect all visual relationships between objects in a given image. This thesis presents a new approach to this problem. Our approach does not use an object detector as a common pre-processing...
Appropriate representations of variational software simplify the analysis of its properties. This thesis proposes tailored representations of two kinds of variational software: difference files of merge commits in Git and feature models. For the former, we use the Choice Edit Model, which is based on the choice calculus, to represent changes introduced...
This thesis proposes a novel technique that exploits spectrum occupancy behaviors inherent to wideband spectrum access to enable efficient cooperative spectrum sensing. The proposed technique reduces the number of required sensing measurements while accurately recovering spectrum occupancy information. It does so by leveraging compressive sampling theory to exploit the block-like...
The constant increase in marine traffic requires a strategy to manage safety. The automatic identification system (AIS) was developed as a navigation safety device for ships in the 1990s. AIS is intended, primarily, to allow ships to view marine traffic in their area and to be seen by that traffic....
Semantic image segmentation is a relatively difficult task in computer vision. With the advent of deep learning, semantic image segmentation is of increasing interest to researchers because of the excellent predictions from Convolutional Neural Networks (CNNs). However, CNNs have been shown to struggle to capture the global context of an image due to...
3D symmetric tensor fields have a wide range of applications, such as in solid and fluid mechanics, medical imaging, meteorology, molecular dynamics, geophysics and computer graphics. There has been much research carried out in this field, yet our knowledge of the tensor field is still at its initial stage to...
Software testing is the process of evaluating the accuracy and performance of software, and automated software testing allows programmers to develop software more efficiently by decreasing testing costs. We compared two advanced random test generators, a Feedback-Directed Random Test Generator (FDR) and a Feedback-Controlled Random Test Generator (FCR), for an...
Mutation testing is one of the effective approaches to measuring the adequacy of test suites. It is widely used in both academia and industry. Unfortunately, the adoption and practical use of mutation testing for Python 2.x programs face three obstacles. First, the set of useful mutation operators is limited. Existing mutation testing tools support very...
CoprHD is an open source software-defined storage and API platform which creates an abstraction layer over multi-vendor heterogeneous storage systems. It offers the ability to discover, pool and automate the management of the storage ecosystem with the help of storage drivers establishing connections between CoprHD and storage systems. On the...
With the development of technologies in genome sequencing and variant detection, a huge number of variants are being detected. Further analysis of these variants requires an efficient tool to annotate their functional effects. This project developed an efficient program to annotate the functional effect of variants...
Debugging, an integral part of software development, is difficult for end-user programmers, especially in the case of complex programs. The process of isolating errors is time-consuming without debugging support from the tool. For example, the visual programming tool LondonTube supports creation of custom mobile-cloud-web applications,...
The ability to create reproducible cryptographically secure keys from temporal environments (e.g., images) has the potential to be a contributor to effective cryptographic mechanisms. Due to the noisy nature of these environments, achieving this goal in a user friendly fashion is a very challenging task, especially since there exists a...
Software-Defined Storage is a term for data storage software that provides policy-based provisioning and management of heterogeneous data storage systems, abstracting the underlying hardware. CoprHD is a software-defined storage controller and API platform which enables policy-based management and cloud automation of storage resources for block, object and file storage...
General-purpose Graphics Processing Units (GPGPUs) have become a critical component in high-performance computing (HPC) systems for executing modern computational workloads. The high thread-level parallelism (TLP) and programmable shader cores allow thousands of threads to execute in parallel. The fast scaling of GPGPUs has increased the demand for performance optimizations on...
In data-centers, running multiple isolated workloads while getting the most performance out of available hardware is key. For many years Virtual Machines (VMs) have been an enabler, but native containers which offer isolation similar to virtual machines while reducing overhead costs associated with emulating hardware resources have become an increasingly...
RNA secondary structure prediction maps an RNA sequence to its secondary structure (a set of AU, CG, and GU pairs). It is an important problem in computational biology because such structures reveal crucial information about an RNA's function, which is useful in many applications ranging from noncoding RNA detection to folding...
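To make the notion of a secondary structure as a set of AU, CG, and GU pairs concrete, here is a minimal sketch of Nussinov-style dynamic programming. This classical textbook baseline simply maximizes the number of non-crossing canonical pairs. It is not the prediction method of this work. The function names and the minimum hairpin-loop length of 3 are illustrative assumptions.

```python
from typing import List, Tuple

# Canonical pairs named in the abstract: AU and CG, plus the GU wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    return (a, b) in CANONICAL


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (max number of pairs, sorted 0-based non-crossing pairs).

    Hypothetical illustration: maximizes pair count, not a free-energy model.
    A pair (k, j) requires at least `min_loop` unpaired bases between k and j.
    """
    n = len(seq)
    if n == 0:
        return 0, []
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs within seq[i..j]

    def pair_score(i: int, k: int, j: int) -> int:
        # k pairs with j: best to the left of k, plus best strictly inside (k, j), plus 1.
        left = best[i][k - 1] if k > i else 0
        return left + best[k + 1][j - 1] + 1

    # Fill by increasing span; shorter spans cannot hold any pair and stay 0.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: j stays unpaired
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    score = max(score, pair_score(i, k, j))
            best[i][j] = score

    pairs: List[Tuple[int, int]] = []

    def traceback(i: int, j: int) -> None:
        # Recover one optimal set of pairs consistent with the filled table.
        if j - i <= min_loop or best[i][j] == 0:
            return
        if best[i][j] == best[i][j - 1]:
            traceback(i, j - 1)
            return
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]) and pair_score(i, k, j) == best[i][j]:
                pairs.append((k, j))
                if k > i:
                    traceback(i, k - 1)
                traceback(k + 1, j - 1)
                return

    traceback(0, n - 1)
    return best[0][n - 1], sorted(pairs)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, [(0, 8), (1, 7), (2, 6)])`, a stem of G-C, G-C, and G-U pairs closing the loop AAA. Real predictors replace the pair count with energy or probability scores, but the nested, non-crossing pair structure is the same.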
The electric grid is a critical cyber-physical infrastructure that serves as a lifeline for modern society. With the increasing trend of cyber-attacks, electric grid security has become a significant concern. Electric grid operators are working hard to reduce the risk these attacks pose to the system. Having security metrics for monitoring the...
The Intel Xeon Phi is a relative newcomer to the scientific computing scene. In recent years, GPUs have been used extensively for mathematical simulations. The Xeon Phi is Intel’s response to the use of these cards. Like the GPU, it is highly parallelizable but can be programmed like a...
In the current education environment, many instructors make use of some type of software, such as Visual Studio or a software library like OpenGL, in the classroom. Incorrect setup and configuration on an individual’s own system is a common problem when using these software tools. This thesis explores the difficulty...
CoprHD is an open source software defined storage controller platform. It holds an inventory of all storage devices in the data center and understands their connectivity. It is an operating system for a storage cloud. It is designed with two key goals in mind:
• Make an enterprise or a...
Surveys are often used in health studies to collect data about participants for scientific research. An increasing number of health scientists are turning to online data collection methods because they are less costly and can reach a large diverse population quickly. Online surveys also make it easy to track and...
Given $k$ terminal pairs $(s_1,t_1),(s_2,t_2),\ldots,(s_k,t_k)$ in an edge-weighted graph $G$, the $k$ Shortest Vertex-Disjoint Paths problem is to find a collection $P_1, P_2, \ldots, P_k$ of vertex-disjoint paths with minimum total length, where $P_i$ is an $s_i$-to-$t_i$ path. As a special case of the...
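To make the objective concrete, here is a minimal sketch. It assumes a hypothetical representation of an undirected graph without self-loops as a dictionary from two-vertex `frozenset`s to edge weights. `total_length_if_feasible` checks that each $P_i$ runs from $s_i$ to $t_i$ along existing edges and that no vertex is shared. `brute_force_k_disjoint` then tries every combination of simple paths. The search is exponential and meant only for tiny inputs; it is not an algorithm from this work.

```python
import itertools
from typing import Dict, FrozenSet, Iterator, List, Optional, Sequence, Set, Tuple

# Assumed representation: undirected graph, no self-loops, edge {u, v} -> weight.
Graph = Dict[FrozenSet[str], float]
Terminals = Sequence[Tuple[str, str]]


def total_length_if_feasible(graph: Graph, terminals: Terminals,
                             paths: Sequence[List[str]]) -> Optional[float]:
    """Total length if paths[i] is an s_i-to-t_i path and all are vertex-disjoint, else None."""
    if len(paths) != len(terminals):
        return None
    used: Set[str] = set()
    total = 0.0
    for (s, t), path in zip(terminals, paths):
        if not path or path[0] != s or path[-1] != t:
            return None
        for v in path:  # also rejects a vertex repeated within one path
            if v in used:
                return None
            used.add(v)
        for u, v in zip(path, path[1:]):
            w = graph.get(frozenset((u, v)))
            if w is None:  # consecutive vertices must be joined by an edge
                return None
            total += w
    return total


def simple_paths(adj: Dict[str, Set[str]], s: str, t: str) -> Iterator[List[str]]:
    """Enumerate all simple s-t paths by depth-first search (exponential; tiny graphs only)."""
    stack = [[s]]
    while stack:
        path = stack.pop()
        if path[-1] == t:
            yield path
            continue
        for u in adj.get(path[-1], ()):
            if u not in path:
                stack.append(path + [u])


def brute_force_k_disjoint(graph: Graph,
                           terminals: Terminals) -> Optional[Tuple[float, List[List[str]]]]:
    """Exhaustively search for vertex-disjoint s_i-t_i paths of minimum total length."""
    adj: Dict[str, Set[str]] = {}
    for edge in graph:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    candidates = [list(simple_paths(adj, s, t)) for s, t in terminals]
    best = None
    for combo in itertools.product(*candidates):
        cost = total_length_if_feasible(graph, terminals, combo)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, [list(p) for p in combo])
    return best


# Hypothetical example: both terminal pairs would like to route through hub "x".
EXAMPLE_GRAPH: Graph = {
    frozenset(("s1", "x")): 1.0,
    frozenset(("x", "t1")): 2.0,
    frozenset(("s1", "t1")): 10.0,
    frozenset(("s2", "x")): 1.0,
    frozenset(("x", "t2")): 1.0,
    frozenset(("s2", "t2")): 5.0,
}
EXAMPLE_TERMINALS = [("s1", "t1"), ("s2", "t2")]
```

On the example, routing $(s_2,t_2)$ greedily along its own shortest path s2-x-t2 (length 2) forces $(s_1,t_1)$ onto the direct edge (length 10), for a total of 12. The joint optimum is s1-x-t1 plus s2-t2, with total length 8. This is why the paths must be chosen together rather than one pair at a time.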
The grid company imposes high penalties for the peak power demand of cloud data centers. These penalties result in high electricity bills, which can be avoided by relying on the servers' Uninterruptible Power Supply (UPS) as a source of energy during peak load periods. This thesis proposes a management...
We model the popular board game of Clue as an MDP and evaluate Monte-Carlo policy rollout in a simulated environment pitting different agents and policies against each other. We describe the choices we made in the representation, along with some of the problems we encountered along the way. We find...
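The core idea of Monte-Carlo policy rollout can be sketched independently of Clue. From the current state, each legal action is tried, the value of each is estimated by averaging the returns of many simulated games that follow a base policy afterwards, and the action with the highest estimate is chosen. This is a minimal sketch under stated assumptions. The simulator interface (`actions`, `step`), the uniformly random base policy, the undiscounted returns, and the toy game are all hypothetical and do not reflect the representation used in this work.

```python
import random
from typing import Callable, Hashable, List, Optional, Tuple

State = Hashable
Action = Hashable
# Hypothetical simulator interface (an assumption, not this work's code):
#   actions(state) -> legal actions; step(state, action, rng) -> (next_state, reward, done)
ActionsFn = Callable[[State], List[Action]]
StepFn = Callable[[State, Action, random.Random], Tuple[State, float, bool]]
Policy = Callable[[State, random.Random], Action]


def random_policy(actions: ActionsFn) -> Policy:
    """Base policy: pick a legal action uniformly at random."""
    return lambda state, rng: rng.choice(actions(state))


def simulate(state: State, policy: Policy, step: StepFn,
             rng: random.Random, horizon: int) -> float:
    """Follow `policy` from `state` for at most `horizon` steps; return the summed reward."""
    total = 0.0
    for _ in range(horizon):
        state, reward, done = step(state, policy(state, rng), rng)
        total += reward
        if done:
            break
    return total


def rollout_action(state: State, actions: ActionsFn, step: StepFn, base_policy: Policy,
                   n_rollouts: int = 1000, horizon: int = 100,
                   seed: int = 0) -> Tuple[Optional[Action], float]:
    """One-step rollout: estimate each action's value as the mean undiscounted
    return of Monte-Carlo simulations that follow the base policy afterwards."""
    rng = random.Random(seed)
    best_action, best_value = None, float("-inf")
    for action in actions(state):
        total = 0.0
        for _ in range(n_rollouts):
            nxt, reward, done = step(state, action, rng)
            if not done:
                reward += simulate(nxt, base_policy, step, rng, horizon)
            total += reward
        value = total / n_rollouts
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value


# Toy two-step game (hypothetical): "stop" pays 0.4 at once; "go" leads to a state
# where the random base policy earns 1 or 0, i.e. 0.5 on average.
def toy_actions(state: State) -> List[Action]:
    return {"start": ["stop", "go"], "mid": ["a", "b"]}.get(state, [])


def toy_step(state: State, action: Action, rng: random.Random) -> Tuple[State, float, bool]:
    if state == "start":
        return ("mid", 0.0, False) if action == "go" else ("end", 0.4, True)
    return ("end", 1.0 if action == "a" else 0.0, True)
```

With the toy game, `rollout_action("start", toy_actions, toy_step, random_policy(toy_actions))` should pick `"go"`, because its estimated value is close to 0.5 and exceeds the 0.4 from `"stop"`. Rollout improves on the base policy only to the extent that the simulations are informative. A stronger base policy or more rollouts per action trades computation for better decisions.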
Hop (Humulus lupulus L. var. lupulus) is a plant of great cultural significance, used as a medicinal herb for thousands of years, and for flavor and as a preservative in brewing beer. Studies of the medicinal effects of the unique compounds produced by hop have led to interest from the...
While individual portfolio diversity analysis is a well-studied problem in visualization, the visual analysis of individual portfolios or groups of portfolios over time has received little attention. Such analysis, however, is important to researchers who are interested in better understanding portfolio management behavior of experts as well as novices. We conducted...
Modeling tire-snow interaction is important in designing effective snow tires, which directly affects road safety during wintry weather. Unfortunately, tires have complex tread designs and the physical properties of snow have not been characterized. We employ the Material Point Method (MPM) for simulating a material that mimics the fracturing and...
In this thesis, we present semantic equivalence rules for an extension of the choice calculus and sound operations for an implementation of variational lists. The choice calculus is a calculus for describing variation and the formula choice calculus is an extension with formulas. We prove semantic equivalence rules for the...
The design of programming tools is slow and costly. To ease this process, we have developed a design pattern catalog aimed at providing guidance about how to design tools for developers. This guidance is grounded in Information Foraging Theory (IFT), which empirical studies have shown to be useful for understanding...
An important impact of the genome technology revolution will be the elucidation of mechanisms of cancer pathogenesis, leading to improvements in the diagnosis of cancer and the selection of cancer treatment. Integrated with the large, well-studied body of knowledge and findings about the role of protein-coding mutations in cancer, demystifying the functional...
Social media sources such as Twitter represent a massively distributed social sensor over diverse topics ranging from social and political events to entertainment and sports news. However, due to the overwhelming volume of content, it can be difficult to identify novel and significant content within a broad topic in a...
Branched covering spaces are a mathematical concept which originates from complex analysis and topology and has found applications in tensor field topology and geometry re-meshing. Given a manifold surface and an N-way rotational symmetry field, a branched covering space is a manifold surface that has an N-to-1 map to the...
The outsourcing of data storage and related infrastructure to third-party services in the cloud is a trend that has gained considerable momentum in the last decade due to the savings it affords companies in both capital and operational costs. Although encryption can alleviate some of the privacy concerns associated with...
Graphics hardware in mobile devices has become more powerful, allowing rendering techniques such as ray-cast volume rendering to run at interactive rates. This increase in performance provides desktop capabilities combined with the portability of a tablet. Volumes can demand a large amount of memory in order to be loaded...
Application Stores, such as the iTunes App Store, give developers access to their users’ complaints and requests in the form of application reviews. However, little is known about how developers are responding to application reviews. Without such knowledge, developers, users, Application Stores, and researchers could make incorrect assumptions. To address...
Sports analytics is rapidly evolving today through the use of computer vision systems that automatically extract the huge amount of information inherently present in multimedia data without much human assistance. This information can facilitate a better understanding of patterns and strategies in various sports. However, for non-professional teams, due to expense...
In previous work, the Idea Garden was created to help those relatively new to programming overcome their barriers in CoScripter. The goal of this thesis was to generalize the Idea Garden's success to other users and environments. We present a set of principles on how to help EUPs like this...
Open source software gives users the freedom to copy, modify and redistribute source code without legal entanglements. The evolution of these software communities usually depends largely on how the participating developers and users interact and cooperate with each other. Over the past few years, open source software has become...
Biologists regularly collect images of leaves for their further studies. One such biological study of leaves is scoring the phenomic characters of leaves for the construction of the Tree of Life (ToL), i.e. the evolutionary lineage of taxa in botany. There is an opportunity for computer vision to help biologists...