In this work, we study the problem of learning and improving policies for probabilistic planning problems. In the first part, we train neural network policies for probabilistic planning problems modeled as factored Markov decision problems. The objective is to train problem-specific neural networks via supervised learning to imitate the action...
Correctness and efficiency are important properties of programs. However, to support maintenance and debugging, the programs should also be understandable. Program explanations also play a vital role in educational settings, enhancing the understanding of programs among students.
Proof trees provide a sound basis for generating dynamic explanations of programs. But...
Causal inference is an important analytical tool to bridge the gap between prediction and decision-making. However, learning a causal network solely from data is a challenging task. In this work, various techniques are explored to improve causal network learning from data. Firstly, the problem of learning...
Multi-relation aggregation queries process the join operator before computing the aggregation function. This join is arguably the most costly operation since traditional join algorithms spend the majority of their time trying to join the parts of the relations that do not generate any output tuples. This causes slow response times with...
RNAs play important roles in the central dogma of molecular biology and are involved in multiple biological processes such as chromatin modification, transcriptional interference, and translation initiation. The functions of RNAs, especially non-coding RNAs, are highly related to their secondary structures; therefore, computational methods for RNA structure prediction are of...
Ph.D. candidate Qi Wei's thesis consists of two projects: Chemotherapy Project: a study based on the research paper "Predicting chemotherapy response of various cancer types using a variational auto-encoder approach" submitted to the bioRxiv preprint archive and accepted by BMC Bioinformatics; and Wound Monitor Project: implementing and assessing analytics...
Distance estimation is a key process for movement and spatial cognition. However, this process is hindered when navigating virtual environments in virtual reality (VR) due to the movement being exclusively visually simulated. In order to contribute to the understanding of how this hindrance affects our ability to estimate distances in...
Many large-scale data analysis applications involve data that can vary over both time and space. Often the primary goal of analyzing spatiotemporal data is identifying trends, movements, and sudden changes with respect to time, location, or both. This can include a variety of applications in economics (housing prices, unemployment, job...
Developers frequently change the type of a program element and update all its references for performance, security, concurrency, library migration, or better maintainability. Although type change is a common program transformation, it is the least automated and the least studied. Manually performing type changes is tedious since the programmers have...
Movement intent decoders, which interpret volitional movement intent from human bioelectric signals, can be incorporated into modern neuroprostheses to offer people living with limb loss or paralysis the potential to regain their lost motor control. Machine learning methods have become the research standard for continuous decoders with high degrees of...
The ability to extract uncertainties from predictions is crucial for the adoption of deep learning systems in safety-critical applications. Uncertainty estimates can be used as a failure signal, which is necessary for automating complex tasks where safety is a concern. Furthermore, current deep learning systems do not provide uncertainty estimates,...
Ecological domains seeking to understand the environment and the behavior of species have received little attention in machine learning (ML), despite the fact that environmental changes have a significant impact on humans as well as ecosystems. Some ecological problems can be formulated similarly to other common ML applications, but there...
This report provides an overview of selected findings from the 2019 NSSE survey completed at OSU. It focuses on two particular subsets of the survey results: Engagement Indicators and High-Impact Practices. The report outlines the insights drawn from the survey to look at the undergraduate student experience at OSU.
As the link between human microbiomes and health has become more established, the interest in applying statistical approaches to microbiome data to understand the mechanisms behind these links has grown. However, microbiome data is often of unmanageable size, and consequently, producing quality lower dimensional representations of samples is a significant...
Attention maps are popular tools for explaining the decisions of convolutional neural networks (CNNs) for image classification. Typically, for each image of interest, a single attention map is produced, which assigns weights to pixels based on their importance to the classification. We argue that a single attention map provides an...
In recent years, Oblivious Random Access Memory (ORAM) controllers in Trusted Execution Environments (TEEs) have become a popular area of investigation, as coresident trusted systems allow for significantly more efficient oblivious execution. Further, in the case of Intel architectures, oblivious execution effectively eliminates the majority of confidentiality leakage holes in...
Smart home devices, such as voice assistants, smart lights, and smart video doorbells, have become a part of end users' daily lives. Many of these devices combine their features with other services and smart devices to create a simple and efficient user experience. This is partly because of the contribution...
Over the last two decades, satisfiability and satisfiability modulo theories (SAT/SMT) solvers have grown powerful enough to be general-purpose reasoning engines throughout software engineering and computer science. However, most practical use cases of SAT/SMT solvers require not just solving a single SAT/SMT problem, but solving sets of related SAT/SMT problems....
In this thesis, I present the variational database management system, a formal framework and its implementation for representing variation in relational databases and managing variational information needs. A variational database is intended to support any kind of variation in a database. Specific kinds of variation in databases have already been...
With the advent of Artificial Intelligence (AI) in every sphere of life in today's day and age, it has become increasingly important for non-AI experts to be able to comprehend the underlying logic of how AI systems work, assess them and find faults in these systems, particularly when they are...
AMD SEV allows for the creation of fully encrypted virtual machines. This allows cloud computing tenants’ data to be kept secret from the cloud computing provider. However, it has been shown that the encryption scheme used by AMD can easily be broken. The attacker can create a copy of the virtual...
Teaching CS is a challenging task, especially for those without prior programming experience. Even for veteran educators, having to learn CS concepts while also trying to teach those same concepts is difficult, as is the case for many K-12 CS instructors. Unfortunately, there is often no formalized order with CS...
Advances in deep learning based image processing have led to their adoption for a wide range of applications, and in tow with these developments is a dramatic increase in the availability of high quality datasets. With this comes the need to accelerate and scale deep learning applications in order to...
We propose an approach for understanding control policies represented as recurrent neural networks. Recent work has approached this problem by transforming such recurrent policy networks into finite-state machines (FSM) and then analyzing the equivalent minimized FSM. While this led to interesting insights, the minimization process can obscure a deeper understanding...
The process of cleaning data is quite laborious and can be complicated by the presence of inconsistencies and other issues in the data, leading many data scientists to utilize interactive data cleaning software. However, many current state-of-the-art cleaning solutions fail to account for the reality that the human needs to...
Data centers have seen rapid growth in recent years as the technology industry relies more on cloud services. As a result, data centers are paying millions of US dollars in energy costs. Several strategies have been proposed to lower the total energy cost at data centers: task migration, server...
Compositional data is a type of data where the features are non-negative and always sum to a constant. This type of data is frequently encountered in many fields such as microbiology, geology, economics, and natural language processing. Compositional data has unique statistical properties that make it difficult to apply standard...
We take for granted how quickly we, as humans, form mental models of the world around us. By the time we are toddlers, we have an observable intuition for the physical rules of the world. Stacking blocks such that they don’t fall over becomes such a trivial task that it...
With the overwhelming number of open-source resources available online, the developer community is always looking to expand their knowledge. Collaborative development has been recognized as an efficient method of learning by several students and industry professionals. Hence, developers seek to obtain more hands-on experience while collaborating with people of similar...
This thesis seeks to explain a new method of extracting temporal logic formulas from time series data using machine learning. Two strategies were followed during the development of this research: first, a generative adversarial network was combined with already-existing temporal logic extraction code by Belta. This was achieved by injecting...
Graphics of hair have been constantly improved since Kajiya’s famous teddy bear in 1989. By combining these concepts and real-world physics laws, I created a realistic representation of hair on a sphere-shaped head. OpenGL also provides a constantly updated display to continuously render the graphic. Using springs, the hair can...
We present a theoretical free-space optical (FSO) transmitter that utilizes a dynamically steered and shaped laser beam to communicate with a randomly moving receiver adorned with a retroreflector. The transmitter tracks the receiver's position by repeatedly scanning the field of view (FOV), measuring reflections from the retroreflector, and estimating the...
The local presence or absence of individual botanical species can be predicted with high accuracy by a simple feed-forward neural network, using only local climate data to make inference. This study proposes a framework for learning these predictive models, demonstrates highly accurate predictions for species with a sufficiently large area...
As fifth-generation telecommunications equipment becomes more viable and reliable, demand for high-speed, low-latency, viewpoint-specific data analysis is expected to dramatically increase. Systems such as self-driving cars, traffic cameras, warehouses, and other commercial buildings will be using fifth-generation telecommunications to form ‘smart cities’, driving demand for the edge....
The ever-increasing global population presents looming problems for the field of agriculture. Global food demand will, at some point, increase to the point where there is not enough crop-ready land to keep up. This creates an additional incentive, other than economics, for growers to increase their yield-per-acre and make sure...
Machine learning (ML) algorithms are increasingly explored in a variety of fields, including the design and optimization of computer systems. Recent research, such as optimizing memory/cache prefetching by ML training or predicting traffic patterns in throughput processors, also shows the promise of introducing ML into computer system design and optimization. Throughput...
N-ary relationships, which relate N entities where N is not necessarily two, are omnipresent in real life. In this thesis, we develop a visualization technique for N-ary relationships.
First, we propose a visual metaphor that utilizes vertices and polygons to represent entities and N-ary relationships. Based on this visual metaphor,...
Multi-party computation (MPC) is a field of study focused on devising cryptographic protocols that allow participants to learn the output of some function of their private inputs without trusting a third party to perform the computation. This is usually done at a large scale between data centers, with little emphasis...
We consider three problems on simplicial complexes: the Optimal Bounded Chain Problem, the Optimal Homologous Chain Problem, and 2-Dim-Bounded-Surface. The Optimal Bounded Chain Problem asks to find the minimum weight d-chain in a simplicial complex K bounded by a given (d−1)-chain, if such a d-chain exists. The Optimal Homologous Chain...
Expensive pricing for laboratory bioreactors proves to be the main barrier-to-entry for lab-scale academic research. Most laboratory bioreactors are priced at $100K+ and require significant training to use, thereby limiting their accessibility. The IDEAL (Intuitive, Developmental And Research Oriented, Easy To Use, Affordable, And Low Volume) Bioreactor aims to solve...
In weak supervision learning, label information can be provided at different levels of granularity. For example, in multi-instance multi-label learning, samples are organized into bags and labels for each class are provided at the bag level. For small datasets, this approach offers a means of reducing the labeling effort. However, in...
This project focuses on implementing a model predictive control (MPC) algorithm as a pathfinding algorithm for navigating tracks with the Oregon State University Global Formula Racing (GFR) team's driverless car. The new algorithm would use an optimization function to calculate the least-cost path for the car to take by assigning...
BoGL is a programming language created for the purpose of computer science education that is specific to the domain of board games. Although there is a language grammar and an existing implementation that is currently used by students, a complete and formal language standard does not yet exist. In the...
Metal-organic frameworks are promising novel materials for gas storage and separations because of their extremely high internal surface area and their modular structures based on metal nodes and organic linker molecules. Due to their modular design, it is possible to finely tune a structure for optimal storage of a specific...
Although computer science (CS) education researchers and practitioners have found ways to improve CS classroom inclusivity, few researchers have considered inclusivity of online CS education. We have begun developing a new approach that we term “embedded inclusive design” to address inclusive CS. The essence of the approach is to integrate...
This dissertation addresses few-shot object segmentation in images. The goal of segmentation is to label every image pixel with a class of the object occupying that pixel, where the class may represent a semantic object category or instance. In few-shot segmentation, training and test datasets have different classes. Every new...
With robots becoming increasingly commonplace in today’s society, we need to find ways to help robots become better accepted. One common method used to gain acceptance and build rapport is the use of humor. Research in robot comedy currently exists, although it is fairly limited. Our goal with...
Recently, there has been a large amount of research into basic human movement as a baseline for robotic motion. Robotic grasping is a challenging problem for a number of reasons, one of which is that it is difficult to accurately know the interaction between the hand and the object, especially...
The abilities of plant biologists to characterize the genetic basis of physiological traits are limited by their abilities to obtain quantitative data representing precise details of trait variation and mainly to collect this data on a high-throughput scale at low cost. Deep learning-based methods have demonstrated unprecedented potential to automate...