Newcomers’ seamless onboarding is important for open collaboration communities, particularly those that leverage outsiders’ contributions to remain sustainable. Nevertheless, previous work shows that OSS newcomers often face several barriers to contributing, which lead them to lose motivation and even give up on contributing. A well-known way to help newcomers...
Many database users are not familiar with formal query languages, the concept of schema, or the exact content of their database. Thus, it is challenging for these users to formulate their information needs over semi-structured and structured databases. To address this problem, researchers have proposed usable query interfaces over which...
Our goal is to build a system that models RNA sequences and reveals their structural information by using efficient dynamic programming algorithms and deep learning approaches. We aim to 1) achieve linear-time RNA secondary structure prediction based on existing minimum free energy models; 2) utilize deep neural networks...
As a general solution to the problem of managing structural and content variability in relational databases, in previous work we have introduced the Variational Database Management System (VDBMS). VDBMS consists of a representation of a variational database (VDB) and a corresponding typed query language (v-query). However, since this is a...
Machine learning (ML) and deep learning (DL) models impact our daily lives with applications in natural language modeling, image analysis, healthcare, genomics, and bioinformatics. The exponential growth of biological sequence data necessitates accompanying advances in computational methods. Although deep learning is highly effective for detecting and classifying biological sequences, challenges...
More and more people incorporate GIFs in their messaging these days and often send a GIF as a reply. GIF stands for Graphics Interchange Format; a GIF is a short animated picture without sound. A trivial GIF with a common emotion is easy to find, but if some iconic expression is...
Learning Analytics and other branches of Educational Research such as Computing Education Research (CER) implicitly assume that students, especially college students, face no barriers to accessing learning platforms or software packages. This assumption may be attributed to pervasive beliefs such as "everyone has a device", or "everyone can access...
Building software systems that adapt to a changing environment is challenging. Developers cannot anticipate all the changes in advance, and even if they could, the effort required to handle such situations is too onerous for practical purposes. Self-Adaptive Software (SAS) adapts itself to a changing environment. The area of...
Humans are remarkably efficient at learning by interacting with other people and observing their behavior. Children learn by watching their parents’ actions and mimicking their behavior. When they are not sure about their parents’ demonstration, they communicate with them, ask questions, and learn from their feedback. On the other hand,...
In today’s world, we are highly dependent on software systems together with devices for almost every task in our day-to-day life. Software system upgrades are released whenever necessary to accommodate users’ ever-changing needs. The devices we use to run the software systems might be of...
CSV is a widely used database format because it serves many useful purposes in business, engineering, and environmental applications. However, a CSV database can be difficult to understand because it is usually presented in a spreadsheet format such as Microsoft Excel. Therefore, data visualization...
Most database users do not know formal query languages, such as SQL, and prefer to express their information needs using usable query languages, such as keyword queries. Keyword queries, however, are inherently ambiguous and challenging for the database systems to understand and answer effectively. We propose a novel approach to...
Scientists and engineers have to analyze and query multiple large databases. Analysis over databases created by phasor measurement units (PMUs) can provide insight into the health of the grid, thereby improving control over operations. Realizing this data-driven control, however, requires validating, processing, and storing massive amounts of PMU data efficiently, which...
This thesis studies the problem of structured prediction (SP), where the agent needs to predict a structured output for a given structured input (e.g., Part-of-Speech tagging sequence for an input sentence). Many important applications including machine translation in natural language processing (NLP) and image interpretation in computer vision can be...
Narratives are central to communication and the human experience. For a computer system to understand a narrative, it must be able to identify the key facts or plot elements that describe what happened or how the world has changed. These elements are called events; establishing a document’s events and the relationships...
The advent of deep learning models has led to substantial improvements in a wide range of NLP tasks, achieving state-of-the-art performance without any hand-crafted features. However, training deep models requires a massive amount of labeled data. Labeling new data whenever a new task or domain emerges consumes time and effort...
Automatic music transcription (AMT) is the task of recovering, from an acoustic representation of music, a symbolic notation of the written notes expressed by the sound. Transcribing music with multiple notes sounding simultaneously is difficult for both humans and machines. Much existing work on AMT has focused on suitable acoustic...
The increased demand for building materials that are friendly to the environment, along with the latest advances in wood science and technology, which exploit the fiber orientation of wood, resulted in composite wood materials known as mass-timber products. To understand the effects the wood fiber orientation has on the dynamic...
Spreadsheets are a pervasive technology in both personal and industrial use. Often, the user is not the author, which contributes to a lack of understanding of the purpose and functionality of a spreadsheet. Furthermore, this lack of understanding is a major reason for mistakes in the use and maintenance of spreadsheets....
Traditional bus-based interconnects are simple and easy to implement, but their scalability is greatly limited. While router-based networks-on-chip (NoCs) offer superior scalability, they also incur significant power and area overhead due to complex router structures. In this thesis, a new class of on-chip networks, referred to as Routerless (RL) NoCs,...
One goal of using robots in introductory computer science (CS) courses is to improve motivation among learners. In this study, we investigate the effectiveness of using the Cozmo and Lego Mindstorms robots to improve students’ motivation in a CS orientation course, and we describe our experience using these robots in...
Anomaly detection has been used in a variety of practical applications, including cybersecurity, fraud detection, and fault detection in safety-critical systems. Anomaly detectors produce a ranked list of statistical anomalies, which are typically examined by human analysts in order to extract the actual anomalies of interest. Unfortunately, most...
Parking around the Oregon State University campus can be severely limited, and for students looking to spend all day studying and going to classes, it can be a pain to find parking that lasts for more than two hours. This thesis aims to resolve these issues by developing an application called...
In the age of big data, there is an abundance of information, and we now have the tools to understand it with data mining and machine learning algorithms. With the rise and fall and rise again of Bitcoin, the finance industry and society itself are caught in between. There...
The Jetson Artificial Intelligence Tool chain (JAI-TC) is a set of packages, APIs, and libraries for Artificial Intelligence applications to be deployed on the NVIDIA SoC, Jetson TX2. JAI-TC automates the installation of these items, allowing a wider set of users to leverage these technologies. Prior to this, the...
This thesis describes the implementation of ultrasonic sensors to trigger a stimulus to increase peer interaction for children with disabilities using modified ride-on-cars for mobility. Modified ride-on-car technology has improved mobility for children with disabilities by effectively replicating the social benefits of a powered mobility device, yet there are opportunities...
Learning novel concepts from relational databases is an important problem with applications in several disciplines, such as data management, natural language processing, and bioinformatics. For a learning algorithm to be effective, the input data should be clean and in some desired representation. However, real-world data is usually heterogeneous – the...
Recent studies have shown that novel continuous dropout methods can be viewed as a Bayesian interpretation of model parameters, though most such studies have shown results using normal distributions. As the posterior distributions over neural network nodes and parameters are intractable, given that they are a result of artificial construction...
Although the need for gender-inclusivity in software itself is gaining attention among both SE researchers and SE practitioners, and methods have been published to help, little has been reported on how to make such methods work in real-world settings. For example, how do busy software practitioners use such methods in...
Most software systems today do not support cognitive diversity. Further, because of differences in problem-solving styles that cluster by gender, software that poorly supports cognitive diversity can also embed gender biases. To help software professionals fix gender bias “bugs” related to people’s problem-solving styles for information processing and learning of...
This report presents an efficient method for semi-supervised video object segmentation – the problem of identifying foreground pixels occupied by a target object. The target is specified by the ground-truth mask in the first video frame. While the state of the art achieves a segmentation accuracy greater than 80%, it...
The relationship between the public and new technologies has historically been a tumultuous one, with public perceptions ranging from excited rapid adoption to standoffish pessimism. In 2011, IBM tried to use competition as a means of showcasing a new technology to the public. This thesis is a work of rhetorical...
In this research, we address the problem of learning a single causal network structure from multiple datasets generated from different experiments. The experiments can be observational or interventional. We assume that each dataset is generated by an unknown causal network altered under different experimental conditions (interventions, manipulations, or perturbations). As...
As the number of wireless devices, the demand for high data rates, and the need for always-on connectivity grow and become more stringent with the evolution and emergence of 5G systems, network engineers and researchers face new and unique challenges that need to be addressed. Among many...
Deep learning and neural networks have been widely used in research; deep learning has empowered many tasks such as point cloud segmentation and shape recognition. One of the main advantages of deep interaction point cloud segmentation is that it allows feature extraction to be learned through neural network based...
People like going on trips with friends and tend to plan their trips well in advance to have the best possible experience of a destination and get the most out of the places they visit and/or the activities they plan to partake in. Right now, the Internet provides a wealth...
New MS in CS students in the School of Electrical Engineering and Computer Science at OSU are required to file their Program of Study by the end of their second term. Many of them, especially international students, are in a totally new ecosystem, so they find it overwhelming to choose the...
Information Foraging Theory (IFT) has successfully explained how people seek information in various domains, in turn informing the design of several tools and information-intensive environments. However, prior research has not explored foraging in the presence of several very similar variants of the same artifact. Such variants are commonplace in several...
RNA structure prediction is a challenging problem, especially with pseudoknots. Recently, there has been a shift from the classical minimum free energy (MFE)-based methods to partition function-based ones that assemble structures based on base-pairing probabilities. Two typical examples of the latter group are the popular maximum expected accuracy (MEA) method...
How should reinforcement learning (RL) agents explain themselves to humans not trained in AI? To gain insights into this question, we conducted a 124-participant, four-treatment experiment to compare participants’ mental models of an RL agent in the context of a simple Real-Time Strategy (RTS) game. The four treatments isolated...
Although deep reinforcement learning agents have produced impressive results in many domains, their decision making is difficult to explain to humans. To address this problem, past work has mainly focused on explaining why an action was chosen in a given state. A different type of explanation that is useful is...
The rapid population growth in large urban cities has led to an unprecedented increase in both the number and the diversity of wireless devices and applications with varying quality of service requirements in terms of latency and data rates. LinkNYC is an example of an urban communication network infrastructure, which...
Question answering forums like Reddit have been quite effective in improving social interaction and disseminating useful information. Community members ask a variety of questions related to a subject, which are answered by other community members. The answers are given ratings by other members. In this thesis, we study the problem...
To users, mobile touchscreen devices have appealing characteristics; among these characteristics is intuitiveness, which leads to mobile devices being used almost everywhere by almost everyone to accomplish almost anything. This statement, to some degree, holds for children too. Despite touchscreen devices’ intuitiveness and popularity, we don’t know much about how...
This dissertation addresses the problem of video labeling at both the frame and pixel levels using deep learning. For pixel-level video labeling, we have studied two problems: i) spatiotemporal video segmentation and ii) boundary detection and boundary flow estimation. For the problem of spatiotemporal video segmentation, we have developed recurrent...
Heatmap regression has become one of the mainstream approaches to localizing facial landmarks. As Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) become popular in solving computer vision tasks, extensive research has been done on these architectures. However, the loss function for heatmap regression is rarely studied. In...
Computer science is, at its core, about solving problems. The "Carry out the Plan" portion of problem solving is often examined and emphasized in CS 1 and CS 2, while the other important aspects of the problem-solving process receive less emphasis. This study focuses on the other problem-solving steps, which...
Anime and manga’s general lack of brown and black characters may not seem unusual at first considering that the medium is produced in Japan by Japanese creators for Japanese audiences, and thus chiefly features Japanese characters. However, its significant proportion of white characters necessitates a more critical investigation of racial...
Deep neural networks currently comprise the backbone of many applications where safety is a critical concern, for example, autonomous driving and medical diagnostics. Unfortunately, these systems currently fail to detect out-of-distribution (OOD) inputs and can be prone to making dangerous errors when exposed to them. In addition, these same systems...
In open set recognition, a classifier must label instances of known classes while detecting instances of unknown classes not encountered during training. To detect unknown classes while still generalizing to new instances of existing classes, this thesis introduces a dataset augmentation technique called counterfactual image generation. This approach, based on...