Our goal is to build a system to model the RNA sequences that reveals their structural information by using efficient dynamic programming algorithms and deep learning approaches. We aim to 1) achieve linear-time for RNA secondary structure prediction based on existing minimum free energy models; 2) utilize deep neural networks...
Many database users are not familiar with formal query languages, the concept of schema, or the exact content of their database. Thus, it is challenging for these users to formulate their information needs over semi-structured and structured databases. To address this problem, researchers have proposed usable query interfaces over which...
Machine learning (ML) and deep learning (DL) models impact our daily lives with applications in natural language modeling, image analysis, healthcare, genomics, and bioinformatics. The exponential growth of biological sequence data necessitates accompanying advances in computational methods. Although deep learning is highly effective for detecting and classifying biological sequences, challenges...
Learning Analytics and other branches of Educational Research such as Computing Education Research (CER) implicitly assume that students, especially college students, have no barriers to access learning platforms or software packages. This assumption may be attributed to such pervasive beliefs such as "everyone has a device", or "everyone can access...
Humans are remarkably efficient in learning by interacting with other people and observing their behavior. Children learn by watching their parents’ actions and mimic their behavior. When they are not sure about their parents demonstration, they communicate with them, ask questions, and learn from their feedback. On the other hand,...
Building software systems that adapt to the changing environment is challenging. Developers cannot anticipate all the changes in advance, and even if they could, the effort required to handle such situations is too onerous for practical purposes. Self Adaptive Software (SAS) adapts itself as per changing environment. The area of...
Scientists and engineers have to analyze and query multiple large databases. Analysis over databases created by phasor measurement units can provide insight into the health of the grid, thereby improving control over operations. Realizing this data-driven control, however, requires validating, processing and storing massive amounts of PMU data efficiently, which...
This thesis studies the problem of structured prediction (SP), where the agent needs to predict a structured output for a given structured input (e.g., Part-of-Speech tagging sequence for an input sentence). Many important applications including machine translation in natural language processing (NLP) and image interpretation in computer vision can be...
Narratives are central to communication and the human experience. For a computer system to understand a narrative, it must be able to identify the key facts or plot elements that describe what happened or how the world has changed. These element are called events;establishing a document’s events and the relationships...
The advent of deep learning models leads to a substantial improvement in a wide range of NLP tasks, achieving state-of-art performances without any hand-crafted features. However, training deep models requires a massive amount of labeled data. Labeling new data as a new task or domain emerges consumes time and efforts...
Traditional bus-based interconnects are simple and easy to implement, but the scalability is greatly limited. While router-based networks-on-chip (NoCs) offer superior scalability, they also incur significant power and area overhead due to complex router structures. In this thesis, a new class of on-chip networks, referred to as Routerless (RL) NoCs,...
Anomaly detection has been used in variety of applications in practice, including cyber-security, fraud detection and detecting faults in safety critical systems, etc. Anomaly detectors produce a ranked list of statistical anomalies, which are typically examined by human analysts in order to extract the actual anomalies of interest. Unfortunately, most...
Learning novel concepts from relational databases is an important problem with applications in several disciplines, such as data management, natural language processing, and bioinformatics. For a learning algorithm to be effective, the input data should be clean and in some desired representation. However, real-world data is usually heterogeneous – the...
Information Foraging Theory (IFT) has successfully explained how people seek information in various domains, in turn, informing the design of several tools and information-intensive environments. However, prior research has not explored foraging in the presence of several, very similar variants of the same artifact. Such variants are commonplace in several...
To users, mobile touchscreen devices have appealing characteristics; among these characteristics is intuitiveness, which leads to mobile devices being used almost everywhere by almost everyone to accomplish almost anything. This statement, to some degree, holds for children too. Despite touchscreen devices’ intuitiveness and popularity, we don’t know much about how...
This dissertation addresses the problem of video labeling at both the frame and pixel levels using deep learning. For pixel-level video labeling, we have studied two problems: i) Spatiotemporal video segmentation and ii) Boundary detection and boundary flow estimation. For the problem of spatiotemporal video segmentation, we have developed recurrent...
There are growing interests in designing polynomial-time approximation schemes (PTAS) for optimization problems in planar graphs. Many NP-hard problems are shown to admit PTAS in planar graphs in the last decade, including Steiner tree, Steiner forest, two- edge-connected subgraphs and so on. We follow this research line and study several...
Cryptographic obfuscation is a powerful tool that makes programs “unintelligible” yet still runnable. It essentially gives programs the ability to keep secrets. The practical applications of obfuscation range from keeping secrets in banking applications to preventing software theft to providing secure messaging applications. The cryptographic applications of obfuscation are also...
Most tasks in natural language processing (NLP) involves structured information from both input (e.g., a sentence or a paragraph) and output (e.g., a tag sequence, a parse tree or a translated sentence). While neural models achieve great successes in other domains such as computer vision, applying those frameworks to NLP...
Private set intersection (PSI) allows two parties, who each hold a set of items, to compute the intersection of those sets without revealing anything about other items. Recent advances in PSI have significantly improved its performance for the case of semi-honest security, making semi-honest PSI a practical alternative to insecure...
Tree-like patterns are ubiquitous in nature. Botanical trees, river networks, and blood systems are the most well-known examples of complex hierarchical systems met in observations. Interestingly, many of such systems exhibit statistical self-similarity. There are two main types of self-similarity: Horton self-similarity and Tokunaga self-similarity. Although there is an increased...
Variability is an important and widely studied topic across domains such as version control, software product lines, and metaprogramming. This dissertation presents an investigation into the process of systematically adding variability to data structures and programs, leading to guidelines for variational data structures and implications for programs that create, manipulate,...
3D volume segmentation is a fundamental process in many scientific and medical applications. Producing accurate segmentations, in an efficient way, is challenging, in part due to low imaging data quality (e.g., noise and low image resolution), and ambiguity in the data that can only be resolved with higher-level knowledge of...
Planarity has been successfully exploited to design faster and more accurate approximation algorithms for many graph optimization problems. The celebrated theorem of Kuratowski completely characterizes planar graphs as those excluding K_5 and K_{3,3} as minors. Kuratowski's theorem allows one to generalize planar graphs to H-minor-free graphs: those that exclude a...
Our confidence in software systems depends on our confidence in the exhaustiveness of our testing. As software systems get more complex, the task of exhaustive testing becomes more complex and even infeasible in some cases. In order to build less error prone systems, we therefore need to not only focus...
In many traditional computer graphics applications, rendered scenes typically utilize 3D meshes to represent objects within an environment. As the demand to further improve the realism of graphics applications increases, such as for movies and games, it is becoming more important to represent the inner volumes of object meshes. In...
This dissertation addresses object recognition in challenging settings, where distinct object classes are visually very similar (e.g., species of birds and insects) and/or access to training examples of object classes is limited (e.g., due to the associated high costs of data annotation). In this dissertation, we present a variety of...
Soft keyboards come in different shapes, sizes, and layouts. Each layout is designed to help the users in different inputting tasks. Most of these layouts, however, focus on general text entry as opposed to computer programs. This dissertation addresses the problems with current input mechanisms on touchscreen devices. The dissertation...
The thesis focuses on activity recognition from sensor data, which has spurred a great deal of interest due to its impact on health care and security. Previous work on activity recognition from multivariate time series data has mainly applied supervised learning techniques which require a high degree of annotation effort...
Society faces many complex management problems, particularly in the area of shared public resources such as ecosystems. Existing decision making processes are often guided by personal experience and political ideology rather than state-of-the-art scientific understanding. This dissertation envisions a future in which multiple stakeholders are provided with computational tools for...
In the field of machine learning, clustering and classification are two fundamental tasks. Traditionally, clustering is an unsupervised method, where no supervision about the data is available for learning; classification is a supervised task, where fully-labeled data are collected for training a classifier. In some scenarios, however, we may not...
This dissertation addresses the problem of semantic labeling of image pixels. In the course of our work, we considered different types of semantic labels, including object classes (e.g., car, person), 3D depth values (in the range 0 to 80 meters), and affordance classes (e.g., walkable, sittable). Semantic pixel labeling is...
Collaboration is tricky, but often beneficial in the context of numerous software related activities, from learning core concepts, to the design and implementation of large software products. The growth of online classes, from small structured seminars to massive open online courses (MOOCs), and the isolation and impoverished learning experience some...
Mutation analysis is the gold standard for evaluating test-suite adequacy. It involves exhaustive seeding of all small faults in a program and evaluating the effectiveness of test suites in detecting these faults. Mutation analysis subsumes numerous structural coverage criteria, approximates fault detection capability of test suites, and the faults produced...
State-of-the-art personal robots need to perform complex manipulation tasks to be viable in complex scenarios. However, many of these robots, like the PR2, use manipulators with high degrees of freedom. High degrees of freedom are desirable from a functionality standpoint, but make the learning task more difficult by adding a...
Most tasks in natural language processing (NLP) try to map structured input (e.g., sentence or word sequence) to some form of structured output (tag sequence, parse tree, semantic graph, translated/paraphrased/compressed sentence), a problem known as “structured prediction”. While various learning algorithms such as the perceptron, maximum entropy, and expectation-maximization have...
Social interactions are a ubiquitous part of our lives, and the creation of online social communities has been a natural extension of this phenomena. Free and Open Source Software (FOSS) development efforts are prime examples of how communities can be leveraged in software development, where groups are formed around communities...
Although machine learning systems are often effective in real-world applications, there are situations in which they can be even better when provided with some degree of end user feedback. This is especially true when the machine learning system needs to customize itself to the end user's preferences, such as in...
The proliferation of mobile users and internet content has advanced a plethora of research areas. Among these areas include mobile networks, transport layer protocols, and smart cities. Research shows that global mobile data traffic will increase sevenfold reaching 49 exabytes per month by 2021, most of which will be mobile...
The main goal of automated test generation is to improve the reliability of a program by exposing faults to developers. To this end, testing should cover the largest possible portion of the program given a test budget (i.e., time and resources) as frequently as possible. Coverage of a program entity...
A bad software development process leads to wasted effort and inferior products. In order to improve a software process, it must be first understood. In this work I focus on understanding software processes.
The first process we seek to understand is Continuous Integration (CI). CI systems automate the compilation, building,...
Monte Carlo tree search (MCTS) is a class of online planning algorithms for Markov decision processes (MDPs) and related models that has found success in challenging applications. In the online planning approach, the agent makes a decision in the current state by performing a limited forward search over possible futures...
This thesis focuses on the problem of object tracking. Given a video, the general objective of tracking is to track the location over time of one or more targets in the image sequence. This is a very challenging task as algorithms need to deal with problems such as appearance variations,...
Markov Decision Processes (MDPs) are the de-facto formalism for studying sequential decision making problems with uncertainty, ranging from classical problems such as inventory control and path planning, to more complex problems such as reservoir control under rainfall uncertainty and emergency response optimization for fire and medical emergencies. Most prior research...
This work is inspired by problems in natural resource management centered on the challenge of invasive species. Computing optimal management policies for maintaining ecosystem sustainable is challenging. Many ecosystem management problems can be formulated as MDP (Markov Decision Process) planning problems. In a simulator-defined MDP, the Markovian dynamics and rewards...
Machine learning models for natural language processing have traditionally relied on large numbers of discrete features, built up from atomic categories such as word forms and part-of-speech labels, which are considered completely distinct from each other. Recently however, the advent of dense feature representations coupled with deep learning techniques has...
Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications including video surveillance, content retrieval, and sports analysis. This thesis focuses on addressing efficiency and robustness of video classification in unconstrained real-world settings. The thesis work can be broadly divided into four...
Software testing is a very important task during software development and it can be used to improve the quality and reliability of the software system. One potential way to reduce the cost and increase the efficiency of software testing is to generate test data automatically. Search-based approaches successfully generate unit...
Empirical studies have shown that programmers spend up to one-third of their time navigating through code during debugging. Although researchers have conducted empirical studies to understand programmers’ navigation difficulties and developed tools to address those difficulties, the resulting findings tend to be loosely connected to each other. To address this...
Maintaining the sustainability of the earth’s ecosystems has attracted much attention as these ecosystems are facing more and more pressure from human activities. Machine learning can play an important role in promoting sustainability as a large amount of data is being collected from ecosystems. There are at least three important...