Technical Reports (Electrical Engineering and Computer Science)
http://hdl.handle.net/1957/8305
2014-09-16T15:50:50Z
http://hdl.handle.net/1957/48653
Robust extraction and analysis towards complete 3D tensor field topology
Palacios, Jonathan; Yeh, Harry Hsiu-jen; Wang, Wenping; Laramee, Robert S.; Vincent, Paul; Zhang, Eugene
Three-dimensional symmetric tensor fields have a wide range of applications in solid and fluid mechanics. Recent advances in the topological analysis of 3D symmetric tensor fields focus on the local behaviors of tensor fields at degenerate points, which usually form curves. In this paper, we make a number of observations about tensor field topology that are more global in nature. For instance, a degenerate curve can be a knot, and two degenerate curves may be linked. We explore the conditions under which this might occur. In addition, we introduce the notion of an eigenvalue manifold, which provides a more global description of tensor field topology. As part of our investigation, we incorporate additional tensors into tensor field topology, such as the boundary between the linear and planar types of tensors, as well as traceless tensors.
Robust extraction of degenerate curves is a challenging task, despite recent progress. We convert the problem of finding degenerate curves into solving a system of algebraic equations. This approach allows us to borrow techniques from the computer-aided design (CAD) community as well as the algebraic geometry community. The end result is a two-step pipeline that first locates mesh cells in which degenerate points can occur. Existing methods are then used to extract the degenerate curves inside these cells. This approach provides the guarantee that no cells containing a degenerate curve will be missed due to numerical issues with existing degenerate curve extraction methods. We also find the surface-type of tensor field topology using this approach. Finally, we apply our analysis to a simulated seismic wave field propagation from an earthquake as well as a selection of fluid flow data sets. We describe the reaction and resulting insights from domain experts in the fields of fluid mechanics and seismology.
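As a rough illustration of how degeneracy can be phrased algebraically, the sketch below tests whether a single symmetric 3x3 tensor has a repeated eigenvalue by evaluating the discriminant of its characteristic polynomial, which is zero exactly when two eigenvalues coincide. This is a standard textbook characterization and not necessarily the system of equations used in the report; the function names `discriminant` and `is_degenerate` and the tolerance are assumptions made for this example.

```python
def discriminant(t):
    """Discriminant of the characteristic polynomial of a symmetric 3x3 tensor.

    t is a nested sequence [[a, b, c], [b, d, e], [c, e, f]]; only the upper
    triangle is read. The result equals the product of squared eigenvalue
    differences, so it vanishes exactly when an eigenvalue repeats.
    """
    (a, b, c), (_, d, e), (_, _, f) = t
    # Invariants: trace, sum of principal 2x2 minors, determinant.
    i1 = a + d + f
    i2 = a * d - b * b + a * f - c * c + d * f - e * e
    i3 = a * (d * f - e * e) - b * (b * f - e * c) + c * (b * e - d * c)
    # Characteristic polynomial: x^3 + p x^2 + q x + r.
    p, q, r = -i1, i2, -i3
    return 18 * p * q * r - 4 * p**3 * r + p * p * q * q - 4 * q**3 - 27 * r * r


def is_degenerate(t, tol=1e-9):
    """Hypothetical helper: treat a near-zero discriminant as degenerate."""
    return abs(discriminant(t)) <= tol
```

For instance, `is_degenerate([[2, 1, 0], [1, 2, 0], [0, 0, 3]])` holds because that tensor has eigenvalues 1, 3, and 3. A floating-point threshold like `tol` is exactly the kind of numerical fragility that motivates a more robust cell-level test.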
2013-01-01T00:00:00Z
http://hdl.handle.net/1957/47353
Mutant census : an empirical examination of the competent programmer hypothesis
Gopinath, Rahul; Jensen, Carlos (Computer scientist); Groce, Alex; Oregon State University. School of Electrical Engineering and Computer Science
Mutation analysis is often used to compare the effectiveness of different test suites or testing techniques. One of the main assumptions underlying this technique is the Competent Programmer Hypothesis, which proposes that programs are very close to a correct version, or that the difference between current and correct code for each fault is very small. Testers have generally assumed, on the basis of the Competent Programmer Hypothesis, that mutation analysis with single token changes produces mutations that are similar to real faults. While there exists some evidence that supports this assumption, these studies are based on analysis of a limited and potentially non-representative set of programs and are hence not conclusive. In this paper, we investigate the Competent Programmer Hypothesis by analyzing changes (and bug-fixes in particular) in a very large set of randomly selected projects using four different programming languages. Our analysis suggests that a typical fault involves about three to four tokens, and is seldom equivalent to any traditional mutation operator. We also find the most frequently occurring syntactical patterns, and identify the factors that affect the real bug-fix change distribution. Our analysis suggests that different languages have different distributions, which in turn suggests that operators optimal in one language may not be optimal for others. Moreover, our results suggest that mutation analysis stands in need of better empirical support of the connection between mutant detection and detection of actual program faults in a larger body of real programs.
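To make the idea of measuring a fix in tokens concrete, here is a minimal sketch that counts how many tokens differ between a line before and after a change. The tokenizer regex and the counting rule are assumptions chosen for illustration; the report's actual tokenization and measurement procedure are not described in this abstract.

```python
import difflib
import re

# Simplified, language-agnostic tokenizer (an assumption for this sketch):
# identifiers, integer literals, a few two-character operators, then any
# other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||\S")


def tokenize(code):
    return TOKEN_RE.findall(code)


def changed_tokens(before, after):
    """Count tokens touched by the edit that turns `before` into `after`."""
    a, b = tokenize(before), tokenize(after)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced, inserted, or deleted run counts by its longer side.
            changed += max(i2 - i1, j2 - j1)
    return changed
```

Under this counting rule, changing `if (a < b)` to `if (a <= b)` touches one token, which is the kind of edit a traditional single-token mutation operator produces, while changing `return a;` to `return a + 1;` touches two.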
2014-01-01T00:00:00Z
http://hdl.handle.net/1957/44927
How Do Centralized and Distributed Version Control Systems Impact Software Changes?
Brindescu, Caius; Codoban, Mihai; Shmarkatiuk, Sergii; Dig, Danny
Distributed Version Control Systems (DVCS) have seen an increase in popularity relative to traditional Centralized Version Control Systems (CVCS). Yet we know little about whether developers are benefitting from the extra power of DVCS. Without such knowledge, researchers, developers, tool builders, and team managers are in danger of making wrong assumptions.
In this paper we present the first in-depth, large-scale empirical study that looks at the influence of DVCS on the practice of splitting, grouping, and committing changes. We recruited 820 participants for a survey that sheds light on the practice of using DVCS. We also analyzed 409M lines of code changed by 358300 commits, made by 5890 developers, in 132 repositories containing a total of 73M LOC. Using this data, we uncovered some interesting facts. For example, (i) commits made in distributed repositories were 32% smaller than those made in centralized ones, (ii) developers split commits more often in DVCS, and (iii) DVCS commits are more likely to have references to issue tracking labels.
2014-01-07T00:00:00Z
http://hdl.handle.net/1957/41824
Cost Effective Conceptual Design for Semantic Annotation (Extended Version)
Termehchy, Arash; Vakilian, Ali; Yodsawalai, Chodpathumwan; Winslett, Marianne
It is well established that annotating occurrences of entities in a collection of unstructured or semi-structured text documents with their concepts (i.e., entity sets), called semantic annotation, improves the effectiveness of answers to users' queries. However, an enterprise has to spend large amounts of financial, computational, and human resources to develop and deploy an annotation program to semantically annotate a concept in a large collection. Moreover, since the structure and content of the documents may evolve over time, the annotation programs must often be rewritten and repaired. These efforts are even more costly for concepts that are defined in specific domains, such as medicine and the law, as they require extensive collaboration between developers and domain experts. Since the available resources in an enterprise are limited, it has to select only a subset of relevant concepts for annotation. We call this subset a conceptual design for the annotated collection. To the best of our knowledge, finding a conceptual design for semantic annotation is generally left to intuition. In this paper, we introduce and formally define the problem of cost effective conceptual design, where, given a set of relevant concepts and a fixed budget, one would like to find a conceptual design that improves the effectiveness of answers to users' queries the most. We prove that the problem is generally NP-hard in the number of relevant concepts and propose two efficient approximation algorithms: Approximate Popularity Maximization (APM for short) and Annotation-Benefit Maximization (AAM for short). We prove that APM has a constant approximation ratio and AAM is a fully polynomial-time approximation scheme. We validate our analysis using extensive experiments over Wikipedia articles and queries from a real-world search engine query log. Our results indicate that AAM generally returns optimal or near-optimal conceptual designs that are more effective than the solutions provided by APM in most cases. Since the precise values of the input parameters for APM and AAM may not be available at design time, we analyze the sensitivity of AAM to estimation errors in its input parameters. Our results show that, using input parameters computed over small samples of the collection, AAM generally returns the same answers as it does with full information about the values of its input parameters.
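The budgeted selection problem described above has the general shape of a knapsack problem. The sketch below is a deliberately simplified model, not APM or AAM: it assumes each concept has an integer annotation cost and an independent, additive benefit, and it solves that simplified model exactly by dynamic programming over the budget. The function name `best_design` and the example concepts, costs, and benefits are hypothetical.

```python
def best_design(concepts, budget):
    """Pick a subset of concepts maximizing total benefit within the budget.

    concepts: iterable of (name, cost, benefit) with non-negative integer costs.
    budget:   non-negative integer.
    Returns (total_benefit, tuple_of_chosen_names).

    Assumption: benefits are additive across concepts, which the real
    problem need not satisfy.
    """
    # best[b] holds the best (benefit, names) using total cost at most b.
    best = [(0.0, ())] * (budget + 1)
    for name, cost, benefit in concepts:
        # Iterate budgets downward so each concept is used at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + benefit
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + (name,))
    return best[budget]


# Hypothetical concepts with made-up costs and benefits.
example = [("person", 3, 5.0), ("drug", 4, 6.0), ("court", 2, 3.0)]
design = best_design(example, budget=5)
```

With these made-up numbers the chosen design is `("person", "court")`, since the single most beneficial concept, `"drug"`, would leave too little budget for anything else. The running time grows with the numeric value of the budget, which is why approximation schemes are of interest when budgets are large.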
2013-08-16T00:00:00Z