Faculty Research Publications (Mathematics)
http://hdl.handle.net/1957/13818
Wed, 03 Feb 2016 19:16:52 GMT
http://hdl.handle.net/1957/57783
ARK: Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition
Koslicki, David; Chatterjee, Saikat; Shahrivar, Damon; et al.
Motivation:
Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging.
Results:
There has been a recent surge of interest in using compressed-sensing-inspired and convex-optimization-based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing step in which a standard K-means clustering algorithm partitions a large set of reads into subsets at reasonable computational cost, providing several vectors of first-order statistics instead of only a single statistical summary in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via a mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity.
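The aggregation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmer_frequencies`, `kmeans`, `aggregate_estimate`), the deterministic centroid seeding, and the `estimator` callable are all hypothetical, and combining the per-cluster estimates with weights equal to the cluster fractions is an assumption suggested by the mixture formulation mentioned in the abstract.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(read, k):
    """Normalized k-mer frequency vector of one read over the A, C, G, T alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, n_clusters, n_iter=20):
    """Plain Lloyd iterations; the first n_clusters vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:n_clusters]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid.
        for i, v in enumerate(vectors):
            labels[i] = min(range(len(centroids)),
                            key=lambda c: squared_distance(v, centroids[c]))
        # Update step: each non-empty cluster's centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


def aggregate_estimate(reads, k, n_clusters, estimator):
    """Cluster the reads, estimate per cluster, and combine with cluster-fraction weights."""
    vectors = [kmer_frequencies(r, k) for r in reads]
    labels, centroids = kmeans(vectors, n_clusters)
    sizes = Counter(labels)  # only non-empty clusters appear here
    n = len(vectors)
    per_cluster = {c: estimator(centroids[c]) for c in sizes}
    dim = len(next(iter(per_cluster.values())))
    return [sum(sizes[c] / n * per_cluster[c][j] for c in sizes) for j in range(dim)]
```

Here `estimator` stands in for whichever k-mer-based composition method is being wrapped; it takes one frequency vector and returns one composition vector. If `estimator` is linear, the weighted combination reduces to the estimate from the single overall k-mer summary, so in this sketch any difference from the unclustered approach can only arise with nonlinear estimators.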
To the best of our knowledge, one or more authors of this paper were federal employees when contributing to this work. This is the publisher’s final pdf. The article was published by the Public Library of Science and is in the public domain. The published article can be found at: http://www.plosone.org/. Supporting information is available online at: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0140644#sec021. An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware
Fri, 23 Oct 2015 00:00:00 GMT
http://hdl.handle.net/1957/57749
Pressure Forcing and Dispersion Analysis for Discontinuous Galerkin Approximations to Oceanic Fluid Flows
Higdon, Robert L.
This paper is part of an effort to examine the application of discontinuous Galerkin (DG) methods to the numerical modeling of the general circulation of the ocean. One step performed here is to develop an integral weak formulation of the lateral pressure forcing that is suitable for use with a DG method and with a generalized vertical coordinate that includes level, terrain-fitted, isopycnic, and hybrid coordinates as examples. This formulation is then tested, in special cases, with analyses of dispersion relations and numerical stability and with some computational experiments. These results suggest that the advantages of DG methods may significantly outweigh their disadvantages in the settings tested here. This paper also outlines some other issues that need to be addressed in future work.
Sun, 15 Sep 2013 00:00:00 GMT
http://hdl.handle.net/1957/57748
Multiple time scales and pressure forcing in discontinuous Galerkin approximations to layered ocean models
Higdon, Robert L.
This paper addresses some issues involving the application of discontinuous Galerkin (DG) methods to ocean circulation models having a generalized vertical coordinate. These issues include the following. (1) Determine the pressure forcing at cell edges, where the dependent variables can be discontinuous. In principle, this could be accomplished by solving a Riemann problem for the full system, but some ideas related to barotropic–baroclinic time splitting can be used to reduce the Riemann problem to a much simpler system of lower dimension. Such splittings were originally developed in order to address the multiple time scales that are present in the system. (2) Adapt the general idea of barotropic–baroclinic splitting to a DG implementation. A significant step is enforcing consistency between the numerical solution of the layer equations and the numerical solution of the vertically-integrated barotropic equations. The method used here has the effect of introducing a type of time filtering into the forcing for the layer equations, which are solved with a long time step. (3) Test these ideas in a model problem involving geostrophic adjustment in a multi-layer fluid. In certain situations, the DG formulation can give significantly better results than those obtained with a standard finite difference formulation.
This is an author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by Elsevier and can be found at: http://www.journals.elsevier.com/journal-of-computational-physics/
Sat, 15 Aug 2015 00:00:00 GMT
http://hdl.handle.net/1957/57110
Symmetry breaking and uniqueness for the incompressible Navier-Stokes equations
Dascaliuc, Radu; Michalowski, Nicholas; Thomann, Enrique; Waymire, Edward C.
The present article establishes connections between the structure of the deterministic Navier-Stokes equations and the structure of (similarity) equations that govern self-similar solutions as expected values of certain naturally associated stochastic cascades. A principal result is that explosion criteria for the stochastic cascades involved in the probabilistic representations of solutions to the respective equations coincide. While the uniqueness problem itself remains unresolved, these connections provide interesting problems and possible methods for investigating symmetry breaking and the uniqueness problem for Navier-Stokes equations. In particular, new branching Markov chains, including a dilogarithmic branching random walk on the multiplicative group (0, ∞), naturally arise as a result of this investigation.
Copyright 2015 American Institute of Physics. This article may be downloaded for personal use only. Any other use requires prior permission of the author and the American Institute of Physics. The following article appeared in Chaos: An Interdisciplinary Journal of Nonlinear Science and may be found at http://scitation.aip.org/content/aip/journal/chaos
Wed, 01 Jul 2015 00:00:00 GMT