Respondent-driven sampling (RDS) is used throughout the world to estimate prevalence and population size for hidden populations. Although RDS is an effective method for enrolling people from key populations in studies, it relies on a partially unknown sampling mechanism, and thus each individual’s inclusion probability is unknown. Current estimators for...
Advective skew dispersion is a natural Markov process defined
by a diffusion with drift across an interface of jump discontinuity in
a piecewise constant diffusion coefficient. In the absence of drift this
process may be represented as a function of α-skew Brownian motion
for a uniquely determined...
We present a generalized estimating equations approach for estimating the concordance correlation coefficient and the
kappa coefficient from sample survey data. The estimates and their accompanying standard errors need to correctly account
for the sampling design. Weighted measures of the concordance correlation coefficient and the kappa coefficient, along with
the...
Analysis of stochastic models of networks is quite important in light of the huge influx of network data in social, information and bio sciences, but a proper statistical analysis of features of different stochastic models of networks is still underway. We propose bootstrap subsampling methods for finding empirical distribution of count...
We propose generalized additive partial linear models for complex data
which allow one to capture nonlinear patterns of some covariates, in the presence
of linear components. The proposed method improves estimation efficiency
and increases statistical power for correlated data through incorporating
the correlation information. A unique feature of the proposed...
The purpose of this note is to provide a coupling of weak limits in distribution of sequences of (normalized) multiplicative cascade measures under strong disorder in terms of the extremes of an associated branching random walk, assuming i.i.d. positive, non-lattice bond weights and a second moment condition. The solution is...
This article concerns a systemic manifestation of small scale interfacial heterogeneities in large scale quantities of interest to a variety of diverse applications spanning the earth, biological and ecological sciences. Beginning with formulations in terms of partial differential equations governing the conservative, advective-dispersive transport of mass concentrations in divergence form,...
The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. It is important to identify significant covariates associated with response variables, especially for high-dimensional settings where the number of covariates can be larger than the sample size. We consider model selection in the high-dimensional setting...
The detection and determination of clusters has been of special interest among researchers from different fields for a long time. In particular, assessing whether the clusters are significant is a question that has been asked by a number of experimenters. In Fuentes and Casella (2009), the authors put forth a...
GIScience (geographic information science) is a scholarly discipline that addresses
fundamental issues surrounding the use of a variety of digital technologies to handle
geographic information; namely, information about places, activities, and phenomena on
and near the surface of the Earth that are stored in maps or images. GIScience includes
the...
Environmental monitoring poses two challenges to statistical analysis: complex data and complex survey designs. Monitoring for system health involves measuring physical, chemical, and biological properties that have complex relations. Exploring these relations is an integral part of understanding how systems are changing under stress. How does one explore high dimensional...
In Bayesian analysis, means are commonly used to
summarize Bayesian posterior distributions. Problems with
a large number of parameters often require numerical
integrations over many dimensions to obtain means. In this
dissertation, posterior modes with respect to appropriate
measures are used to summarize Bayesian posterior
distributions, using the Newton-Raphson method...
Geostatistical linear interpolation procedures such as kriging require knowledge of the
covariance structure of the spatial process under investigation. In practice, the covariance of the
process is unknown, and must be estimated from the available data. As the quality of the
resulting predictions, and associated mean square prediction errors, depends...
Mixed linear models are a time-honored method of analyzing correlated data. However, there is still no method of calculating exact confidence intervals or p-values for an arbitrary parameter in any mixed linear model. Instead, researchers must use either specialized approximate and exact tests that have been developed for particular...
We consider two semiparametric regression models for data analysis, the stochastic additive model (SAM) for nonlinear time series data and the additive coefficient model (ACM) for randomly sampled data with nonparametric structure. We employ the SCAD-penalized polynomial spline estimation method for estimation and simultaneous variable selection in both models. It...
In regression analysis, random errors in an explanatory variable cause the
usual estimates of its regression coefficient to be biased. Although this problem has
been studied for many years, routine methods have not emerged. This thesis
investigates some aspects of this problem in the setting of analysis of epidemiological
data....
Most statistical surveys and data collection studies encounter missing data. A common solution to this problem is to discard observations with missing data while reporting the percentage of missing observations in different output tables. Imputation is a tool used to fill in the missing values. This dissertation introduces the missing...
Graphical models use Markov properties to establish associations among dependent variables. To estimate spatial correlation and other parameters in graphical models, the conditional independences and joint probability distribution of the graph need to be specified. We can rely on Gaussian multivariate models to derive the joint distribution when all the...
This dissertation concerns two topics in the analysis of finite population surveys:
setting sample size and hypothesis testing. The first concerns the a priori determination
of the sample size needed to obtain species members. The second concerns
testing distributional hypotheses when two equal-size populations are sampled.
Setting sample size to...
The semi-parametric approach to the analysis of proportional hazards survival data
is relatively new, having been initiated in 1972 by Sir David Cox, who restricted its use
to hypothesis tests and confidence intervals for fixed effects in a regression setting.
Practitioners have begun to diversify applications of this model, constructing...
Statisticians often focus on sampling or experimental design and data analysis while paying less attention to how the response is measured. However, the ideas of statistics may be applied to measurement problems with fruitful results. By examining the errors of measured responses, we may gain insight into the limitations of...
Semiparametric maximum likelihood analysis allows inference in errors-in-variables models with small loss of efficiency relative to full likelihood analysis but with significantly weakened assumptions. In addition, since no distributional assumptions are made for the nuisance parameters, the analysis more nearly parallels that for usual regression. These highly desirable features and...
The relative merits of the Fisher linear discriminant function
(Efron, 1975) and logistic regression procedure (Press and Wilson,
1978; McLachlan and Byth, 1979), applied to the two group
discrimination problem under conditions of multivariate normality and
common covariance, have been debated. In related research, DiPillo
(1976, 1977, 1979) has argued...
In this thesis we develop a theoretical framework for the identification of situations where the equal frequency (EF) or equal variance (EV) subclassification may produce lower bias and/or variance of the estimator. We conduct simulation studies to examine the EF and EV approaches under different types of model misspecification. We...
This dissertation considers two approaches for testing hypotheses in
unbalanced mixed linear models. The first approach is to construct a design with
some type of structure or "partial" balance, so that some of the optimal properties of
a completely balanced design hold. It is shown that for a particular type...
Two topics concerning statistical sampling of initiative petitions are considered in this dissertation. The first concerns statistical estimation of the number of distinct valid signatures in a petition, and the second evaluates the statistical decision rule used by Oregon for determining certification of state initiative and referendum petitions.
In several...
The problem addressed is absence of a class of objects in a finite set of objects, which
is investigated by considering absence of a species and absence in relation to a threshold.
Regarding absence of a species, we demonstrate that the assessed probability of absence
of the class of objects...
This thesis proposes an approximate maximum likelihood estimator and
likelihood ratio test for parameters in a generalized linear model when two or
more random effects are present. Substantial progress in parameter estimation
for such models has been made with methods involving generalized least squares
based on the approximate marginal mean...
The Thurstone-Mosteller and Bradley-Terry Models are commonly used to rank items from paired comparisons experiments in which one item in each pair "wins," and to assess the importance of time-independent explanatory variables on such rankings. The first part of this thesis clarifies the use of probit and logistic regression models...
Many ecological populations can be interpreted as response surfaces; the spatial
patterns of the population vary in response to changes in the spatial patterns of
environmental explanatory variables. Collection of a probability sample from the
population provides unbiased estimates of the population parameters using design
based estimation. When information is...
When data are not missing at random, approaches to reduce nonresponse bias include subsampling nonresponding units and modeling. The objective of this thesis is to develop unbiased and precise model-assisted estimators of the population total that are applicable to data from a complex survey design with nonignorable nonresponse. When information...
This thesis consists of three papers which investigate marginal models, nonparametric approaches, generalized mixed effects models and variance components estimation in longitudinal data analysis. In the first paper, a new marginal approach is introduced for high-dimensional cell-cycle microarray data with no replicates. There are two kinds of correlation for cell-cycle...
This thesis advocates the use of maximum likelihood analysis for generalized
regression models with measurement error in a single explanatory variable. This will be
done first by presenting a computational algorithm and the numerical details for carrying
out this algorithm on a wide variety of models. The computational methods will...
Methodologies for data reduction, modeling, and classification of grouped
response curves are explored. In particular, the thesis focuses on the analysis of
a collection of highly correlated, highly dimensional response-curve data of
spectral reflectance curves of wood surface features.
In the analysis, questions about the application of cross-validation
estimation of...
Tests of main effects and interaction effects in factorial designs are basic content in statistics textbooks and are widely used in various fields. In balanced designs there is general agreement on the appropriate main effect and interaction sums of squares and these are typically displayed in an analysis of variance (ANOVA). A...
Mixed models have been widely used to model data from experiments which have fixed and random
factors. Often there is interest in the estimation of fixed effects and variance components. The likelihood
procedure is a general technique that has been applied to such problems. This procedure can be
computationally difficult,...
Variable area surveys are used in large geographic regions to estimate the density of birds distributed over a region. If some birds go undetected, a measure of the effective area surveyed, the amount of area occupied by the birds detected, is needed. The effective area surveyed is determined by observational,...
A genecological approach was used to explore genetic variation in adaptive traits in Pseudoroegneria spicata, a key restoration grass, in the intermountain western United States. Common garden experiments were established at three contrasting sites with seedlings from two maternal parents from each of 114 populations along with five commercial releases...
The rapid worldwide emergence of the amphibian pathogen Batrachochytrium dendrobatidis (Bd) is having a profound negative impact on biodiversity. However, global research efforts are fragmented and an overarching synthesis of global infection data is lacking. Here, we provide results from a community tool for the compilation of worldwide Bd presence...
Differential investment in offspring has been reported for many mammals, often in the context of the
Trivers–Willard model of male-biased investment, but evidence of differential investment in pronghorns (Antilocapra
americana) is largely lacking. We assessed the causes and consequences of different birth masses of littermate
fawns in a pronghorn...
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology...