Theses and Dissertations (Statistics)
http://hdl.handle.net/1957/18468
2015-11-25T20:43:32Z
http://hdl.handle.net/1957/40642
Variable selection in semi-parametric models
Jiang, Shuping
We consider two semiparametric regression models for data analysis: the stochastic additive model (SAM) for nonlinear time series data and the additive coefficient model (ACM) for randomly sampled data with nonparametric structure. We employ the SCAD-penalized polynomial spline estimation method for estimation and simultaneous variable selection in both models. The method approximates the nonparametric functions by polynomial splines and minimizes the sum of squared errors subject to an additive penalty on the norms of the spline functions. A coordinate-wise algorithm is developed for solving the penalized polynomial spline problem. For SAM, we establish that, under geometric α-mixing, the resulting estimator attains the optimal rate of convergence for estimating the nonparametric functions. It also selects the correct model with probability approaching one as the sample size increases. For ACM, we investigate the asymptotic properties of the global solution of the non-convex objective function. We establish explicitly that the oracle estimator is the global solution with probability approaching one. Therefore, the global solution enjoys both estimation and selection consistency. In the literature, the asymptotic properties of local solutions, rather than global solutions, are well established for non-convex penalty functions. Our theoretical results broaden the traditional understanding of the penalized polynomial spline method. For both models, extensive Monte Carlo studies show that the proposed procedure works effectively even with moderate sample sizes. We also illustrate the proposed methods by analyzing the US unemployment time series under SAM and the Tucson housing price data under ACM.
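For readers unfamiliar with the SCAD penalty named in the abstract, the following is a minimal sketch of its standard scalar form (as introduced by Fan and Li), not code from the thesis. The function name `scad_penalty` and the default `a=3.7` (a commonly used value) are assumptions for illustration; in the thesis the penalty is applied to norms of spline functions rather than to a raw scalar.

```python
def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty evaluated at a scalar t (illustrative sketch).

    lam > 0 is the tuning parameter; a > 2 controls where the penalty
    flattens out. a=3.7 is a conventional default, assumed here.
    """
    t = abs(t)
    if t <= lam:
        # Small values: behaves like the lasso penalty lam * |t|.
        return lam * t
    if t <= a * lam:
        # Middle range: quadratic piece that joins the two regimes smoothly.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large values: constant, so large effects are not shrunk further.
    return lam * lam * (a + 1) / 2
```

The penalty is continuous at both breakpoints and constant beyond `a * lam`, which is what makes it non-convex and leaves large components essentially unpenalized.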
Graduation date: 2014
2013-06-05T00:00:00Z
http://hdl.handle.net/1957/37471
Fisher and logistic discriminant function estimation in the presence of collinearity
O'Donnell, Robert P. (Robert Paul)
The relative merits of the Fisher linear discriminant function
(Efron, 1975) and logistic regression procedure (Press and Wilson,
1978; McLachlan and Byth, 1979), applied to the two group
discrimination problem under conditions of multivariate normality and
common covariance, have been debated. In related research, DiPillo
(1976, 1977, 1979) has argued that a biased Fisher linear
discriminant function is preferable when one or more collinearities
exist among the classifying variables.
This paper proposes a generalized ridge logistic regression
(GRL) estimator as a logistic analog to DiPillo's biased alternative
estimator. Ridge and Principal Component logistic estimators
proposed by Schaefer et al. (1984) for conventional logistic
regression are shown to be special cases of this generalized ridge
logistic estimator.
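To make the idea of ridge-type shrinkage in logistic regression concrete, here is a minimal sketch of a generic L2-penalized logistic fit by gradient descent. It is not the GRL estimator proposed in the thesis, nor the Schaefer et al. estimator; the function name `fit_ridge_logistic`, the ridge constant `k`, the step size `lr`, and the iteration count are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_ridge_logistic(X, y, k=1.0, lr=0.1, n_iter=2000):
    """Minimize -loglik + (k/2) * ||beta||^2 by plain gradient descent.

    X: list of rows (lists of floats); y: list of 0/1 labels.
    The intercept is left unpenalized. Hypothetical helper, for
    illustration only.
    """
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: fitted probability minus observed label.
            r = sigmoid(beta0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        # Gradient step; the ridge term k * beta shrinks the slopes.
        beta0 -= lr * g0 / n
        beta = [b - lr * (gj + k * b) / n for b, gj in zip(beta, g)]
    return beta0, beta
```

With two perfectly collinear predictors, the unpenalized problem has no unique solution, whereas the penalized fit splits the effect evenly between them, and a larger `k` gives coefficients with a smaller norm.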
Two Fisher estimators (Linear Discriminant Function (LDF) and
Biased Linear Discriminant Function (BLDF)) and three logistic
estimators (Linear Logistic Regression (LLR), Ridge Logistic
Regression (RLR) and Principal Component Logistic Regression (PCLR))
are compared in a Monte Carlo simulation under varying conditions of
distance between populations, training set size and degree of
collinearity. A new approach to the selection of the ridge parameter
in the BLDF method is proposed and evaluated.
The results of the simulation indicate that two of the biased
estimators (BLDF, RLR) produce smaller MSE values and are more stable
estimators (smaller standard deviations) than their unbiased
counterparts. But the improved performance for MSE does not
translate into equivalent improvement in error rates. The expected
actual error rates are only marginally smaller for the biased
estimators. The results suggest that small training set size, rather
than strong collinearity, may produce the greatest classification
advantage for the biased estimators.
The unbiased estimators (LDF, LLR) produce smaller average apparent
error rates. The relative advantage of the Fisher
estimators over the logistic estimators is maintained. But, given
that the comparison is made under conditions most favorable to the
Fisher estimators, the absolute advantage of the Fisher estimators is
small. The new ridge parameter selection method for the BLDF
estimator performs as well as, but no better than, the method used by
DiPillo.
The PCLR estimator shows performance comparable to the other
estimators when there is a high level of collinearity. However, the
estimator gives up a significant degree of performance in conditions
where collinearity is not a problem.
Graduation date: 1991
1990-09-27T00:00:00Z
http://hdl.handle.net/1957/37264
Factors affecting bird counts and their influence on density estimates
McCracken, Marti L.
Variable area surveys are used in large geographic regions to estimate the
density of birds distributed over a region. If some birds go undetected, a
measure of the effective area surveyed, the amount of area occupied by the birds
detected, is needed. The effective area surveyed is determined by observational,
biological, and environmental factors relating to detectability. It has been
suggested that density estimates are inaccurate, and that it is risky to compare
bird populations intraspecifically over time and space, since factors influencing
bird counts will vary.
There have been several controversial studies where variable area survey
density estimates were evaluated using density estimates calculated from spot
mapping as the standard for comparison. Spot mapping itself is an unproven
estimator that the previously mentioned factors also influence. Without a known
population density, it is difficult to assess how the different density
estimators perform. Variable area surveys of inanimate objects whose densities
were known have been conducted under controlled circumstances with results
generally supporting the variable area survey method, but time and inability to
control for all factors limit the application of this type of study. A simulation
program that distributes vegetation and a known density of birds over a region,
and then simulates the process of gathering bird detection data, is one tool
available for evaluating variable area density estimates. Within such a simulation
study, various observational, biological, and environmental factors could be
introduced.
This thesis introduces such a simulation program, VABS, written with the objectives of identifying factors that influence bird counts and determining the limitations of the variable area survey. The thesis discusses the factors that have been identified as influencing bird counts and the effects these factors had on the Fourier series, exponential power series, and Cum-D density estimates when simulated in VABS. Critical assumptions of the variable area survey are identified, and the ability of the variable area survey to estimate density for different detectability curves is examined. Also included are discussions of pooling data gathered under different detectabilities and of monitoring population trends.
Graduation date: 1994
1993-07-22T00:00:00Z
http://hdl.handle.net/1957/36872
Uses of Bayesian posterior modes in solving complex estimation problems in statistics
Lin, Lie-fen
In Bayesian analysis, means are commonly used to
summarize Bayesian posterior distributions. Problems with
a large number of parameters often require numerical
integrations over many dimensions to obtain means. In this
dissertation, posterior modes with respect to appropriate
measures are used to summarize Bayesian posterior
distributions, using the Newton-Raphson method to locate
modes. Further inference about the modes relies on the normal
approximation, using asymptotic multivariate normal
distributions to approximate posterior distributions. These
techniques are applied to two statistical estimation
problems.
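The combination of Newton-Raphson mode finding and a normal approximation described above can be sketched as follows. This is an illustrative one-parameter example, not the dissertation's procedure: the function name `newton_mode`, the tolerance, and the Beta(8, 4) posterior used in the usage note are all assumptions, chosen because the mode is known in closed form.

```python
def newton_mode(grad, hess, x0, tol=1e-10, max_iter=100):
    """Locate a posterior mode by Newton-Raphson on the log posterior.

    grad and hess are the first and second derivatives of the log
    posterior density. Returns (mode, sd), where sd is the standard
    deviation of the normal approximation, i.e. sqrt(-1 / hess(mode)).
    Hypothetical helper, for illustration only.
    """
    x = x0
    for _ in range(max_iter):
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:
            break
    # Curvature at the mode gives the approximate posterior variance.
    return x, (-1.0 / hess(x)) ** 0.5


def beta_log_post_derivs(a, b):
    """Derivatives of the log density of Beta(a, b), up to a constant."""
    grad = lambda t: (a - 1) / t - (b - 1) / (1 - t)
    hess = lambda t: -(a - 1) / t**2 - (b - 1) / (1 - t) ** 2
    return grad, hess
```

For example, `newton_mode(*beta_log_post_derivs(8, 4), x0=0.3)` should converge to the closed-form mode (a - 1) / (a + b - 2) = 0.7. In problems with many parameters the same scheme uses the gradient vector and Hessian matrix in place of the scalar derivatives.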
First, Bayesian sequential dose selection procedures
are developed for bioassay problems using Ramsey's prior
[28]. Two adaptive designs for Bayesian sequential dose
selection and estimation of the potency curve are given.
The relative efficiency is used to compare the adaptive
methods with other non-Bayesian methods (Spearman-Karber,
up-and-down, and Robbins-Monro) for estimating the ED50.
Second, posterior distributions of the order of an
autoregressive (AR) model are determined following Robb's
method (1980). Wolfer's sunspot data are used as an example
to compare the estimation results with those of the FPE, AIC, BIC, and
CIC methods. Both Robb's method and the normal
approximation for estimation of the order have full
posterior results.
Graduation date: 1992
1992-03-17T00:00:00Z